# System Architecture

## Overview

The Orderflow Backtest System is designed as a modular, high-performance data processing pipeline for cryptocurrency trading analysis. The architecture emphasizes separation of concerns, efficient memory usage, and scalable processing of large datasets.

## High-Level Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Data Sources   │    │    Processing    │    │  Presentation   │
│                 │    │                  │    │                 │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────┐ │
│ │SQLite Files │─┼────┼→│ Storage      │─┼────┼→│ Visualizer  │ │
│ │- orderbook  │ │    │ │- Orchestrator│ │    │ │- OHLC Charts│ │
│ │- trades     │ │    │ │- Calculator  │ │    │ │- OBI/CVD    │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────┘ │
│                 │    │         │        │    │        ▲        │
└─────────────────┘    │ ┌─────────────┐  │    │ ┌─────────────┐ │
                       │ │ Strategy    │──┼────┼→│ Reports     │ │
                       │ │- Analysis   │  │    │ │- Metrics    │ │
                       │ │- Alerts     │  │    │ │- Summaries  │ │
                       │ └─────────────┘  │    │ └─────────────┘ │
                       └──────────────────┘    └─────────────────┘
```

## Component Architecture

### Data Layer

#### Models (`models.py`)

**Purpose**: Core data structures and calculation logic

```python
# Core data models
OrderbookLevel    # Single price level (price, size, order_count, liquidation_count)
Trade             # Individual trade execution (price, size, side, timestamp)
BookSnapshot      # Complete orderbook state at a timestamp
Book              # Container for a snapshot sequence
Metric            # Calculated OBI/CVD values

# Calculation engine
MetricCalculator  # Static methods for OBI/CVD computation
```

**Relationships**:
- `Book` contains multiple `BookSnapshot` instances
- `BookSnapshot` contains dictionaries of `OrderbookLevel` and lists of `Trade`
- `Metric` stores calculated values for each `BookSnapshot`
- `MetricCalculator` operates on snapshots to produce metrics
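The models above can be sketched as plain dataclasses. This is a minimal illustration under the field names listed above; any defaults and container types beyond those are assumptions, not the actual `models.py`:

```python
from dataclasses import dataclass, field


@dataclass
class OrderbookLevel:
    """A single price level in the book."""
    price: float
    size: float
    order_count: int = 0
    liquidation_count: int = 0


@dataclass
class Trade:
    """An individual trade execution."""
    price: float
    size: float
    side: str        # "buy" or "sell"
    timestamp: str


@dataclass
class BookSnapshot:
    """Complete orderbook state at one timestamp."""
    timestamp: str
    bids: dict[float, OrderbookLevel] = field(default_factory=dict)
    asks: dict[float, OrderbookLevel] = field(default_factory=dict)
    trades: list[Trade] = field(default_factory=list)
```

Keying `bids` and `asks` by price makes level lookups O(1) while keeping each snapshot self-contained, which matters once snapshots are streamed and discarded one at a time.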
#### Repositories (`repositories/`)

**Purpose**: Database access and persistence layer

```python
# Repository
SQLiteOrderflowRepository:
- connect()                    # Optimized SQLite connection
- load_trades_by_timestamp()   # Efficient trade loading
- iterate_book_rows()          # Memory-efficient snapshot streaming
- count_rows()                 # Performance monitoring
- create_metrics_table()       # Schema creation
- insert_metrics_batch()       # High-performance batch inserts
- load_metrics_by_timerange()  # Time-range queries
- table_exists()               # Schema validation
```

**Design Patterns**:
- **Repository Pattern**: Clean separation between data access and business logic
- **Batch Processing**: 1000 records per database operation
- **Connection Management**: The caller manages the connection lifecycle
- **Performance Optimization**: SQLite PRAGMAs tuned for high-speed operation

### Processing Layer

#### Storage (`storage.py`)

**Purpose**: Orchestrates data loading, processing, and metrics calculation

```python
class Storage:
- build_booktick_from_db()         # Main processing pipeline
- _create_snapshots_and_metrics()  # Per-snapshot processing
- _snapshot_from_row()             # Individual snapshot creation
```

**Processing Pipeline**:
1. **Initialize**: Create the metrics repository and table if needed
2. **Load Trades**: Group trades by timestamp for efficient access
3. **Stream Processing**: Process snapshots one by one to minimize memory use
4. **Calculate Metrics**: Compute OBI and CVD for each snapshot
5. **Batch Persistence**: Store metrics in batches of 1000
6. **Memory Management**: Discard full snapshots after metric extraction
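The streaming steps of this pipeline can be sketched as a single loop. This is illustrative, not the actual `Storage` implementation: the `snapshot_from_row()` stub and the tuple layout of a metrics row are assumptions, while `iterate_book_rows()` and `insert_metrics_batch()` follow the repository listing above:

```python
BATCH_SIZE = 1000  # records per database write, matching the configuration


def process_snapshots(repo, calculator, batch_size=BATCH_SIZE):
    """Sketch of the streaming pipeline: compute OBI/CVD per snapshot,
    persist metrics in batches, and let each snapshot be garbage-collected."""
    batch, cvd = [], 0.0
    for row in repo.iterate_book_rows():        # memory-efficient streaming
        snapshot = repo.snapshot_from_row(row)  # hypothetical helper name
        obi = calculator.calculate_obi(snapshot)
        cvd = calculator.calculate_cvd(cvd, snapshot.trades)
        batch.append((row["id"], snapshot.timestamp, obi, cvd))
        if len(batch) >= batch_size:
            repo.insert_metrics_batch(batch)    # one write per full batch
            batch = []                          # drop references -> low memory
    if batch:                                   # flush the final partial batch
        repo.insert_metrics_batch(batch)
```

Dropping the batch list after each write is what keeps peak memory flat: at any moment only one snapshot and at most one batch of metric rows are alive.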
#### Strategy Framework (`strategies.py`)

**Purpose**: Trading analysis and signal generation

```python
class DefaultStrategy:
- set_db_path()          # Configure database access
- compute_OBI()          # Real-time OBI calculation (fallback)
- load_stored_metrics()  # Retrieve persisted metrics
- get_metrics_summary()  # Statistical analysis
- on_booktick()          # Main analysis entry point
```

**Analysis Capabilities**:
- **Stored Metrics**: Primary analysis path using persisted data
- **Real-time Fallback**: Live calculation for compatibility
- **Statistical Summaries**: Min/max/average OBI, CVD changes
- **Alert System**: Configurable thresholds for significant imbalances

### Presentation Layer

#### Visualization (`visualizer.py`)

**Purpose**: Multi-chart rendering and display

```python
class Visualizer:
- set_db_path()           # Configure metrics access
- update_from_book()      # Main rendering pipeline
- _load_stored_metrics()  # Retrieve metrics for the chart range
- _draw()                 # Multi-subplot rendering
- show()                  # Display interactive charts
```

**Chart Layout**:
```
┌─────────────────────────────────────┐
│          OHLC Candlesticks          │ ← Price action
├─────────────────────────────────────┤
│             Volume Bars             │ ← Trading volume
├─────────────────────────────────────┤
│            OBI Line Chart           │ ← Order book imbalance
├─────────────────────────────────────┤
│            CVD Line Chart           │ ← Cumulative volume delta
└─────────────────────────────────────┘
```

**Features**:
- **Shared Time Axis**: Synchronized X-axis across all subplots
- **Auto-scaling**: Y-axis optimized for each metric type
- **Performance**: Efficient rendering of large datasets
- **Interactive**: Qt5Agg backend for zooming and panning

## Data Flow

### Processing Flow

```
1. SQLite DB      → Repository       → Raw Data
2. Raw Data       → Storage          → BookSnapshot
3. BookSnapshot   → MetricCalculator → OBI/CVD
4. Metrics        → Repository       → Database Storage
5. Stored Metrics → Strategy         → Analysis
6. Stored Metrics → Visualizer       → Charts
```
### Memory Management Flow

```
Traditional: DB → All Snapshots in Memory → Analysis                      (high memory)
Optimized:   DB → Process Snapshot → Calculate Metrics → Store → Discard  (low memory)
```

## Database Schema

### Input Schema (Required)

```sql
-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,        -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,        -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,        -- "buy" or "sell"
    timestamp TEXT
);
```

### Output Schema (Auto-created)

```sql
-- Calculated metrics
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER,
    timestamp TEXT,
    obi REAL,         -- Order Book Imbalance, in [-1, 1]
    cvd REAL,         -- Cumulative Volume Delta
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```

## Performance Characteristics

### Memory Optimization
- **Before**: All snapshots held in memory (~1 GB for 600K snapshots)
- **After**: Only metrics data held (~300 MB for the same dataset)
- **Reduction**: >70% decrease in memory usage

### Processing Performance
- **Batch Size**: 1000 records per database operation
- **Processing Speed**: ~1000 snapshots/second on modern hardware
- **Database Overhead**: <20% storage increase for the metrics table
- **Query Performance**: Sub-second retrieval for typical time ranges

### Scalability Limits
- **Single File**: 1M+ snapshots per database file
- **Time Range**: Months to years of historical data
- **Memory Peak**: <2 GB for year-long datasets
- **Disk Space**: Original size plus ~20% for metrics

## Integration Points

### External Interfaces

```python
# Main application entry point
main.py:
- CLI argument parsing
- Database file discovery
- Component orchestration
- Progress monitoring

# Plugin interfaces
Strategy.on_booktick(book: Book)   # Strategy integration point
Visualizer.update_from_book(book)  # Visualization integration
```

### Internal Interfaces

```python
# Repository interfaces
Repository.connect() → Connection
Repository.load_data() → TypedData
Repository.store_data(data) → None

# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) → float
MetricCalculator.calculate_cvd(prev_cvd, trades) → float
```

## Security Considerations

### Data Protection
- **SQL Injection**: All queries use parameterized statements
- **File Access**: Validates database file paths and permissions
- **Error Handling**: No sensitive data in error messages
- **Input Validation**: Sanitizes all external inputs

### Access Control
- **Database**: Respects file-system permissions
- **Memory**: No sensitive data persists beyond processing
- **Logging**: Configurable log levels without data exposure

## Configuration Management

### Performance Tuning

```python
# Storage configuration
BATCH_SIZE = 1000           # Records per database operation
LOG_FREQUENCY = 20          # Progress reports per processing run

# SQLite optimization
PRAGMA journal_mode = OFF   # Maximum write performance
PRAGMA synchronous = OFF    # Disable synchronous writes
PRAGMA cache_size = 100000  # Large memory cache
```

### Visualization Settings

```python
# Chart configuration
WINDOW_SECONDS = 60     # OHLC aggregation window
MAX_BARS = 500          # Maximum bars displayed
FIGURE_SIZE = (12, 10)  # Chart dimensions
```

## Error Handling Strategy

### Graceful Degradation
- **Database Errors**: Continue with reduced functionality
- **Calculation Errors**: Skip problematic snapshots, with logging
- **Visualization Errors**: Display available data, note the issues
- **Memory Pressure**: Adjust batch sizes automatically

### Recovery Mechanisms
- **Partial Processing**: Resume from the last successful batch
- **Data Validation**: Verify metric calculations before storage
- **Rollback Support**: Transaction boundaries for data consistency
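Several of the points above — parameterized statements, the PRAGMA settings, batch inserts, and transaction boundaries — meet in the metrics write path. A minimal sketch with standard `sqlite3`; the function names are illustrative, not the repository's actual API:

```python
import sqlite3


def connect_optimized(db_path):
    """Open a connection with the tuning PRAGMAs from the configuration above."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA cache_size = 100000")
    return conn


def insert_metrics_batch(conn, rows):
    """Parameterized batch insert: ? placeholders rule out SQL injection,
    and the `with conn:` block gives one transaction per batch, so a failed
    batch rolls back cleanly (rollback support)."""
    with conn:
        conn.executemany(
            "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd,"
            " best_bid, best_ask) VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
```

Note the trade-off these PRAGMAs make: `journal_mode = OFF` and `synchronous = OFF` sacrifice crash-safety for write speed, which is acceptable here because the metrics table is derived data that can always be recomputed from `book` and `trades`.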
---

This architecture provides a robust, scalable foundation for high-frequency trading data analysis while maintaining clean separation of concerns and efficient resource utilization.