# System Architecture

## Overview

The Orderflow Backtest System is designed as a modular, high-performance data processing pipeline for cryptocurrency trading analysis. The architecture emphasizes separation of concerns, efficient memory usage, and scalable processing of large datasets.

## High-Level Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Data Sources   │    │    Processing    │    │  Presentation   │
│                 │    │                  │    │                 │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────┐ │
│ │SQLite Files │─┼────┼→│   Storage    │─┼────┼→│ Visualizer  │ │
│ │- orderbook  │ │    │ │- Orchestrator│ │    │ │- OHLC Charts│ │
│ │- trades     │ │    │ │- Calculator  │ │    │ │- OBI/CVD    │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────┘ │
│                 │    │        │         │    │        ▲        │
└─────────────────┘    │ ┌─────────────┐  │    │ ┌─────────────┐ │
                       │ │  Strategy   │──┼────┼→│  Reports    │ │
                       │ │- Analysis   │  │    │ │- Metrics    │ │
                       │ │- Alerts     │  │    │ │- Summaries  │ │
                       │ └─────────────┘  │    │ └─────────────┘ │
                       └──────────────────┘    └─────────────────┘
```

## Component Architecture

### Data Layer

#### Models (`models.py`)
**Purpose**: Core data structures and calculation logic

```python
# Core data models
OrderbookLevel    # Single price level (price, size, order_count, liquidation_count)
Trade             # Individual trade execution (price, size, side, timestamp)
BookSnapshot      # Complete orderbook state at timestamp
Book              # Container for snapshot sequence
Metric            # Calculated OBI/CVD values

# Calculation engine
MetricCalculator  # Static methods for OBI/CVD computation
```

**Relationships**:
- `Book` contains multiple `BookSnapshot` instances
- `BookSnapshot` contains dictionaries of `OrderbookLevel` and lists of `Trade`
- `Metric` stores calculated values for each `BookSnapshot`
- `MetricCalculator` operates on snapshots to produce metrics
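
For orientation, these relationships map naturally onto plain dataclasses. The sketch below is illustrative only: the field names follow the summaries above and may not match the actual `models.py` definitions.

```python
# Hypothetical sketch of the core models (not the actual models.py code).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrderbookLevel:
    price: float
    size: float
    order_count: int
    liquidation_count: int


@dataclass
class Trade:
    price: float
    size: float
    side: str              # "buy" or "sell"
    timestamp: str


@dataclass
class BookSnapshot:
    timestamp: str
    bids: Dict[float, OrderbookLevel] = field(default_factory=dict)  # price -> level
    asks: Dict[float, OrderbookLevel] = field(default_factory=dict)  # price -> level
    trades: List[Trade] = field(default_factory=list)                # executions at this tick
```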

#### Repositories (`repositories/`)
**Purpose**: Database access and persistence layer

```python
# Read-only base repository
SQLiteOrderflowRepository:
    - connect()                    # Optimized SQLite connection
    - load_trades_by_timestamp()   # Efficient trade loading
    - iterate_book_rows()          # Memory-efficient snapshot streaming
    - count_rows()                 # Performance monitoring

# Write-enabled metrics repository
SQLiteMetricsRepository:
    - create_metrics_table()       # Schema creation
    - insert_metrics_batch()       # High-performance batch inserts
    - load_metrics_by_timerange()  # Time-range queries
    - table_exists()               # Schema validation
```

**Design Patterns**:
- **Repository Pattern**: Clean separation between data access and business logic
- **Batch Processing**: Process 1000 records per database operation
- **Connection Management**: Caller manages connection lifecycle
- **Performance Optimization**: SQLite PRAGMAs for high-speed operations
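
The batch-processing and connection-management points combine into a pattern like the following sketch; the signature is an assumption for illustration, not the actual `insert_metrics_batch()` from the source.

```python
import sqlite3
from typing import Iterable, Tuple

BATCH_SIZE = 1000  # records per database operation, per the bullet above


# Hypothetical helper; the real SQLiteMetricsRepository method may differ.
def insert_metrics_batch(conn: sqlite3.Connection,
                         rows: Iterable[Tuple[int, str, float, float, float, float]]) -> None:
    """Insert (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) rows in one call."""
    conn.executemany(
        "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
```

Note that the connection is passed in rather than opened here, matching the caller-managed lifecycle noted above.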

### Processing Layer

#### Storage (`storage.py`)
**Purpose**: Orchestrates data loading, processing, and metrics calculation

```python
class Storage:
    - build_booktick_from_db()         # Main processing pipeline
    - _create_snapshots_and_metrics()  # Per-snapshot processing
    - _snapshot_from_row()             # Individual snapshot creation
```

**Processing Pipeline**:
1. **Initialize**: Create metrics repository and table if needed
2. **Load Trades**: Group trades by timestamp for efficient access
3. **Stream Processing**: Process snapshots one-by-one to minimize memory
4. **Calculate Metrics**: OBI and CVD calculation per snapshot
5. **Batch Persistence**: Store metrics in batches of 1000
6. **Memory Management**: Discard full snapshots after metric extraction
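
As a sketch, the six steps reduce to a loop like the one below. All names are assumptions based on the method summaries above, not the actual `storage.py` code.

```python
# Hypothetical condensation of the pipeline; collaborators are injected.
def build_booktick_from_db(repo, metrics_repo, calc, make_snapshot,
                           batch_size: int = 1000) -> None:
    batch, prev_cvd = [], 0.0
    for row in repo.iterate_book_rows():             # step 3: stream one row at a time
        snap = make_snapshot(row)                    # build the snapshot for this row
        obi = calc.calculate_obi(snap)               # step 4: OBI per snapshot
        cvd = calc.calculate_cvd(prev_cvd, snap.trades)
        prev_cvd = cvd
        batch.append((row["id"], snap.timestamp, obi, cvd))
        if len(batch) >= batch_size:                 # step 5: persist in batches
            metrics_repo.insert_metrics_batch(batch)
            batch.clear()
        # step 6: `snap` goes out of scope here and is garbage-collected
    if batch:                                        # flush the final partial batch
        metrics_repo.insert_metrics_batch(batch)
```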

#### Strategy Framework (`strategies.py`)
**Purpose**: Trading analysis and signal generation

```python
class DefaultStrategy:
    - set_db_path()          # Configure database access
    - compute_OBI()          # Real-time OBI calculation (fallback)
    - load_stored_metrics()  # Retrieve persisted metrics
    - get_metrics_summary()  # Statistical analysis
    - on_booktick()          # Main analysis entry point
```

**Analysis Capabilities**:
- **Stored Metrics**: Primary analysis using persisted data
- **Real-time Fallback**: Live calculation for compatibility
- **Statistical Summaries**: Min/max/average OBI, CVD changes
- **Alert System**: Configurable thresholds for significant imbalances
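
A minimal sketch of the alert idea, assuming metric rows shaped like the `metrics` table defined later in this document; the threshold value is illustrative, and the real `on_booktick()` logic is richer.

```python
# Hypothetical threshold scan over stored metrics rows.
OBI_ALERT_THRESHOLD = 0.6   # assumed value; thresholds are configurable


def significant_imbalances(metric_rows, threshold: float = OBI_ALERT_THRESHOLD):
    """Yield (timestamp, obi) for snapshots whose |OBI| crosses the threshold."""
    for row in metric_rows:             # e.g. rows from load_stored_metrics()
        if abs(row["obi"]) >= threshold:
            yield row["timestamp"], row["obi"]
```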

### Presentation Layer

#### Visualization (`visualizer.py`)
**Purpose**: Multi-chart rendering and display

```python
class Visualizer:
    - set_db_path()           # Configure metrics access
    - update_from_book()      # Main rendering pipeline
    - _load_stored_metrics()  # Retrieve metrics for chart range
    - _draw()                 # Multi-subplot rendering
    - show()                  # Display interactive charts
```

**Chart Layout**:
```
┌─────────────────────────────────────┐
│          OHLC Candlesticks          │ ← Price action
├─────────────────────────────────────┤
│             Volume Bars             │ ← Trading volume
├─────────────────────────────────────┤
│            OBI Line Chart           │ ← Order book imbalance
├─────────────────────────────────────┤
│            CVD Line Chart           │ ← Cumulative volume delta
└─────────────────────────────────────┘
```

**Features**:
- **Shared Time Axis**: Synchronized X-axis across all subplots
- **Auto-scaling**: Y-axis optimization for each metric type
- **Performance**: Efficient rendering of large datasets
- **Interactive**: Qt5Agg backend for zooming and panning
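
The shared time axis is the kind of thing matplotlib's `sharex` option handles directly. A skeletal sketch of the layout follows; the real `_draw()` is of course more involved.

```python
import matplotlib.pyplot as plt

# Four stacked panels on one time axis, mirroring the layout above.
fig, (ax_price, ax_volume, ax_obi, ax_cvd) = plt.subplots(
    4, 1, sharex=True, figsize=(12, 10)
)
for ax, label in [(ax_price, "OHLC"), (ax_volume, "Volume"),
                  (ax_obi, "OBI"), (ax_cvd, "CVD")]:
    ax.set_ylabel(label)        # each panel auto-scales its own y-axis
plt.show()                      # interactive: zooming one panel pans all four
```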

## Data Flow

### Processing Flow
```
1. SQLite DB      → Repository       → Raw Data
2. Raw Data       → Storage          → BookSnapshot
3. BookSnapshot   → MetricCalculator → OBI/CVD
4. Metrics        → Repository       → Database Storage
5. Stored Metrics → Strategy         → Analysis
6. Stored Metrics → Visualizer       → Charts
```

### Memory Management Flow
```
Traditional: DB → All Snapshots in Memory → Analysis                     (High Memory)
Optimized:   DB → Process Snapshot → Calculate Metrics → Store → Discard (Low Memory)
```

## Database Schema

### Input Schema (Required)
```sql
-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,       -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,       -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,       -- "buy" or "sell"
    timestamp TEXT
);
```

### Output Schema (Auto-created)
```sql
-- Calculated metrics
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER,
    timestamp TEXT,
    obi REAL,        -- Order Book Imbalance [-1, 1]
    cvd REAL,        -- Cumulative Volume Delta
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```
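
Because timestamps are stored as TEXT, time-range queries reduce to string comparison, which works when the values sort lexically (e.g. ISO-8601). A hypothetical lookup, with an illustrative path and window:

```python
import sqlite3

# Illustrative query; the db path and window values are placeholders. String
# comparison on `timestamp` assumes lexically ordered (e.g. ISO-8601) values.
conn = sqlite3.connect("orderflow.db")
rows = conn.execute(
    "SELECT timestamp, obi, cvd FROM metrics "
    "WHERE timestamp BETWEEN ? AND ? "
    "ORDER BY timestamp",
    ("2024-01-01T00:00:00", "2024-01-01T01:00:00"),
).fetchall()
conn.close()
```

The `idx_metrics_timestamp` index above is what keeps this kind of query sub-second on large tables.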

## Performance Characteristics

### Memory Optimization
- **Before**: Store all snapshots in memory (~1GB for 600K snapshots)
- **After**: Store only metrics data (~300MB for same dataset)
- **Reduction**: >70% memory usage decrease

### Processing Performance
- **Batch Size**: 1000 records per database operation
- **Processing Speed**: ~1000 snapshots/second on modern hardware
- **Database Overhead**: <20% storage increase for metrics table
- **Query Performance**: Sub-second retrieval for typical time ranges

### Scalability Limits
- **Single File**: 1M+ snapshots per database file
- **Time Range**: Months to years of historical data
- **Memory Peak**: <2GB for year-long datasets
- **Disk Space**: Original size + 20% for metrics

## Integration Points

### External Interfaces
```python
# Main application entry point
main.py:
    - CLI argument parsing
    - Database file discovery
    - Component orchestration
    - Progress monitoring

# Plugin interfaces
Strategy.on_booktick(book: Book)   # Strategy integration point
Visualizer.update_from_book(book)  # Visualization integration
```

### Internal Interfaces
```python
# Repository interfaces
Repository.connect() → Connection
Repository.load_data() → TypedData
Repository.store_data(data) → None

# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) → float
MetricCalculator.calculate_cvd(prev_cvd, trades) → float
```
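
These calculator interfaces pair with the standard definitions of the two metrics: OBI is commonly computed as the normalized bid/ask volume difference, which is consistent with the documented [-1, 1] range, and CVD accumulates signed trade volume. A sketch under those assumptions (not the actual `MetricCalculator` code):

```python
from typing import Iterable


def calculate_obi(bid_sizes: Iterable[float], ask_sizes: Iterable[float]) -> float:
    """(bid_vol - ask_vol) / (bid_vol + ask_vol), bounded to [-1, 1]."""
    bid_vol, ask_vol = sum(bid_sizes), sum(ask_sizes)
    total = bid_vol + ask_vol
    return (bid_vol - ask_vol) / total if total else 0.0


def calculate_cvd(prev_cvd: float, trades) -> float:
    """Running total: add buy sizes, subtract sell sizes."""
    return prev_cvd + sum(
        t.size if t.side == "buy" else -t.size for t in trades
    )
```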

## Security Considerations

### Data Protection
- **SQL Injection**: All queries use parameterized statements
- **File Access**: Validates database file paths and permissions
- **Error Handling**: No sensitive data in error messages
- **Input Validation**: Sanitizes all external inputs
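
For example, a parameterized trade query binds values instead of interpolating them into the SQL text (illustrative helper, not from the source):

```python
import sqlite3


def load_trades_since(db_path: str, instrument: str, since: str):
    """The ? placeholders keep untrusted values out of the SQL string."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            "SELECT price, size, side, timestamp FROM trades "
            "WHERE instrument = ? AND timestamp >= ?",
            (instrument, since),        # bound, never concatenated
        ).fetchall()
```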

### Access Control
- **Database**: Respects file system permissions
- **Memory**: No sensitive data persistence beyond processing
- **Logging**: Configurable log levels without data exposure

## Configuration Management

### Performance Tuning
```python
# Storage configuration
BATCH_SIZE = 1000           # Records per database operation
LOG_FREQUENCY = 20          # Progress reports per processing run

# SQLite optimization
PRAGMA journal_mode = OFF   # Maximum write performance
PRAGMA synchronous = OFF    # Disable synchronous writes
PRAGMA cache_size = 100000  # Large memory cache
```
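
Applied at connection time, these PRAGMAs look like the sketch below (a hypothetical helper, not the actual `connect()`). Worth noting: `journal_mode = OFF` and `synchronous = OFF` trade crash-safety for speed, a reasonable bargain here because the metrics table can always be regenerated from the source data.

```python
import sqlite3


def connect_optimized(db_path: str) -> sqlite3.Connection:
    """Open a connection tuned for bulk writes, per the settings above."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = OFF")   # no rollback journal
    conn.execute("PRAGMA synchronous = OFF")    # skip fsync on writes
    conn.execute("PRAGMA cache_size = 100000")  # ~100k pages of in-memory cache
    return conn
```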

### Visualization Settings
```python
# Chart configuration
WINDOW_SECONDS = 60     # OHLC aggregation window
MAX_BARS = 500          # Maximum bars displayed
FIGURE_SIZE = (12, 10)  # Chart dimensions
```

## Error Handling Strategy

### Graceful Degradation
- **Database Errors**: Continue with reduced functionality
- **Calculation Errors**: Skip problematic snapshots with logging
- **Visualization Errors**: Display available data, note issues
- **Memory Pressure**: Adjust batch sizes automatically

### Recovery Mechanisms
- **Partial Processing**: Resume from last successful batch
- **Data Validation**: Verify metrics calculations before storage
- **Rollback Support**: Transaction boundaries for data consistency
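
Transaction boundaries fall out of sqlite3's connection context manager, as in the hypothetical helper below. One caveat: rollback requires a journal, so this pattern assumes a connection without the `journal_mode = OFF` tuning shown earlier.

```python
import logging
import sqlite3

log = logging.getLogger(__name__)


def store_batch(conn: sqlite3.Connection, rows, batch_no: int) -> bool:
    """Write one batch atomically; tell the caller whether to resume here."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany(
                "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
                "VALUES (?, ?, ?, ?, ?, ?)",
                rows,
            )
        return True
    except sqlite3.Error:
        log.exception("batch %d failed; safe to retry from this batch", batch_no)
        return False
```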

---

This architecture provides a robust, scalable foundation for high-frequency trading data analysis while maintaining clean separation of concerns and efficient resource utilization.