System Architecture
Overview
The Orderflow Backtest System is designed as a modular, high-performance data processing pipeline for cryptocurrency trading analysis. The architecture emphasizes separation of concerns, efficient memory usage, and scalable processing of large datasets.
High-Level Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Data Sources │ │ Processing │ │ Presentation │
│ │ │ │ │ │
│ ┌─────────────┐ │ │ ┌──────────────┐ │ │ ┌─────────────┐ │
│ │SQLite Files │─┼────┼→│ Storage │─┼────┼→│ Visualizer │ │
│ │- orderbook │ │ │ │- Orchestrator│ │ │ │- OHLC Charts│ │
│ │- trades │ │ │ │- Calculator │ │ │ │- OBI/CVD │ │
│ └─────────────┘ │ │ └──────────────┘ │ │ └─────────────┘ │
│ │ │ │ │ │ ▲ │
└─────────────────┘ │ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ Strategy │──┼────┼→│ Reports │ │
│ │- Analysis │ │ │ │- Metrics │ │
│ │- Alerts │ │ │ │- Summaries │ │
│ └─────────────┘ │ │ └─────────────┘ │
└──────────────────┘ └─────────────────┘
Component Architecture
Data Layer
Models (models.py)
Purpose: Core data structures and calculation logic
# Core data models
OrderbookLevel # Single price level (price, size, order_count, liquidation_count)
Trade # Individual trade execution (price, size, side, timestamp)
BookSnapshot # Complete orderbook state at timestamp
Book # Container for snapshot sequence
Metric # Calculated OBI/CVD values
# Calculation engine
MetricCalculator # Static methods for OBI/CVD computation
Relationships:
- Book contains multiple BookSnapshot instances
- BookSnapshot contains dictionaries of OrderbookLevel and lists of Trade
- Metric stores calculated values for each BookSnapshot
- MetricCalculator operates on snapshots to produce metrics
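These relationships can be sketched with dataclasses. Field names and defaults here are illustrative; the actual definitions in models.py may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OrderbookLevel:
    price: float
    size: float
    order_count: int = 0
    liquidation_count: int = 0

@dataclass
class Trade:
    price: float
    size: float
    side: str        # "buy" or "sell"
    timestamp: str

@dataclass
class BookSnapshot:
    timestamp: str
    bids: Dict[float, OrderbookLevel] = field(default_factory=dict)
    asks: Dict[float, OrderbookLevel] = field(default_factory=dict)
    trades: List[Trade] = field(default_factory=list)

@dataclass
class Book:
    instrument: str
    snapshots: List[BookSnapshot] = field(default_factory=list)
```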
Repositories (repositories/)
Purpose: Database access and persistence layer
# Read-only base repository
SQLiteOrderflowRepository:
- connect() # Optimized SQLite connection
- load_trades_by_timestamp() # Efficient trade loading
- iterate_book_rows() # Memory-efficient snapshot streaming
- count_rows() # Performance monitoring
# Write-enabled metrics repository
SQLiteMetricsRepository:
- create_metrics_table() # Schema creation
- insert_metrics_batch() # High-performance batch inserts
- load_metrics_by_timerange() # Time-range queries
- table_exists() # Schema validation
Design Patterns:
- Repository Pattern: Clean separation between data access and business logic
- Batch Processing: Process 1000 records per database operation
- Connection Management: Caller manages connection lifecycle
- Performance Optimization: SQLite PRAGMAs for high-speed operations
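The batch-insert pattern can be sketched with sqlite3's executemany; this is a hypothetical stand-in for SQLiteMetricsRepository.insert_metrics_batch(), not its actual signature.

```python
import sqlite3

BATCH_SIZE = 1000  # records per database operation

def insert_metrics_batch(conn: sqlite3.Connection, rows) -> None:
    """Insert (snapshot_id, timestamp, obi, cvd, best_bid, best_ask)
    tuples in a single transaction via executemany."""
    conn.executemany(
        "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
```

Grouping 1000 rows per call amortizes statement and transaction overhead, which is where most of the batch-insert speedup comes from.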
Processing Layer
Storage (storage.py)
Purpose: Orchestrates data loading, processing, and metrics calculation
class Storage:
- build_booktick_from_db() # Main processing pipeline
- _create_snapshots_and_metrics() # Per-snapshot processing
- _snapshot_from_row() # Individual snapshot creation
Processing Pipeline:
- Initialize: Create metrics repository and table if needed
- Load Trades: Group trades by timestamp for efficient access
- Stream Processing: Process snapshots one-by-one to minimize memory
- Calculate Metrics: OBI and CVD calculation per snapshot
- Batch Persistence: Store metrics in batches of 1000
- Memory Management: Discard full snapshots after metric extraction
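The pipeline steps above can be sketched as a single streaming loop. All names and the simplified row shape are illustrative, not the exact Storage API; the key point is that only the current snapshot and the pending metrics batch are ever held in memory.

```python
def process_snapshots(rows, trades_by_ts, store_batch, batch_size=1000):
    """Stream snapshots one at a time, computing OBI/CVD per snapshot and
    persisting metrics in batches so full snapshots never accumulate.

    rows: iterable of (snapshot_id, timestamp, bid_volume, ask_volume)
    trades_by_ts: {timestamp: [{"size": float, "side": str}, ...]}
    store_batch: callback that persists a list of metric tuples
    """
    batch, cvd = [], 0.0
    for snapshot_id, timestamp, bid_vol, ask_vol in rows:
        # Accumulate CVD from trades grouped by timestamp (step 2)
        for trade in trades_by_ts.get(timestamp, []):
            cvd += trade["size"] if trade["side"] == "buy" else -trade["size"]
        # Per-snapshot OBI (step 4)
        total = bid_vol + ask_vol
        obi = (bid_vol - ask_vol) / total if total else 0.0
        batch.append((snapshot_id, timestamp, obi, cvd))
        if len(batch) >= batch_size:
            store_batch(batch)   # batch persistence (step 5)
            batch = []           # discard processed data (step 6)
    if batch:
        store_batch(batch)
```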
Strategy Framework (strategies.py)
Purpose: Trading analysis and signal generation
class DefaultStrategy:
- set_db_path() # Configure database access
- compute_OBI() # Real-time OBI calculation (fallback)
- load_stored_metrics() # Retrieve persisted metrics
- get_metrics_summary() # Statistical analysis
- on_booktick() # Main analysis entry point
Analysis Capabilities:
- Stored Metrics: Primary analysis using persisted data
- Real-time Fallback: Live calculation for compatibility
- Statistical Summaries: Min/max/average OBI, CVD changes
- Alert System: Configurable thresholds for significant imbalances
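The alert mechanism amounts to a threshold scan over stored metrics. This is an illustrative stand-in for DefaultStrategy's alert logic; the threshold value and metric dict shape are assumptions.

```python
OBI_ALERT_THRESHOLD = 0.6  # assumed default; the real threshold is configurable

def check_alerts(metrics, threshold=OBI_ALERT_THRESHOLD):
    """Return timestamps whose OBI magnitude meets or exceeds the
    threshold, i.e. significant book imbalance in either direction."""
    return [m["timestamp"] for m in metrics if abs(m["obi"]) >= threshold]
```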
Presentation Layer
Visualization (visualizer.py)
Purpose: Multi-chart rendering and display
class Visualizer:
- set_db_path() # Configure metrics access
- update_from_book() # Main rendering pipeline
- _load_stored_metrics() # Retrieve metrics for chart range
- _draw() # Multi-subplot rendering
- show() # Display interactive charts
Chart Layout:
┌─────────────────────────────────────┐
│ OHLC Candlesticks │ ← Price action
├─────────────────────────────────────┤
│ Volume Bars │ ← Trading volume
├─────────────────────────────────────┤
│ OBI Line Chart │ ← Order book imbalance
├─────────────────────────────────────┤
│ CVD Line Chart │ ← Cumulative volume delta
└─────────────────────────────────────┘
Features:
- Shared Time Axis: Synchronized X-axis across all subplots
- Auto-scaling: Y-axis optimization for each metric type
- Performance: Efficient rendering of large datasets
- Interactive: Qt5Agg backend for zooming and panning
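The four-panel layout with a shared time axis can be sketched with matplotlib's sharex option. This is illustrative, not the Visualizer implementation; the height ratios are an assumption, and the Agg backend is used here only so the sketch runs headless (the application uses Qt5Agg for interactivity).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; the app uses Qt5Agg
import matplotlib.pyplot as plt

def make_chart_grid():
    """Four stacked subplots sharing one time axis, matching the layout
    above: OHLC on top, then volume, OBI, and CVD."""
    fig, (ax_price, ax_vol, ax_obi, ax_cvd) = plt.subplots(
        4, 1, sharex=True, figsize=(12, 10),
        gridspec_kw={"height_ratios": [3, 1, 1, 1]},
    )
    ax_price.set_ylabel("Price")
    ax_vol.set_ylabel("Volume")
    ax_obi.set_ylabel("OBI")
    ax_cvd.set_ylabel("CVD")
    ax_cvd.set_xlabel("Time")
    return fig, (ax_price, ax_vol, ax_obi, ax_cvd)
```

With sharex=True, zooming or panning any subplot moves all four together, which is what keeps price, volume, and the derived metrics aligned.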
Data Flow
Processing Flow
1. SQLite DB → Repository → Raw Data
2. Raw Data → Storage → BookSnapshot
3. BookSnapshot → MetricCalculator → OBI/CVD
4. Metrics → Repository → Database Storage
5. Stored Metrics → Strategy → Analysis
6. Stored Metrics → Visualizer → Charts
Memory Management Flow
Traditional: DB → All Snapshots in Memory → Analysis (High Memory)
Optimized: DB → Process Snapshot → Calculate Metrics → Store → Discard (Low Memory)
Database Schema
Input Schema (Required)
-- Orderbook snapshots
CREATE TABLE book (
id INTEGER PRIMARY KEY,
instrument TEXT,
bids TEXT, -- JSON: [[price, size, liq_count, order_count], ...]
asks TEXT, -- JSON: [[price, size, liq_count, order_count], ...]
timestamp TEXT
);
-- Trade executions
CREATE TABLE trades (
id INTEGER PRIMARY KEY,
instrument TEXT,
trade_id TEXT,
price REAL,
size REAL,
side TEXT, -- "buy" or "sell"
timestamp TEXT
);
Output Schema (Auto-created)
-- Calculated metrics
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER,
timestamp TEXT,
obi REAL, -- Order Book Imbalance [-1, 1]
cvd REAL, -- Cumulative Volume Delta
best_bid REAL,
best_ask REAL,
FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
Performance Characteristics
Memory Optimization
- Before: Store all snapshots in memory (~1GB for 600K snapshots)
- After: Store only metrics data (~300MB for same dataset)
- Reduction: ~70% decrease in memory usage
Processing Performance
- Batch Size: 1000 records per database operation
- Processing Speed: ~1000 snapshots/second on modern hardware
- Database Overhead: <20% storage increase for metrics table
- Query Performance: Sub-second retrieval for typical time ranges
Scalability Limits
- Single File: 1M+ snapshots per database file
- Time Range: Months to years of historical data
- Memory Peak: <2GB for year-long datasets
- Disk Space: Original size + 20% for metrics
Integration Points
External Interfaces
# Main application entry point
main.py:
- CLI argument parsing
- Database file discovery
- Component orchestration
- Progress monitoring
# Plugin interfaces
Strategy.on_booktick(book: Book) # Strategy integration point
Visualizer.update_from_book(book) # Visualization integration
Internal Interfaces
# Repository interfaces
Repository.connect() → Connection
Repository.load_data() → TypedData
Repository.store_data(data) → None
# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) → float
MetricCalculator.calculate_cvd(prev_cvd, trades) → float
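A minimal sketch of both calculator interfaces, using the standard definitions consistent with the schema comments above (OBI bounded in [-1, 1]; CVD as running signed volume). Input shapes are simplified assumptions.

```python
def calculate_obi(bids: dict, asks: dict) -> float:
    """OBI = (bid_volume - ask_volume) / (bid_volume + ask_volume).
    Inputs are {price: size} dicts; result lies in [-1, 1]."""
    bid_vol, ask_vol = sum(bids.values()), sum(asks.values())
    total = bid_vol + ask_vol
    return (bid_vol - ask_vol) / total if total else 0.0

def calculate_cvd(prev_cvd: float, trades: list) -> float:
    """CVD accumulates signed trade volume: buys add, sells subtract."""
    return prev_cvd + sum(
        t["size"] if t["side"] == "buy" else -t["size"] for t in trades
    )
```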
Security Considerations
Data Protection
- SQL Injection: All queries use parameterized statements
- File Access: Validates database file paths and permissions
- Error Handling: No sensitive data in error messages
- Input Validation: Sanitizes all external inputs
Access Control
- Database: Respects file system permissions
- Memory: No sensitive data persistence beyond processing
- Logging: Configurable log levels without data exposure
Configuration Management
Performance Tuning
# Storage configuration
BATCH_SIZE = 1000 # Records per database operation
LOG_FREQUENCY = 20 # Progress reports per processing run
# SQLite optimization
PRAGMA journal_mode = OFF # Maximum write performance
PRAGMA synchronous = OFF # Disable synchronous writes
PRAGMA cache_size = 100000 # Large memory cache
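Applying these PRAGMAs from Python is a one-liner each on a fresh connection. A sketch of the optimized-connection setup; note that journal_mode = OFF and synchronous = OFF trade crash safety for write throughput, which is acceptable only because the metrics table can be rebuilt from source data.

```python
import sqlite3

def connect_optimized(path: str) -> sqlite3.Connection:
    """Open a SQLite connection with the write-speed PRAGMAs listed above."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = OFF")   # no rollback journal
    conn.execute("PRAGMA synchronous = OFF")    # no fsync on write
    conn.execute("PRAGMA cache_size = 100000")  # large page cache
    return conn
```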
Visualization Settings
# Chart configuration
WINDOW_SECONDS = 60 # OHLC aggregation window
MAX_BARS = 500 # Maximum bars displayed
FIGURE_SIZE = (12, 10) # Chart dimensions
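The OHLC aggregation these settings imply boils down to bucketing timestamps into fixed windows. The helper name is hypothetical.

```python
WINDOW_SECONDS = 60  # OHLC aggregation window

def bucket_timestamp(epoch_seconds: float, window: int = WINDOW_SECONDS) -> int:
    """Map a timestamp to the start of its OHLC bar by flooring to the
    nearest window boundary."""
    return int(epoch_seconds) - int(epoch_seconds) % window
```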
Error Handling Strategy
Graceful Degradation
- Database Errors: Continue with reduced functionality
- Calculation Errors: Skip problematic snapshots with logging
- Visualization Errors: Display available data, note issues
- Memory Pressure: Adjust batch sizes automatically
Recovery Mechanisms
- Partial Processing: Resume from last successful batch
- Data Validation: Verify metrics calculations before storage
- Rollback Support: Transaction boundaries for data consistency
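Resuming from the last successful batch can be sketched as a query for the highest snapshot_id already persisted to metrics; processing then restarts from the next snapshot. An illustrative helper, not the actual recovery code.

```python
import sqlite3

def last_processed_snapshot(conn: sqlite3.Connection) -> int:
    """Return the highest snapshot_id already in the metrics table,
    or 0 if nothing has been processed yet."""
    row = conn.execute("SELECT MAX(snapshot_id) FROM metrics").fetchone()
    return row[0] or 0
```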
This architecture provides a robust, scalable foundation for high-frequency trading data analysis while maintaining clean separation of concerns and efficient resource utilization.