System Architecture

Overview

The Orderflow Backtest System is designed as a modular, high-performance data processing pipeline for cryptocurrency trading analysis. The architecture emphasizes separation of concerns, efficient memory usage, and scalable processing of large datasets.

High-Level Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Data Sources  │    │   Processing     │    │   Presentation  │
│                 │    │                  │    │                 │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────┐ │
│ │SQLite Files │─┼────┼→│   Storage    │─┼────┼→│ Visualizer  │ │
│ │- orderbook  │ │    │ │- Orchestrator│ │    │ │- OHLC Charts│ │
│ │- trades     │ │    │ │- Calculator  │ │    │ │- OBI/CVD    │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────┘ │
│                 │    │        │         │    │        ▲        │
└─────────────────┘    │ ┌─────────────┐  │    │ ┌─────────────┐ │
                       │ │  Strategy   │──┼────┼→│   Reports   │ │
                       │ │- Analysis   │  │    │ │- Metrics    │ │
                       │ │- Alerts     │  │    │ │- Summaries  │ │
                       │ └─────────────┘  │    │ └─────────────┘ │
                       └──────────────────┘    └─────────────────┘

Component Architecture

Data Layer

Models (models.py)

Purpose: Core data structures and calculation logic

# Core data models
OrderbookLevel   # Single price level (price, size, order_count, liquidation_count)
Trade           # Individual trade execution (price, size, side, timestamp)
BookSnapshot    # Complete orderbook state at timestamp
Book           # Container for snapshot sequence
Metric         # Calculated OBI/CVD values

# Calculation engine
MetricCalculator # Static methods for OBI/CVD computation

Relationships:

  • Book contains multiple BookSnapshot instances
  • BookSnapshot contains dictionaries of OrderbookLevel and lists of Trade
  • Metric stores calculated values for each BookSnapshot
  • MetricCalculator operates on snapshots to produce metrics
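The relationships above can be sketched with plain dataclasses. The field names follow the descriptions earlier in this section; they are illustrative, not the exact models.py definitions:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class OrderbookLevel:
    price: float
    size: float
    order_count: int = 0
    liquidation_count: int = 0

@dataclass
class Trade:
    price: float
    size: float
    side: str          # "buy" or "sell"
    timestamp: str

@dataclass
class BookSnapshot:
    timestamp: str
    # price levels keyed by price, as described above
    bids: Dict[float, OrderbookLevel] = field(default_factory=dict)
    asks: Dict[float, OrderbookLevel] = field(default_factory=dict)
    trades: List[Trade] = field(default_factory=list)

@dataclass
class Book:
    # container for the snapshot sequence
    snapshots: List[BookSnapshot] = field(default_factory=list)
```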

Repositories (repositories/)

Purpose: Database access and persistence layer

# Read-only base repository
SQLiteOrderflowRepository:
  - connect()                    # Optimized SQLite connection
  - load_trades_by_timestamp()   # Efficient trade loading
  - iterate_book_rows()          # Memory-efficient snapshot streaming
  - count_rows()                 # Performance monitoring

# Write-enabled metrics repository
SQLiteMetricsRepository:
  - create_metrics_table()       # Schema creation
  - insert_metrics_batch()       # High-performance batch inserts
  - load_metrics_by_timerange()  # Time-range queries
  - table_exists()               # Schema validation

Design Patterns:

  • Repository Pattern: Clean separation between data access and business logic
  • Batch Processing: Process 1000 records per database operation
  • Connection Management: Caller manages connection lifecycle
  • Performance Optimization: SQLite PRAGMAs for high-speed operations
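The batch-processing pattern can be sketched with sqlite3's `executemany`, using the metrics columns defined later in this document (a sketch of the pattern, not the actual repository code):

```python
import sqlite3

def insert_metrics_batch(conn: sqlite3.Connection, rows, batch_size: int = 1000):
    """Insert (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) tuples in batches."""
    sql = ("INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
           "VALUES (?, ?, ?, ?, ?, ?)")
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) >= batch_size:     # one round-trip per 1000 rows
            conn.executemany(sql, buf)
            conn.commit()
            buf.clear()
    if buf:                            # flush the final partial batch
        conn.executemany(sql, buf)
        conn.commit()
```

Note the caller passes in `conn` and keeps ownership of it, matching the connection-management convention above.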

Processing Layer

Storage (storage.py)

Purpose: Orchestrates data loading, processing, and metrics calculation

class Storage:
  - build_booktick_from_db()           # Main processing pipeline
  - _create_snapshots_and_metrics()    # Per-snapshot processing
  - _snapshot_from_row()               # Individual snapshot creation

Processing Pipeline:

  1. Initialize: Create metrics repository and table if needed
  2. Load Trades: Group trades by timestamp for efficient access
  3. Stream Processing: Process snapshots one-by-one to minimize memory
  4. Calculate Metrics: OBI and CVD calculation per snapshot
  5. Batch Persistence: Store metrics in batches of 1000
  6. Memory Management: Discard full snapshots after metric extraction
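The six steps above can be sketched as a single streaming loop. The function and its callable parameters are illustrative stand-ins for the Storage, MetricCalculator, and repository pieces, not the actual API:

```python
def process_snapshots(rows, make_snapshot, calc_obi, calc_cvd, store_batch,
                      batch_size=1000):
    """Stream rows -> snapshots -> metrics; persist in batches; never hold all snapshots."""
    batch = []
    cvd = 0.0
    for row in rows:
        snap = make_snapshot(row)        # step 3: one snapshot at a time
        obi = calc_obi(snap)             # step 4: OBI per snapshot
        cvd = calc_cvd(cvd, snap)        #         CVD accumulates across snapshots
        batch.append((snap["id"], snap["timestamp"], obi, cvd))
        if len(batch) >= batch_size:     # step 5: batched persistence
            store_batch(batch)
            batch = []
        # step 6: `snap` goes out of scope here and can be garbage-collected
    if batch:
        store_batch(batch)               # flush the final partial batch
```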

Strategy Framework (strategies.py)

Purpose: Trading analysis and signal generation

class DefaultStrategy:
  - set_db_path()              # Configure database access
  - compute_OBI()              # Real-time OBI calculation (fallback)
  - load_stored_metrics()      # Retrieve persisted metrics
  - get_metrics_summary()      # Statistical analysis
  - on_booktick()             # Main analysis entry point

Analysis Capabilities:

  • Stored Metrics: Primary analysis using persisted data
  • Real-time Fallback: Live OBI calculation when stored metrics are unavailable
  • Statistical Summaries: Min/max/average OBI, CVD changes
  • Alert System: Configurable thresholds for significant imbalances
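The configurable-threshold alert check can be sketched as follows; the threshold value and function name are assumptions, not the actual strategy code:

```python
OBI_ALERT_THRESHOLD = 0.6   # hypothetical default: flag strong one-sided pressure

def check_obi_alert(obi: float, threshold: float = OBI_ALERT_THRESHOLD):
    """Return an alert message for a significant imbalance, else None. OBI is in [-1, 1]."""
    if obi >= threshold:
        return f"bid-side imbalance: OBI={obi:.2f}"
    if obi <= -threshold:
        return f"ask-side imbalance: OBI={obi:.2f}"
    return None
```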

Presentation Layer

Visualization (visualizer.py)

Purpose: Multi-chart rendering and display

class Visualizer:
  - set_db_path()              # Configure metrics access
  - update_from_book()         # Main rendering pipeline
  - _load_stored_metrics()     # Retrieve metrics for chart range
  - _draw()                    # Multi-subplot rendering
  - show()                     # Display interactive charts

Chart Layout:

┌─────────────────────────────────────┐
│            OHLC Candlesticks        │  ← Price action
├─────────────────────────────────────┤
│              Volume Bars            │  ← Trading volume
├─────────────────────────────────────┤
│          OBI Line Chart             │  ← Order book imbalance
├─────────────────────────────────────┤
│          CVD Line Chart             │  ← Cumulative volume delta
└─────────────────────────────────────┘

Features:

  • Shared Time Axis: Synchronized X-axis across all subplots
  • Auto-scaling: Y-axis optimization for each metric type
  • Performance: Efficient rendering of large datasets
  • Interactive: Qt5Agg backend for zooming and panning

Data Flow

Processing Flow

1. SQLite DB → Repository → Raw Data
2. Raw Data → Storage → BookSnapshot
3. BookSnapshot → MetricCalculator → OBI/CVD
4. Metrics → Repository → Database Storage
5. Stored Metrics → Strategy → Analysis
6. Stored Metrics → Visualizer → Charts

Memory Management Flow

Traditional: DB → All Snapshots in Memory → Analysis (High Memory)
Optimized:   DB → Process Snapshot → Calculate Metrics → Store → Discard (Low Memory)

Database Schema

Input Schema (Required)

-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,              -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,              -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions  
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,              -- "buy" or "sell"
    timestamp TEXT
);

Output Schema (Auto-created)

-- Calculated metrics
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER,
    timestamp TEXT,
    obi REAL,               -- Order Book Imbalance [-1, 1]
    cvd REAL,               -- Cumulative Volume Delta
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
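A time-range query against this schema can be sketched as below; it is served by `idx_metrics_timestamp` and uses bound parameters, matching the parameterized-statement policy described later. Since `timestamp` is TEXT, `BETWEEN` compares lexicographically, which matches chronological order for ISO-8601 strings:

```python
import sqlite3

def load_metrics_by_timerange(conn: sqlite3.Connection, start_ts: str, end_ts: str):
    """Fetch metrics rows in [start_ts, end_ts], ordered by timestamp."""
    cur = conn.execute(
        "SELECT snapshot_id, timestamp, obi, cvd, best_bid, best_ask "
        "FROM metrics WHERE timestamp BETWEEN ? AND ? ORDER BY timestamp",
        (start_ts, end_ts),   # bound parameters, never string-interpolated
    )
    return cur.fetchall()
```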

Performance Characteristics

Memory Optimization

  • Before: Store all snapshots in memory (~1GB for 600K snapshots)
  • After: Store only metrics data (~300MB for same dataset)
  • Reduction: >70% memory usage decrease

Processing Performance

  • Batch Size: 1000 records per database operation
  • Processing Speed: ~1000 snapshots/second on modern hardware
  • Database Overhead: <20% storage increase for metrics table
  • Query Performance: Sub-second retrieval for typical time ranges

Scalability Limits

  • Single File: 1M+ snapshots per database file
  • Time Range: Months to years of historical data
  • Memory Peak: <2GB for year-long datasets
  • Disk Space: Original size + 20% for metrics

Integration Points

External Interfaces

# Main application entry point
main.py:
  - CLI argument parsing
  - Database file discovery
  - Component orchestration
  - Progress monitoring

# Plugin interfaces
Strategy.on_booktick(book: Book)     # Strategy integration point
Visualizer.update_from_book(book)    # Visualization integration

Internal Interfaces

# Repository interfaces
Repository.connect() → Connection
Repository.load_data() → TypedData
Repository.store_data(data) → None

# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) → float
MetricCalculator.calculate_cvd(prev_cvd, trades) → float
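A common definition of these two metrics, consistent with the [-1, 1] OBI range noted in the schema, can be sketched as follows. The depth of book used and the (side, size) trade representation are assumptions, not the documented MetricCalculator signatures:

```python
def calculate_obi(bid_sizes, ask_sizes) -> float:
    """Order Book Imbalance: (bid volume - ask volume) / total volume, in [-1, 1]."""
    bid_vol, ask_vol = sum(bid_sizes), sum(ask_sizes)
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total

def calculate_cvd(prev_cvd: float, trades) -> float:
    """Cumulative Volume Delta: running sum of signed trade sizes (+buy, -sell).

    `trades` is assumed here to be an iterable of (side, size) pairs.
    """
    delta = sum(size if side == "buy" else -size for side, size in trades)
    return prev_cvd + delta
```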

Security Considerations

Data Protection

  • SQL Injection: All queries use parameterized statements
  • File Access: Validates database file paths and permissions
  • Error Handling: No sensitive data in error messages
  • Input Validation: Sanitizes all external inputs

Access Control

  • Database: Respects file system permissions
  • Memory: No sensitive data persistence beyond processing
  • Logging: Configurable log levels without data exposure

Configuration Management

Performance Tuning

# Storage configuration
BATCH_SIZE = 1000           # Records per database operation
LOG_FREQUENCY = 20          # Progress reports per processing run

# SQLite optimization
PRAGMA journal_mode = OFF   # Maximum write performance
PRAGMA synchronous = OFF    # Disable synchronous writes
PRAGMA cache_size = 100000  # Large memory cache
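These PRAGMAs can be applied right after connecting, as sketched below. `journal_mode = OFF` and `synchronous = OFF` trade crash safety for write speed, which is acceptable here only because the metrics table is derived data that can be rebuilt from the source database:

```python
import sqlite3

def connect_optimized(db_path: str) -> sqlite3.Connection:
    """Open a connection tuned for bulk writes; unsafe for data that cannot be rebuilt."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = OFF")   # no rollback journal
    conn.execute("PRAGMA synchronous = OFF")    # don't fsync on every commit
    conn.execute("PRAGMA cache_size = 100000")  # large in-memory page cache
    return conn
```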

Visualization Settings

# Chart configuration
WINDOW_SECONDS = 60         # OHLC aggregation window
MAX_BARS = 500             # Maximum bars displayed
FIGURE_SIZE = (12, 10)     # Chart dimensions

Error Handling Strategy

Graceful Degradation

  • Database Errors: Continue with reduced functionality
  • Calculation Errors: Skip problematic snapshots with logging
  • Visualization Errors: Display available data, note issues
  • Memory Pressure: Adjust batch sizes automatically

Recovery Mechanisms

  • Partial Processing: Resume from last successful batch
  • Data Validation: Verify metrics calculations before storage
  • Rollback Support: Transaction boundaries for data consistency
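The transaction boundaries mentioned above follow the standard sqlite3 pattern: the connection used as a context manager commits on success and rolls back on exception, so a failed batch never lands partially (a sketch with a reduced column set, not the actual repository code):

```python
import sqlite3

def store_batch_atomically(conn: sqlite3.Connection, rows):
    """Commit a batch as one transaction; on any failure the whole batch rolls back."""
    with conn:   # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd) VALUES (?, ?, ?, ?)",
            rows,
        )
```

A caller that catches the raised `sqlite3.Error` can then resume from the last successfully committed batch.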

This architecture provides a robust, scalable foundation for high-frequency trading data analysis while maintaining clean separation of concerns and efficient resource utilization.