orderflow_backtest/docs/architecture.md

System Architecture

Overview

The current system is a streamlined, high-performance pipeline that streams orderflow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and visualizer uses atomic JSON files for simplicity and robustness.

High-Level Architecture

┌─────────────────┐   ┌─────────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│  SQLite Files   │ → │   DB Interpreter    │ → │   OHLC/Depth     │ → │  Dash Visualizer │
│  (book,trades)  │   │  (stream rows)      │   │   Processor      │   │  (app.py)        │
└─────────────────┘   └─────────────────────┘   └─────────┬────────┘   └────────────▲─────┘
                                                          │                         │
                                                          │  Atomic JSON (IPC)      │
                                                          ▼                         │
                                                  ohlc_data.json, depth_data.json   │
                                                  metrics_data.json                 │
                                                                                    │
                                                                              Browser UI

Components

Data Access (db_interpreter.py)

  • OrderbookLevel: dataclass representing one price level.
  • OrderbookUpdate: container for a book row window with bids, asks, timestamp, and end_timestamp.
  • DBInterpreter:
    • stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]] streams the book table with lookahead and the trades table in timestamp order.
    • Efficient read-only connection with PRAGMA tuning: immutable mode, query_only, temp_store=MEMORY, mmap_size, cache_size.
    • Batching constants: BOOK_BATCH = 2048, TRADE_BATCH = 4096.
  • Each element of the yielded trades list is a tuple (id, trade_id, price, size, side, timestamp_ms) whose timestamp falls within [book.timestamp, next_book.timestamp).
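
The lookahead windowing can be sketched as follows; this is an illustrative standalone function, not the module's actual API (window_trades and its argument shapes are assumptions):

```python
def window_trades(book_timestamps, trades):
    """Yield (window_start, trades_in_window) pairs.

    book_timestamps: sorted book-row timestamps (ms), one per window start.
    trades: trade tuples sorted by their final timestamp_ms field.
    """
    i = 0
    ends = book_timestamps[1:] + [float("inf")]   # one-row lookahead
    for start, end in zip(book_timestamps, ends):
        bucket = []
        # Consume trades until one falls past the current window's end.
        while i < len(trades) and trades[i][-1] < end:
            if trades[i][-1] >= start:
                bucket.append(trades[i])
            i += 1
        yield start, bucket
```

The real DBInterpreter does the same assignment while paging both tables in batches, so each trade is visited exactly once.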

Processing (Modular Architecture)

Main Coordinator (ohlc_processor.py)

  • OHLCProcessor(window_seconds=60, depth_levels_per_side=50): Orchestrates trade processing using composition
    • process_trades(trades): aggregates trades into OHLC bars and delegates CVD updates
    • update_orderbook(ob_update): coordinates orderbook updates and OBI metric calculation
    • finalize(): finalizes both OHLC bars and metrics data
    • cvd_cumulative (property): provides access to cumulative volume delta
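
The core of process_trades-style aggregation can be sketched like this (a minimal standalone version; the real OHLCProcessor works incrementally and also delegates CVD updates, and the trade shape here is simplified to (price, size, timestamp_ms)):

```python
def aggregate_ohlc(trades, window_seconds=60):
    """Bucket (price, size, timestamp_ms) trades into [ts, o, h, l, c, volume] bars."""
    bars = {}
    for price, size, ts_ms in trades:
        bucket = (ts_ms // 1000 // window_seconds) * window_seconds
        bar = bars.get(bucket)
        if bar is None:
            # First trade of the window seeds open/high/low/close.
            bars[bucket] = [bucket, price, price, price, price, size]
        else:
            bar[2] = max(bar[2], price)   # high
            bar[3] = min(bar[3], price)   # low
            bar[4] = price                # close follows the latest trade
            bar[5] += size                # accumulate volume
    return [bars[k] for k in sorted(bars)]
```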

Orderbook Management (orderbook_manager.py)

  • OrderbookManager: Handles in-memory orderbook state with partial updates
    • Maintains separate bid/ask price→size dictionaries
    • Supports deletions via zero-size updates
    • Provides sorted top-N level extraction for visualization
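
A minimal sketch of this behaviour (class and method names are illustrative, not OrderbookManager's actual interface):

```python
class BookSide:
    """One side of the book: price -> size, with zero-size deletion."""

    def __init__(self, is_bid):
        self.levels = {}
        self.is_bid = is_bid

    def apply(self, price, size):
        if size <= 0:
            self.levels.pop(price, None)   # zero size deletes the level
        else:
            self.levels[price] = size      # partial update: overwrite in place

    def top(self, n):
        # Bids descend from the best (highest) price; asks ascend.
        return sorted(self.levels.items(), reverse=self.is_bid)[:n]
```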

Metrics Calculation (metrics_calculator.py)

  • MetricsCalculator: Manages OBI and CVD metrics with windowed aggregation
    • Tracks CVD from trade flow (buy vs sell volume delta)
    • Calculates OBI from orderbook volume imbalance
    • Provides throttled updates and OHLC-style metric bars
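
The two metrics reduce to simple formulas; this is a hedged sketch (MetricsCalculator's exact normalization and windowing may differ):

```python
def cvd_delta(trades):
    """Signed volume sum for (size, side) trades: buys add, sells subtract."""
    return sum(size if side == "buy" else -size for size, side in trades)

def obi(bids, asks):
    """Order Book Imbalance in [-1, 1] from [(price, size), ...] levels."""
    bid_vol = sum(s for _, s in bids)
    ask_vol = sum(s for _, s in asks)
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total
```

Cumulative CVD is then a running sum of cvd_delta over windows, which is what the cvd_cumulative property exposes.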

Level Parsing (level_parser.py)

  • Utility functions for normalizing orderbook level data:
    • normalize_levels(): parses levels, filtering zero/negative sizes
    • parse_levels_including_zeros(): preserves zeros for deletion operations
    • Supports JSON and Python literal formats with robust error handling
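
A sketch of the normalization path, assuming levels arrive as a JSON string, a Python-literal string, or an already-parsed sequence of [price, size] pairs (the real function's signature may differ):

```python
import ast
import json

def normalize_levels(raw):
    """Parse levels into (price, size) float pairs, dropping zero/negative sizes."""
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            raw = ast.literal_eval(raw)   # fall back to Python literals
    return [(float(p), float(s)) for p, s in raw if float(s) > 0]
```

A zero-preserving variant (parse_levels_including_zeros) would simply drop the size filter so deletions survive parsing.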

Inter-Process Communication (viz_io.py)

  • File paths (relative to project root):
    • ohlc_data.json: rolling list of OHLC bars (max 1000).
    • depth_data.json: latest depth snapshot (bids/asks).
    • metrics_data.json: rolling list of OBI/TOT OHLC bars (max 1000).
  • Atomic writes via temp files prevent partial reads by the Dash app.
  • API:
    • add_ohlc_bar(...): append a new bar; trim to last 1000.
    • upsert_ohlc_bar(...): replace last bar if timestamp matches; else append; trim.
    • clear_data(): reset OHLC data to an empty list.
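
The atomic-write pattern can be sketched as below: serialize to a temp file in the same directory, then os.replace() it over the target so readers only ever see a complete document. Function names here are illustrative, not viz_io's exact API:

```python
import json
import os
import tempfile

MAX_BARS = 1000

def atomic_write_json(path, data):
    """Write JSON atomically: temp file in the same dir, then os.replace."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)      # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)             # never leave a stray temp file behind
        raise

def add_ohlc_bar(path, bars, bar):
    """Append a bar, trim to the last MAX_BARS, and persist atomically."""
    bars = (bars + [bar])[-MAX_BARS:]
    atomic_write_json(path, bars)
    return bars
```

The temp file must live on the same filesystem as the target, otherwise os.replace loses its atomicity guarantee; writing it to the same directory ensures that.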

Visualization (app.py)

  • Dash application with two graphs plus OBI subplot:
    • OHLC + Volume subplot with shared x-axis.
    • OBI candlestick subplot (blue tones) sharing x-axis.
    • Depth (cumulative) chart for bids and asks.
  • Polling interval (500 ms) callback reads JSON files and updates figures resiliently:
    • Caches last good values to tolerate in-flight writes/decoding errors.
    • Builds figures with Plotly dark theme.
  • Served on all interfaces (host=0.0.0.0); reachable at http://localhost:8050 by default.
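
The "last good value" caching can be sketched as a small reader helper (names are illustrative; the actual callback inlines this logic): if a read races an in-flight write or the file is missing, it returns the previously cached data instead of blanking the charts.

```python
import json

_last_good = {}

def read_json_resilient(path):
    """Read JSON, caching the last successfully decoded value per path."""
    try:
        with open(path) as f:
            data = json.load(f)
        _last_good[path] = data
        return data
    except (OSError, json.JSONDecodeError):
        return _last_good.get(path)   # fall back to the cached value
```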

CLI Orchestration (main.py)

  • Typer CLI entrypoint:
    • Arguments: instrument, start_date, end_date (UTC, YYYY-MM-DD), options: --window-seconds.
    • Discovers SQLite files under ../data/OKX matching the instrument.
    • Launches Dash visualizer as a separate process: uv run python app.py.
    • Streams databases sequentially: for each book row, processes trades and updates orderbook.
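
The discovery step amounts to a glob over the data directory; this sketch assumes a .db extension and the instrument name embedded in the filename, neither of which is confirmed by this document:

```python
from pathlib import Path

def discover_dbs(data_root, instrument):
    """Return SQLite files matching the instrument, sorted by filename."""
    return sorted(Path(data_root).glob(f"*{instrument}*.db"))
```

Sorting by filename yields chronological order if dates are embedded in an ISO-style format, which is why the processor can simply stream the results sequentially.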

Data Flow

  1. Discover and open SQLite database(s) for the requested instrument.
  2. Stream book rows with one-row lookahead to form time windows.
  3. Stream trades in timestamp order and bucket into the active window.
  4. For each window:
    • Aggregate trades into OHLC using OHLCProcessor.process_trades.
    • Apply partial depth updates via OHLCProcessor.update_orderbook and emit periodic snapshots.
  5. Persist current OHLC bar(s) and depth snapshots to JSON via atomic writes.
  6. Dash app polls JSON and renders charts.

IPC JSON Schemas

  • OHLC (ohlc_data.json): array of bars; each bar is [ts, open, high, low, close, volume].

  • Depth (depth_data.json): object with bids/asks arrays: {"bids": [[price, size], ...], "asks": [[price, size], ...]}.

  • Metrics (metrics_data.json): array of bars; each bar is [ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close].
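
Since the bars are flat positional arrays, consumers can decode them into dicts with a field tuple per schema; field order below follows the schemas listed above:

```python
OHLC_FIELDS = ("ts", "open", "high", "low", "close", "volume")
METRIC_FIELDS = ("ts", "obi_open", "obi_high", "obi_low", "obi_close",
                 "tot_open", "tot_high", "tot_low", "tot_close")

def decode_bar(bar, fields=OHLC_FIELDS):
    """Pair positional bar values with their schema field names."""
    return dict(zip(fields, bar))
```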

Configuration

  • OHLCProcessor(window_seconds, depth_levels_per_side) controls aggregation granularity and depth snapshot size.
  • Visualizer interval (500 ms) balances UI responsiveness and CPU usage.
  • Paths: JSON files (ohlc_data.json, depth_data.json, metrics_data.json) are colocated with the code and written atomically.
  • CLI parameters select instrument and time range; databases expected under ../data/OKX.

Performance Characteristics

  • Read-only SQLite tuned for fast sequential scans: immutable URI, query_only, large mmap and cache.
  • Batching minimizes cursor churn and Python overhead.
  • JSON IPC uses atomic replace to avoid contention; OHLC list is bounded to 1000 entries.
  • Processor throttles intra-window OHLC upserts and depth emissions to reduce I/O.
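
The connection tuning can be sketched as below; the immutable URI flag tells SQLite the file will not change, which lets it skip locking and change detection, and the PRAGMAs match those listed for DBInterpreter (the specific mmap/cache values here are illustrative):

```python
import sqlite3

def open_readonly(db_path):
    """Open a SQLite file read-only and immutable, tuned for sequential scans."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro&immutable=1", uri=True)
    conn.execute("PRAGMA query_only = ON")
    conn.execute("PRAGMA temp_store = MEMORY")
    conn.execute("PRAGMA mmap_size = 268435456")   # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA cache_size = -64000")     # ~64 MB page cache (KiB units)
    return conn
```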

Error Handling

  • Visualizer tolerates JSON decode races by reusing last good values and logging warnings.
  • Processor guards depth parsing and writes; logs at debug/info levels.
  • Visualizer startup is wrapped; if it fails, processing continues without UI.

Security Considerations

  • SQLite connections are read-only and immutable; no write queries executed.
  • File writes are confined to project directory; no paths derived from untrusted input.
  • Logs avoid sensitive data; only operational metadata.

Testing Guidance

  • Unit tests (run with uv run pytest):
    • OHLCProcessor: window boundary handling, high/low tracking, volume accumulation, upsert behavior.
    • Depth maintenance: deletions (size==0), top-N sorting, throttling.
    • DBInterpreter.stream: correct trade-window assignment, end-of-stream handling.
  • Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server.
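
A tiny fixture DB for the integration test can be built directly with sqlite3; the table and column names below are assumptions inferred from this document, not a confirmed schema:

```python
import sqlite3

def make_fixture_db(path):
    """Create a minimal book/trades database for end-to-end tests."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE book (timestamp INTEGER, bids TEXT, asks TEXT)")
    conn.execute(
        "CREATE TABLE trades (id INTEGER, trade_id TEXT, price REAL,"
        " size REAL, side TEXT, timestamp INTEGER)"
    )
    conn.execute("INSERT INTO book VALUES (0, '[[100, 1.0]]', '[[101, 2.0]]')")
    conn.execute("INSERT INTO trades VALUES (1, 't1', 100.5, 0.3, 'buy', 50)")
    conn.commit()
    conn.close()
```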

Roadmap (Optional Enhancements)

  • Metrics: persist the computed OBI/CVD metrics to a dedicated database table.
  • Repository Pattern: extract DB access into a repository module with typed methods.
  • Orchestrator: introduce a Storage pipeline module coordinating batch processing and persistence.
  • Strategy Layer: compute signals/alerts on stored metrics.
  • Visualization: add a CVD subplot alongside the existing OBI subplot, plus richer interactions.

This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a clear foundation for incremental enhancements.