# System Architecture

## Overview

The current system is a streamlined, high-performance pipeline that streams order flow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and visualizer uses atomic JSON files for simplicity and robustness.

## High-Level Architecture

```
┌─────────────────┐    ┌─────────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  SQLite Files   │ →  │   DB Interpreter    │ →  │   OHLC/Depth     │ →  │ Dash Visualizer  │
│  (book,trades)  │    │   (stream rows)     │    │   Processor      │    │     (app.py)     │
└─────────────────┘    └─────────────────────┘    └─────────┬────────┘    └────────────▲─────┘
                                                            │                          │
                                                            │  Atomic JSON (IPC)       │
                                                            ▼                          │
                                                  ohlc_data.json, depth_data.json      │
                                                  metrics_data.json                    │ Browser UI
```

## Components

### Data Access (`db_interpreter.py`)

- `OrderbookLevel`: dataclass representing one price level.
- `OrderbookUpdate`: container for a book row window with `bids`, `asks`, `timestamp`, and `end_timestamp`.
- `DBInterpreter`:
  - `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]` streams the book table with lookahead and the trades table in timestamp order.
  - Efficient read-only connection with PRAGMA tuning: immutable mode, `query_only`, `temp_store=MEMORY`, `mmap_size`, `cache_size`.
  - Batching constants: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
  - Each yielded `trades` element is a tuple `(id, trade_id, price, size, side, timestamp_ms)` that falls within `[book.timestamp, next_book.timestamp)`.
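The lookahead windowing behind `stream()` can be sketched in isolation. The following is a simplified, in-memory illustration of the pairing rule `[book.timestamp, next_book.timestamp)`, not the actual implementation: the real `DBInterpreter` streams both tables from SQLite in batches, and `assign_trades_to_windows` is a hypothetical helper name.

```python
from typing import Iterator


def assign_trades_to_windows(
    book_timestamps: list[int],
    trades: list[tuple],  # (id, trade_id, price, size, side, timestamp_ms)
) -> Iterator[tuple[int, list[tuple]]]:
    """Pair each book timestamp with the trades falling in
    [book_ts, next_book_ts); the last window is open-ended.

    Assumes both inputs are sorted by timestamp, as they are when
    read from the book/trades tables in timestamp order.
    """
    i = 0
    for w, ts in enumerate(book_timestamps):
        # One-row lookahead: the next book timestamp closes this window.
        end = book_timestamps[w + 1] if w + 1 < len(book_timestamps) else float("inf")
        bucket = []
        while i < len(trades) and ts <= trades[i][5] < end:
            bucket.append(trades[i])
            i += 1
        yield ts, bucket
```

Because both streams are consumed in order, each trade is inspected exactly once, mirroring how batched cursors keep the real pipeline a single sequential pass.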
### Processing (Modular Architecture)

#### Main Coordinator (`ohlc_processor.py`)

- `OHLCProcessor(window_seconds=60, depth_levels_per_side=50)`: orchestrates trade processing using composition.
- `process_trades(trades)`: aggregates trades into OHLC bars and delegates CVD updates.
- `update_orderbook(ob_update)`: coordinates orderbook updates and OBI metric calculation.
- `finalize()`: finalizes both OHLC bars and metrics data.
- `cvd_cumulative` (property): provides access to the cumulative volume delta.

#### Orderbook Management (`orderbook_manager.py`)

- `OrderbookManager`: handles in-memory orderbook state with partial updates.
- Maintains separate bid/ask price→size dictionaries.
- Supports deletions via zero-size updates.
- Provides sorted top-N level extraction for visualization.

#### Metrics Calculation (`metrics_calculator.py`)

- `MetricsCalculator`: manages OBI and CVD metrics with windowed aggregation.
- Tracks CVD from trade flow (buy vs. sell volume delta).
- Calculates OBI from orderbook volume imbalance.
- Provides throttled updates and OHLC-style metric bars.

#### Level Parsing (`level_parser.py`)

- Utility functions for normalizing orderbook level data:
  - `normalize_levels()`: parses levels, filtering zero/negative sizes.
  - `parse_levels_including_zeros()`: preserves zeros for deletion operations.
- Supports JSON and Python literal formats with robust error handling.

### Inter-Process Communication (`viz_io.py`)

- File paths (relative to project root):
  - `ohlc_data.json`: rolling list of OHLC bars (max 1000).
  - `depth_data.json`: latest depth snapshot (bids/asks).
  - `metrics_data.json`: rolling list of OBI/TOT OHLC bars (max 1000).
- Atomic writes via temp files prevent partial reads by the Dash app.
- API:
  - `add_ohlc_bar(...)`: append a new bar; trim to last 1000.
  - `upsert_ohlc_bar(...)`: replace last bar if timestamp matches; else append; trim.
  - `clear_data()`: reset OHLC data to an empty list.
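The atomic-write scheme, and the matching resilient-read behavior on the visualizer side, can be sketched as follows. This is an illustrative stand-in for `viz_io.py` and the reader in `app.py` under the stated design (temp file + atomic replace, cache last good value); the function names `atomic_write_json` and `read_json_resilient` are assumptions, not the real API.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, payload) -> None:
    """Writer side (processor): dump to a temp file in the same
    directory, then replace the target so readers never observe
    a partially written file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)  # atomic on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)  # clean up the temp file on failure
        raise


# Reader side (visualizer): cache of last successfully decoded values,
# keyed by path, so an in-flight write degrades to stale-but-valid data.
_last_good: dict = {}


def read_json_resilient(path: str, default):
    """Read an IPC JSON file, falling back to the last good value
    when the file is missing or mid-write."""
    try:
        with open(path) as f:
            data = json.load(f)
        _last_good[path] = data
        return data
    except (FileNotFoundError, json.JSONDecodeError):
        return _last_good.get(path, default)
```

The temp file must live in the same directory as the target: `os.replace` is only atomic within a filesystem, and a cross-device rename would fall back to a copy that readers could observe half-written.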
### Visualization (`app.py`)

- Dash application with two graphs plus an OBI subplot:
  - OHLC + volume subplot with a shared x-axis.
  - OBI candlestick subplot (blue tones) sharing the x-axis.
  - Depth (cumulative) chart for bids and asks.
- Polling-interval (500 ms) callback reads the JSON files and updates figures resiliently:
  - Caches last good values to tolerate in-flight writes/decoding errors.
  - Builds figures with the Plotly dark theme.
- Exposed on `http://localhost:8050` by default (`host=0.0.0.0`).

### CLI Orchestration (`main.py`)

- Typer CLI entrypoint:
  - Arguments: `instrument`, `start_date`, `end_date` (UTC, `YYYY-MM-DD`); options: `--window-seconds`.
  - Discovers SQLite files under `../data/OKX` matching the instrument.
  - Launches the Dash visualizer as a separate process: `uv run python app.py`.
  - Streams databases sequentially: for each book row, processes trades and updates the orderbook.

## Data Flow

1. Discover and open SQLite database(s) for the requested instrument.
2. Stream `book` rows with one-row lookahead to form time windows.
3. Stream `trades` in timestamp order and bucket them into the active window.
4. For each window:
   - Aggregate trades into OHLC using `OHLCProcessor.process_trades`.
   - Apply partial depth updates via `OHLCProcessor.update_orderbook` and emit periodic snapshots.
5. Persist current OHLC bar(s) and depth snapshots to JSON via atomic writes.
6. The Dash app polls the JSON files and renders the charts.

## IPC JSON Schemas

- OHLC (`ohlc_data.json`): array of bars; each bar is `[ts, open, high, low, close, volume]`.
- Depth (`depth_data.json`): object with bids/asks arrays: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`.
- Metrics (`metrics_data.json`): array of bars; each bar is `[ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close]`.

## Configuration

- `OHLCProcessor(window_seconds, depth_levels_per_side)` controls aggregation granularity and depth snapshot size.
- Visualizer interval (500 ms) balances UI responsiveness and CPU usage.
- Paths: JSON files (`ohlc_data.json`, `depth_data.json`) are colocated with the code and written atomically.
- CLI parameters select the instrument and time range; databases are expected under `../data/OKX`.

## Performance Characteristics

- Read-only SQLite tuned for fast sequential scans: immutable URI, `query_only`, large mmap and cache.
- Batching minimizes cursor churn and Python overhead.
- JSON IPC uses atomic replace to avoid contention; the OHLC list is bounded to 1000 entries.
- The processor throttles intra-window OHLC upserts and depth emissions to reduce I/O.

## Error Handling

- The visualizer tolerates JSON decode races by reusing last good values and logging warnings.
- The processor guards depth parsing and writes; it logs at debug/info levels.
- Visualizer startup is wrapped; if it fails, processing continues without the UI.

## Security Considerations

- SQLite connections are read-only and immutable; no write queries are executed.
- File writes are confined to the project directory; no paths are derived from untrusted input.
- Logs avoid sensitive data; only operational metadata is recorded.

## Testing Guidance

- Unit tests (run with `uv run pytest`):
  - `OHLCProcessor`: window boundary handling, high/low tracking, volume accumulation, upsert behavior.
  - Depth maintenance: deletions (`size == 0`), top-N sorting, throttling.
  - `DBInterpreter.stream`: correct trade-window assignment, end-of-stream handling.
- Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server.

## Roadmap (Optional Enhancements)

- Metrics: add OBI/CVD computation and persist metrics to a dedicated table.
- Repository Pattern: extract DB access into a repository module with typed methods.
- Orchestrator: introduce a `Storage` pipeline module coordinating batch processing and persistence.
- Strategy Layer: compute signals/alerts on stored metrics.
- Visualization: add OBI/CVD subplots and richer interactions.

---

This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a clear foundation for incremental enhancements.