2025-08-26 17:22:07 +08:00
# System Architecture
## Overview
2025-09-10 15:39:16 +08:00
The current system is a streamlined, high-performance pipeline that streams orderflow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and visualizer uses atomic JSON files for simplicity and robustness.
## High-Level Architecture
```
┌─────────────────┐    ┌─────────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  SQLite Files   │ →  │   DB Interpreter    │ →  │    OHLC/Depth    │ →  │ Dash Visualizer  │
│  (book,trades)  │    │    (stream rows)    │    │    Processor     │    │     (app.py)     │
└─────────────────┘    └─────────────────────┘    └─────────┬────────┘    └────────────▲─────┘
                                                            │                          │
                                                            │ Atomic JSON (IPC)        │
                                                            ▼                          │
                                             ohlc_data.json, depth_data.json           │
                                                    metrics_data.json                  │
                                                                                       │
                                                                                  Browser UI
```
## Components
### Data Access (`db_interpreter.py`)
- `OrderbookLevel`: dataclass representing one price level.
- `OrderbookUpdate`: container for one book row and its time window, with `bids`, `asks`, `timestamp`, and `end_timestamp`.
- `DBInterpreter`:
  - `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]` streams the `book` table with one-row lookahead and the `trades` table in timestamp order.
  - Opens an efficient read-only connection: the `immutable` URI mode plus PRAGMA tuning (`query_only`, `temp_store=MEMORY`, `mmap_size`, `cache_size`).
  - Batching constants: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
  - Each yielded `trades` element is a tuple `(id, trade_id, price, size, side, timestamp_ms)` whose timestamp falls within `[book.timestamp, next_book.timestamp)`.
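The connection settings and batch sizes above can be approximated with the standard `sqlite3` module. This is a minimal sketch, not the code in `db_interpreter.py`: the helper names `open_readonly` and `iter_batches` are hypothetical, and the `mmap_size`/`cache_size` values are placeholder assumptions.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file read-only with scan-friendly settings (illustrative values)."""
    # mode=ro refuses writes; immutable=1 tells SQLite the file cannot change,
    # so it skips locking and change detection entirely.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro&immutable=1"
    conn = sqlite3.connect(uri, uri=True)
    conn.execute("PRAGMA query_only = ON")        # reject any write statement
    conn.execute("PRAGMA temp_store = MEMORY")    # keep temporary b-trees in RAM
    conn.execute("PRAGMA mmap_size = 268435456")  # 256 MiB memory-mapped I/O (assumed value)
    conn.execute("PRAGMA cache_size = -65536")    # negative means KiB: 64 MiB page cache
    return conn


def iter_batches(conn: sqlite3.Connection, sql: str, batch: int):
    """Yield rows from `sql`, fetching `batch` rows at a time via fetchmany()."""
    cur = conn.execute(sql)
    while True:
        rows = cur.fetchmany(batch)
        if not rows:
            break
        yield from rows
```

Passing `batch=2048` would mirror `BOOK_BATCH`. The `immutable=1` flag is only safe because the historical database files are not modified while they are being read.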
### Processing (Modular Architecture)
#### Main Coordinator (`ohlc_processor.py`)
- `OHLCProcessor(window_seconds=60, depth_levels_per_side=50)`: orchestrates trade processing via composition.
  - `process_trades(trades)`: aggregates trades into OHLC bars and delegates CVD updates.
  - `update_orderbook(ob_update)`: coordinates orderbook updates and the OBI metric calculation.
  - `finalize()`: finalizes both the OHLC bars and the metrics data.
  - `cvd_cumulative` (property): exposes the cumulative volume delta.
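The bar-building step in `process_trades` can be sketched as follows, assuming trades arrive in timestamp order and windows are aligned to multiples of `window_seconds` since the epoch. `MiniOHLC` and `Bar` are hypothetical names; the real processor also delegates CVD updates and persists bars, which this sketch omits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Bar:
    ts: int          # window start, epoch milliseconds
    open: float
    high: float
    low: float
    close: float
    volume: float


class MiniOHLC:
    """Hypothetical stand-in for the bar logic inside OHLCProcessor."""

    def __init__(self, window_seconds: int = 60):
        self.window_ms = window_seconds * 1000
        self.current: Bar | None = None
        self.closed: list[Bar] = []

    def process_trades(self, trades) -> None:
        # Trades use the (id, trade_id, price, size, side, timestamp_ms) layout
        # and are assumed to be sorted by timestamp.
        for _id, _tid, price, size, _side, ts_ms in trades:
            start = ts_ms - ts_ms % self.window_ms  # align to the window boundary
            bar = self.current
            if bar is None or start != bar.ts:
                if bar is not None:
                    self.closed.append(bar)  # previous window is complete
                self.current = Bar(start, price, price, price, price, size)
                continue
            bar.high = max(bar.high, price)
            bar.low = min(bar.low, price)
            bar.close = price
            bar.volume += size

    def finalize(self) -> list[Bar]:
        # Flush the in-progress bar so the last window is not lost.
        if self.current is not None:
            self.closed.append(self.current)
            self.current = None
        return self.closed
```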
#### Orderbook Management (`orderbook_manager.py`)
- `OrderbookManager`: handles in-memory orderbook state with partial updates.
  - Maintains separate bid and ask price→size dictionaries.
  - Supports deletions via zero-size updates.
  - Provides sorted top-N level extraction for visualization.
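A dictionary-per-side book with zero-size deletions and top-N extraction might look like the following. `MiniBook` is a hypothetical name; it uses `heapq` so that extracting N levels does not require sorting the whole side, which is one possible design rather than a documented detail of `OrderbookManager`.

```python
import heapq


class MiniBook:
    """Hypothetical sketch of an OrderbookManager-style book with partial updates."""

    def __init__(self) -> None:
        self.bids: dict[float, float] = {}
        self.asks: dict[float, float] = {}

    @staticmethod
    def _apply(side: dict[float, float], levels) -> None:
        for price, size in levels:
            if size == 0:
                side.pop(price, None)  # zero size deletes the level (no-op if absent)
            else:
                side[price] = size     # insert a new level or overwrite the old size

    def update(self, bids, asks) -> None:
        self._apply(self.bids, bids)
        self._apply(self.asks, asks)

    def top_n(self, n: int):
        """Return ([[price, size], ...] best bids, best asks), best level first."""
        best_bids = heapq.nlargest(n, self.bids.items())   # highest bid prices
        best_asks = heapq.nsmallest(n, self.asks.items())  # lowest ask prices
        return [list(lv) for lv in best_bids], [list(lv) for lv in best_asks]
```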
#### Metrics Calculation (`metrics_calculator.py`)
- `MetricsCalculator`: manages OBI and CVD metrics with windowed aggregation.
  - Tracks CVD from trade flow (buy volume minus sell volume).
  - Calculates OBI from orderbook volume imbalance.
  - Provides throttled updates and OHLC-style metric bars.
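The two metrics and the OHLC-style folding can be expressed compactly. The OBI formula below, `(bid_volume - ask_volume) / (bid_volume + ask_volume)` over the levels passed in, is a common definition assumed here; the document does not spell out the exact formula or depth used by `MetricsCalculator`. The `"buy"` side string and all function and class names are likewise assumptions.

```python
def order_book_imbalance(bids, asks) -> float:
    """OBI in [-1, 1]; +1 means all resting volume is on the bid side.

    Assumed definition: (bid_vol - ask_vol) / (bid_vol + ask_vol).
    """
    bid_vol = sum(size for _price, size in bids)
    ask_vol = sum(size for _price, size in asks)
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total


class CVD:
    """Cumulative volume delta: buy volume adds, everything else subtracts."""

    def __init__(self) -> None:
        self.cumulative = 0.0

    def update(self, trades) -> float:
        # Trades use the (id, trade_id, price, size, side, timestamp_ms) layout;
        # treating any side other than "buy" as a sell is an assumption.
        for _id, _tid, _price, size, side, _ts in trades:
            self.cumulative += size if side == "buy" else -size
        return self.cumulative


def fold_into_bar(bar, value):
    """Update an [open, high, low, close] list with one metric sample."""
    if bar is None:
        return [value, value, value, value]  # first sample opens the bar
    bar[1] = max(bar[1], value)
    bar[2] = min(bar[2], value)
    bar[3] = value
    return bar
```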
#### Level Parsing (`level_parser.py`)
- Utility functions for normalizing orderbook level data:
  - `normalize_levels()`: parses levels, filtering out zero and negative sizes.
  - `parse_levels_including_zeros()`: preserves zero sizes for deletion operations.
  - Supports JSON and Python literal formats with robust error handling.
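Parsing that accepts both JSON and Python-literal strings can be sketched with `json.loads` falling back to `ast.literal_eval`. The functions below are hypothetical counterparts (`parse_with_zeros`, `parse_positive`) of the two documented utilities; the actual signatures and return types in `level_parser.py` may differ.

```python
import ast
import json


def _decode(raw):
    """Decode a JSON or Python-literal string; pass lists and tuples through."""
    if raw is None:
        return []
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            try:
                raw = ast.literal_eval(raw)  # e.g. "[(100.0, 1.5)]" or single quotes
            except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
                return []                    # unparseable input yields no levels
    return raw if isinstance(raw, (list, tuple)) else []


def parse_with_zeros(raw):
    """Return [(price, size), ...], keeping size == 0 rows (deletions)."""
    out = []
    for level in _decode(raw):
        try:
            # Extra fields beyond price and size (e.g. order counts) are ignored.
            price, size = float(level[0]), float(level[1])
        except (TypeError, ValueError, IndexError, KeyError):
            continue                         # skip malformed entries
        out.append((price, size))
    return out


def parse_positive(raw):
    """Like parse_with_zeros, but drop zero and negative sizes."""
    return [(p, s) for p, s in parse_with_zeros(raw) if s > 0]
```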
### Inter-Process Communication (`viz_io.py`)
- File paths (relative to the project root):
  - `ohlc_data.json`: rolling list of OHLC bars (max 1000).
  - `depth_data.json`: latest depth snapshot (bids/asks).
  - `metrics_data.json`: rolling list of OBI/TOT OHLC bars (max 1000).
- Atomic writes via temporary files prevent the Dash app from reading partially written files.
- API:
  - `add_ohlc_bar(...)`: appends a new bar and trims to the last 1000.
  - `upsert_ohlc_bar(...)`: replaces the last bar if its timestamp matches, otherwise appends; then trims.
  - `clear_data()`: resets OHLC data to an empty list.
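The temp-file-then-rename pattern behind the atomic writes, and the upsert-and-trim rule, can be sketched as below. `atomic_write_json` and `upsert_bar` are hypothetical helpers whose parameters are illustrative; they are not the signatures of the `viz_io.py` functions.

```python
import json
import os
import tempfile
from pathlib import Path

MAX_BARS = 1000


def atomic_write_json(path: Path, data) -> None:
    """Write JSON to a temp file in the same directory, then os.replace() it in."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp, path)     # readers see either the old or the new file
    except BaseException:
        os.unlink(tmp)            # never leave a stray temp file behind
        raise


def _load(path: Path) -> list:
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return []


def upsert_bar(path: Path, bar: list, max_bars: int = MAX_BARS) -> None:
    """Replace the last bar if its timestamp matches, else append; keep the newest max_bars."""
    bars = _load(path)
    if bars and bars[-1][0] == bar[0]:
        bars[-1] = bar
    else:
        bars.append(bar)
    atomic_write_json(path, bars[-max_bars:])
```

Creating the temp file in the destination directory matters: `os.replace` is only atomic when source and target are on the same filesystem.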
### Visualization (`app.py`)
- Dash application with two graphs plus an OBI subplot:
  - OHLC + Volume subplot with a shared x-axis.
  - OBI candlestick subplot (blue tones) sharing the x-axis.
  - Depth (cumulative) chart for bids and asks.
- A polling interval callback (500 ms) reads the JSON files and updates the figures resiliently:
  - Caches the last good values to tolerate in-flight writes and decoding errors.
- Builds figures with the Plotly dark theme.
- Listens on port 8050 by default, bound to `host=0.0.0.0` (reachable at `http://localhost:8050`).
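The "last good value" behavior and the cumulative depth curve can be sketched with the standard library alone. `LastGoodReader` and `cumulative_depth` are hypothetical helpers; in `app.py` equivalent logic would sit inside the 500 ms interval callback before the Plotly figures are built.

```python
import json
import logging
from itertools import accumulate
from pathlib import Path

log = logging.getLogger(__name__)


class LastGoodReader:
    """Read a JSON file, falling back to the last successfully decoded value."""

    def __init__(self, path: Path, default):
        self.path = path
        self.last_good = default

    def read(self):
        try:
            self.last_good = json.loads(self.path.read_text())
        except (OSError, json.JSONDecodeError) as exc:
            # Missing file or a torn read: keep serving the cached value.
            log.warning("using cached %s: %s", self.path.name, exc)
        return self.last_good


def cumulative_depth(levels, descending: bool):
    """Order [price, size] levels from the best price outward and running-sum the sizes.

    Use descending=True for bids (best = highest) and False for asks (best = lowest).
    """
    ordered = sorted(levels, key=lambda lv: lv[0], reverse=descending)
    prices = [price for price, _size in ordered]
    totals = list(accumulate(size for _price, size in ordered))
    return prices, totals
```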
### CLI Orchestration (`main.py`)
- Typer CLI entrypoint:
  - Arguments: `instrument`, `start_date`, `end_date` (UTC, `YYYY-MM-DD`); options: `--window-seconds`.
- Discovers SQLite files under `../data/OKX` matching the instrument.
- Launches the Dash visualizer as a separate process: `uv run python app.py`.
- Streams databases sequentially: for each book row, processes the trades in its window and updates the orderbook.
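File discovery and a guarded visualizer launch could look roughly like this. The `.db` extension and the instrument-in-filename convention are assumptions about the dataset layout, `discover_databases` and `launch_visualizer` are hypothetical names, and filtering by `start_date`/`end_date` is omitted because the file naming scheme is not documented here.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def discover_databases(instrument: str, data_dir: str = "../data/OKX") -> list[Path]:
    """Return SQLite files under data_dir whose names contain the instrument, sorted by name.

    Assumes a .db extension and the instrument symbol embedded in the filename.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.db") if instrument in p.name)


def launch_visualizer() -> subprocess.Popen | None:
    """Start the Dash app as a child process; return None if it cannot start."""
    try:
        return subprocess.Popen(["uv", "run", "python", "app.py"])
    except OSError:
        return None  # e.g. uv not on PATH: keep processing without the UI
```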
## Data Flow
1. Discover and open SQLite database(s) for the requested instrument.
2. Stream `book` rows with one-row lookahead to form time windows.
3. Stream `trades` in timestamp order and bucket them into the active window.
4. For each window:
   - Aggregate trades into OHLC bars using `OHLCProcessor.process_trades`.
   - Apply partial depth updates via `OHLCProcessor.update_orderbook` and emit periodic snapshots.
5. Persist the current OHLC bar(s) and depth snapshots to JSON via atomic writes.
6. The Dash app polls the JSON files and renders the charts.
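Steps 2 to 4 amount to a merge of two timestamp-sorted streams. The sketch below assumes both inputs are already sorted and uses the trade tuple layout documented for `DBInterpreter.stream`; `window_trades` is a hypothetical name, and giving the last book row every remaining trade is an assumed end-of-stream policy.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def window_trades(book_ts: Iterable[int], trades: Iterable[tuple]) -> Iterator[tuple[int, list[tuple]]]:
    """Pair each book timestamp with the trades in [ts, next_ts).

    Trades use the (id, trade_id, price, size, side, timestamp_ms) layout.
    Trades before the first book row are skipped; the last book row
    collects all remaining trades (assumed policy).
    """
    trade_iter = iter(trades)
    pending = next(trade_iter, None)
    book_iter = iter(book_ts)
    current = next(book_iter, None)
    while current is not None:
        nxt = next(book_iter, None)  # one-row lookahead defines the window end
        bucket = []
        while pending is not None and (nxt is None or pending[5] < nxt):
            if pending[5] >= current:
                bucket.append(pending)
            pending = next(trade_iter, None)
        yield current, bucket
        current = nxt
```

In the pipeline, each yielded bucket would be handed to `OHLCProcessor.process_trades` and the matching book row to `OHLCProcessor.update_orderbook`.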
## IPC JSON Schemas
- OHLC (`ohlc_data.json`): array of bars; each bar is `[ts, open, high, low, close, volume]`.
- Depth (`depth_data.json`): object with bids/asks arrays: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`.
- Metrics (`metrics_data.json`): array of bars; each bar is `[ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close]`.
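Structural checks for the three files can pin these schemas down, for example in an integration test. The `check_*` function names are hypothetical, and the price-ordering rule (low never above open or close, high never below them) is a standard OHLC property rather than a rule stated by `viz_io.py`.

```python
def check_ohlc(bars) -> bool:
    """Each bar is [ts, open, high, low, close, volume] with low <= open/close <= high."""
    for bar in bars:
        if len(bar) != 6:
            return False
        _ts, op, hi, lo, cl, vol = bar
        if not (lo <= min(op, cl) and max(op, cl) <= hi and vol >= 0):
            return False
    return True


def check_depth(obj) -> bool:
    """obj is {"bids": [[price, size], ...], "asks": [[price, size], ...]}."""
    if not isinstance(obj, dict):
        return False
    return all(
        isinstance(obj.get(side), list) and all(len(lv) == 2 for lv in obj[side])
        for side in ("bids", "asks")
    )


def check_metrics(bars) -> bool:
    """Each bar is [ts, obi_o, obi_h, obi_l, obi_c, tot_o, tot_h, tot_l, tot_c]."""
    for bar in bars:
        if len(bar) != 9:
            return False
        for op, hi, lo, cl in (bar[1:5], bar[5:9]):  # OBI quadruple, then TOT
            if not (lo <= min(op, cl) and max(op, cl) <= hi):
                return False
    return True
```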
## Configuration
- `OHLCProcessor(window_seconds, depth_levels_per_side)` controls aggregation granularity and depth snapshot size.
- The visualizer polling interval (`500 ms`) balances UI responsiveness against CPU usage.
- Paths: the JSON files (`ohlc_data.json`, `depth_data.json`, `metrics_data.json`) are colocated with the code and written atomically.
- CLI parameters select the instrument and time range; databases are expected under `../data/OKX`.
## Performance Characteristics
- Read-only SQLite tuned for fast sequential scans: immutable URI, query_only, large mmap and cache.
- Batching minimizes cursor churn and Python overhead.
- JSON IPC uses atomic file replacement so readers never see a partially written file; the OHLC and metrics lists are bounded to 1000 entries.
- Processor throttles intra-window OHLC upserts and depth emissions to reduce I/O.
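A time-based throttle of the kind described for intra-window upserts and depth emissions can be sketched as follows. Whether the real processor throttles on wall-clock time, data time, or a row count is not stated here; `Throttle` is a hypothetical helper with an injectable clock so it can be tested deterministically.

```python
from __future__ import annotations

import time
from typing import Callable


class Throttle:
    """Allow an action at most once per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last: float | None = None

    def ready(self) -> bool:
        """Return True (and record the time) if enough time has passed since the last True."""
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False
```

A caller would upsert the in-progress bar only when `ready()` returns `True`, and write unconditionally when a window closes so the final state of each bar is never skipped.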
## Error Handling
- Visualizer tolerates JSON decode races by reusing last good values and logging warnings.
- Processor guards depth parsing and writes; logs at debug/info levels.
- Visualizer startup is wrapped; if it fails, processing continues without UI.
## Security Considerations
- SQLite connections are read-only and immutable; no write queries executed.
- File writes are confined to the project directory; no paths are derived from untrusted input.
- Logs avoid sensitive data; only operational metadata.
## Testing Guidance
- Unit tests (run with `uv run pytest`):
  - `OHLCProcessor`: window boundary handling, high/low tracking, volume accumulation, upsert behavior.
  - Depth maintenance: deletions (size == 0), top-N sorting, throttling.
  - `DBInterpreter.stream`: correct trade-window assignment, end-of-stream handling.
- Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server.
## Roadmap (Optional Enhancements)
- Metrics: persist the OBI/CVD metrics to a dedicated table.
- Repository Pattern: extract DB access into a repository module with typed methods.
- Orchestrator: introduce a `Storage` pipeline module coordinating batch processing and persistence.
- Strategy Layer: compute signals/alerts on stored metrics.
- Visualization: add a CVD subplot and richer interactions.
---
This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a clear foundation for incremental enhancements.