WIP UI rework with qt6
This commit is contained in:
756 docs/API.md
@@ -1,550 +1,23 @@
# API Documentation
# API Documentation (Current Implementation)

## Overview

This document provides comprehensive API documentation for the Orderflow Backtest System, including public interfaces, data models, and usage examples.
This document describes the public interfaces of the current system: SQLite streaming, OHLC/depth aggregation, JSON-based IPC, and the Dash visualizer. Metrics (OBI/CVD), repository/storage layers, and strategy APIs are not part of the current implementation.

## Core Data Models
## Input Database Schema (Required)
### OrderbookLevel

Represents a single price level in the orderbook.

```python
@dataclass(slots=True)
class OrderbookLevel:
    price: float             # Price level
    size: float              # Total size at this price
    liquidation_count: int   # Number of liquidations
    order_count: int         # Number of resting orders
```

**Example:**
```python
level = OrderbookLevel(
    price=50000.0,
    size=10.5,
    liquidation_count=0,
    order_count=3
)
```
### Trade

Represents a single trade execution.

```python
@dataclass(slots=True)
class Trade:
    id: int          # Unique trade identifier
    trade_id: float  # Exchange trade ID
    price: float     # Execution price
    size: float      # Trade size
    side: str        # "buy" or "sell"
    timestamp: int   # Unix timestamp
```

**Example:**
```python
trade = Trade(
    id=1,
    trade_id=123456.0,
    price=50000.0,
    size=0.5,
    side="buy",
    timestamp=1640995200
)
```
### BookSnapshot

Complete orderbook state at a specific timestamp.

```python
@dataclass
class BookSnapshot:
    id: int                            # Snapshot identifier
    timestamp: int                     # Unix timestamp
    bids: Dict[float, OrderbookLevel]  # Bid side levels
    asks: Dict[float, OrderbookLevel]  # Ask side levels
    trades: List[Trade]                # Associated trades
```

**Example:**
```python
snapshot = BookSnapshot(
    id=1,
    timestamp=1640995200,
    bids={
        50000.0: OrderbookLevel(50000.0, 10.0, 0, 1),
        49999.0: OrderbookLevel(49999.0, 5.0, 0, 1)
    },
    asks={
        50001.0: OrderbookLevel(50001.0, 3.0, 0, 1),
        50002.0: OrderbookLevel(50002.0, 2.0, 0, 1)
    },
    trades=[]
)
```
### Metric

Calculated financial metrics for a snapshot.

```python
@dataclass(slots=True)
class Metric:
    snapshot_id: int        # Reference to source snapshot
    timestamp: int          # Unix timestamp
    obi: float              # Order Book Imbalance [-1, 1]
    cvd: float              # Cumulative Volume Delta
    best_bid: float | None  # Best bid price
    best_ask: float | None  # Best ask price
```

**Example:**
```python
metric = Metric(
    snapshot_id=1,
    timestamp=1640995200,
    obi=0.333,
    cvd=150.5,
    best_bid=50000.0,
    best_ask=50001.0
)
```
## MetricCalculator API

Static class providing financial metric calculations.

### calculate_obi()

```python
@staticmethod
def calculate_obi(snapshot: BookSnapshot) -> float:
    """
    Calculate Order Book Imbalance.

    Formula: OBI = (Vb - Va) / (Vb + Va)

    Args:
        snapshot: BookSnapshot with bids and asks

    Returns:
        float: OBI value between -1 and 1

    Example:
        >>> obi = MetricCalculator.calculate_obi(snapshot)
        >>> print(f"OBI: {obi:.3f}")
        OBI: 0.333
    """
```
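
The OBI formula above can be sketched as a standalone function (a minimal illustration, not the library's implementation; it assumes bid/ask sizes are summed across all levels and returns 0.0 for an empty book):

```python
def order_book_imbalance(bid_sizes: list[float], ask_sizes: list[float]) -> float:
    """OBI = (Vb - Va) / (Vb + Va), defined as 0.0 when the book is empty."""
    vb = sum(bid_sizes)  # total bid volume
    va = sum(ask_sizes)  # total ask volume
    total = vb + va
    return (vb - va) / total if total > 0 else 0.0

# 10 + 5 = 15 bid volume vs 3 + 2 = 5 ask volume -> (15 - 5) / 20 = 0.5
print(order_book_imbalance([10.0, 5.0], [3.0, 2.0]))  # 0.5
```

The denominator normalizes the value into [-1, 1]: +1 means all resting volume is on the bid side, -1 all on the ask side.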
### calculate_volume_delta()

```python
@staticmethod
def calculate_volume_delta(trades: List[Trade]) -> float:
    """
    Calculate Volume Delta for trades.

    Formula: VD = Buy Volume - Sell Volume

    Args:
        trades: List of Trade objects

    Returns:
        float: Net volume delta

    Example:
        >>> vd = MetricCalculator.calculate_volume_delta(trades)
        >>> print(f"Volume Delta: {vd}")
        Volume Delta: 7.5
    """
```
### calculate_cvd()

```python
@staticmethod
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
    """
    Calculate Cumulative Volume Delta.

    Formula: CVD_t = CVD_{t-1} + VD_t

    Args:
        previous_cvd: Previous CVD value
        volume_delta: Current volume delta

    Returns:
        float: New CVD value

    Example:
        >>> cvd = MetricCalculator.calculate_cvd(100.0, 7.5)
        >>> print(f"CVD: {cvd}")
        CVD: 107.5
    """
```
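
Taken together, the two formulas amount to a running sum of signed trade volume. A minimal sketch using hypothetical `(size, side)` tuples rather than the library's `Trade` model:

```python
def volume_delta(trades: list[tuple[float, str]]) -> float:
    """VD = buy volume - sell volume for one batch of (size, side) trades."""
    return sum(size if side == "buy" else -size for size, side in trades)

def cumulative_volume_delta(batches: list[list[tuple[float, str]]]) -> float:
    """CVD_t = CVD_{t-1} + VD_t, folded over successive trade batches."""
    cvd = 0.0
    for batch in batches:
        cvd += volume_delta(batch)
    return cvd

# Batch 1: +10.0 buy, -2.5 sell -> VD = 7.5; batch 2: VD = -1.5; CVD = 6.0
print(cumulative_volume_delta([[(10.0, "buy"), (2.5, "sell")], [(1.5, "sell")]]))  # 6.0
```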
### get_best_bid_ask()

```python
@staticmethod
def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]:
    """
    Extract best bid and ask prices.

    Args:
        snapshot: BookSnapshot with bids and asks

    Returns:
        tuple: (best_bid, best_ask) or (None, None)

    Example:
        >>> best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)
        >>> print(f"Spread: {best_ask - best_bid}")
        Spread: 1.0
    """
```
## Repository APIs

### SQLiteOrderflowRepository

Repository for orderbook data, trades, and metrics.

#### connect()

```python
def connect(self) -> sqlite3.Connection:
    """
    Create optimized SQLite connection.

    Returns:
        sqlite3.Connection: Configured database connection

    Example:
        >>> repo = SQLiteOrderflowRepository(db_path)
        >>> with repo.connect() as conn:
        ...     # Use connection
    """
```

#### load_trades_by_timestamp()

```python
def load_trades_by_timestamp(self, conn: sqlite3.Connection) -> Dict[int, List[Trade]]:
    """
    Load all trades grouped by timestamp.

    Args:
        conn: Active database connection

    Returns:
        Dict[int, List[Trade]]: Trades grouped by timestamp

    Example:
        >>> trades_by_ts = repo.load_trades_by_timestamp(conn)
        >>> trades_at_1000 = trades_by_ts.get(1000, [])
    """
```

#### iterate_book_rows()

```python
def iterate_book_rows(self, conn: sqlite3.Connection) -> Iterator[Tuple[int, str, str, int]]:
    """
    Memory-efficient iteration over orderbook rows.

    Args:
        conn: Active database connection

    Yields:
        Tuple[int, str, str, int]: (id, bids_text, asks_text, timestamp)

    Example:
        >>> for row_id, bids, asks, ts in repo.iterate_book_rows(conn):
        ...     # Process row
    """
```

#### create_metrics_table()

```python
def create_metrics_table(self, conn: sqlite3.Connection) -> None:
    """
    Create metrics table with indexes.

    Args:
        conn: Active database connection

    Raises:
        sqlite3.Error: If table creation fails

    Example:
        >>> repo.create_metrics_table(conn)
        >>> # Metrics table now available
    """
```

#### insert_metrics_batch()

```python
def insert_metrics_batch(self, conn: sqlite3.Connection, metrics: List[Metric]) -> None:
    """
    Insert metrics in batch for performance.

    Args:
        conn: Active database connection
        metrics: List of Metric objects to insert

    Example:
        >>> metrics = [Metric(...), Metric(...)]
        >>> repo.insert_metrics_batch(conn, metrics)
        >>> conn.commit()
    """
```

#### load_metrics_by_timerange()

```python
def load_metrics_by_timerange(
    self,
    conn: sqlite3.Connection,
    start_timestamp: int,
    end_timestamp: int
) -> List[Metric]:
    """
    Load metrics within time range.

    Args:
        conn: Active database connection
        start_timestamp: Start time (inclusive)
        end_timestamp: End time (inclusive)

    Returns:
        List[Metric]: Metrics ordered by timestamp

    Example:
        >>> metrics = repo.load_metrics_by_timerange(conn, 1000, 2000)
        >>> print(f"Loaded {len(metrics)} metrics")
    """
```
## Storage API

### Storage

High-level data processing orchestrator.

#### __init__()

```python
def __init__(self, instrument: str) -> None:
    """
    Initialize storage for specific instrument.

    Args:
        instrument: Trading pair identifier (e.g., "BTC-USDT")

    Example:
        >>> storage = Storage("BTC-USDT")
    """
```

#### build_booktick_from_db()

```python
def build_booktick_from_db(self, db_path: Path, db_date: datetime) -> None:
    """
    Process database and calculate metrics.

    This is the main processing pipeline that:
    1. Loads orderbook and trades data
    2. Calculates OBI and CVD metrics per snapshot
    3. Stores metrics in database
    4. Populates book with snapshots

    Args:
        db_path: Path to SQLite database file
        db_date: Date for this database (informational)

    Example:
        >>> storage.build_booktick_from_db(Path("data.db"), datetime.now())
        >>> print(f"Processed {len(storage.book.snapshots)} snapshots")
    """
```
## Strategy API

### DefaultStrategy

Trading strategy with metrics analysis capabilities.

#### __init__()

```python
def __init__(self, instrument: str) -> None:
    """
    Initialize strategy for instrument.

    Args:
        instrument: Trading pair identifier

    Example:
        >>> strategy = DefaultStrategy("BTC-USDT")
    """
```

#### set_db_path()

```python
def set_db_path(self, db_path: Path) -> None:
    """
    Configure database path for metrics access.

    Args:
        db_path: Path to database with metrics

    Example:
        >>> strategy.set_db_path(Path("data.db"))
    """
```

#### load_stored_metrics()

```python
def load_stored_metrics(self, start_timestamp: int, end_timestamp: int) -> List[Metric]:
    """
    Load stored metrics for analysis.

    Args:
        start_timestamp: Start of time range
        end_timestamp: End of time range

    Returns:
        List[Metric]: Metrics for specified range

    Example:
        >>> metrics = strategy.load_stored_metrics(1000, 2000)
        >>> latest_obi = metrics[-1].obi
    """
```

#### get_metrics_summary()

```python
def get_metrics_summary(self, metrics: List[Metric]) -> dict:
    """
    Generate statistical summary of metrics.

    Args:
        metrics: List of metrics to analyze

    Returns:
        dict: Statistical summary with keys:
            - obi_min, obi_max, obi_avg
            - cvd_start, cvd_end, cvd_change
            - total_snapshots

    Example:
        >>> summary = strategy.get_metrics_summary(metrics)
        >>> print(f"OBI range: {summary['obi_min']:.3f} to {summary['obi_max']:.3f}")
    """
```
## Visualizer API

### Visualizer

Multi-chart visualization system.

#### __init__()

```python
def __init__(self, window_seconds: int = 60, max_bars: int = 200) -> None:
    """
    Initialize visualizer with chart parameters.

    Args:
        window_seconds: OHLC aggregation window
        max_bars: Maximum bars to display

    Example:
        >>> visualizer = Visualizer(window_seconds=300, max_bars=1000)
    """
```

#### set_db_path()

```python
def set_db_path(self, db_path: Path) -> None:
    """
    Configure database path for metrics loading.

    Args:
        db_path: Path to database with metrics

    Example:
        >>> visualizer.set_db_path(Path("data.db"))
    """
```

#### update_from_book()

```python
def update_from_book(self, book: Book) -> None:
    """
    Update charts with book data and stored metrics.

    Creates 4-subplot layout:
    1. OHLC candlesticks
    2. Volume bars
    3. OBI line chart
    4. CVD line chart

    Args:
        book: Book with snapshots for OHLC calculation

    Example:
        >>> visualizer.update_from_book(storage.book)
        >>> # Charts updated with latest data
    """
```

#### show()

```python
def show(self) -> None:
    """
    Display interactive chart window.

    Example:
        >>> visualizer.show()
        >>> # Interactive Qt5 window opens
    """
```
## Database Schema

### Input Tables (Required)

These tables must exist in the SQLite database files:

#### book table
### book table
```sql
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT NOT NULL,      -- JSON array: [[price, size, liq_count, order_count], ...]
    asks TEXT NOT NULL,      -- JSON array: [[price, size, liq_count, order_count], ...]
    bids TEXT NOT NULL,      -- Python-literal: [[price, size, ...], ...]
    asks TEXT NOT NULL,      -- Python-literal: [[price, size, ...], ...]
    timestamp TEXT NOT NULL
);
```

#### trades table
### trades table
```sql
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
@@ -557,129 +30,122 @@ CREATE TABLE trades (
);
```
### Output Table (Auto-created)

This table is automatically created by the system:

#### metrics table
```sql
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL,
    timestamp TEXT NOT NULL,
    obi REAL NOT NULL,      -- Order Book Imbalance [-1, 1]
    cvd REAL NOT NULL,      -- Cumulative Volume Delta
    best_bid REAL,          -- Best bid price
    best_ask REAL,          -- Best ask price
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```

## Data Access: db_interpreter.py

### Classes
- `OrderbookLevel` (dataclass): represents a price level.
- `OrderbookUpdate`: windowed book update with `bids`, `asks`, `timestamp`, `end_timestamp`.

### DBInterpreter
```python
class DBInterpreter:
    def __init__(self, db_path: Path): ...

    def stream(self) -> Iterator[tuple[OrderbookUpdate, list[tuple]]]:
        """
        Stream orderbook rows with one-row lookahead and trades in timestamp order.
        Yields pairs of (OrderbookUpdate, trades_in_window), where each trade tuple is:
        (id, trade_id, price, size, side, timestamp_ms) and timestamp_ms ∈ [timestamp, end_timestamp).
        """
```

- Read-only SQLite connection with PRAGMA tuning (immutable, query_only, mmap, cache).
- Batch sizes: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
## Processing: ohlc_processor.py

### OHLCProcessor
```python
class OHLCProcessor:
    def __init__(self, window_seconds: int = 60, depth_levels_per_side: int = 50): ...

    def process_trades(self, trades: list[tuple]) -> None:
        """Aggregate trades into OHLC bars per window; throttled upserts for UI responsiveness."""

    def update_orderbook(self, ob_update: OrderbookUpdate) -> None:
        """Maintain in-memory price→size maps, apply partial updates, and emit top-N depth snapshots periodically."""

    def finalize(self) -> None:
        """Emit the last OHLC bar if present."""
```

- Internal helpers for parsing levels from JSON or Python-literal strings and for applying deletions (size == 0).
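
The windowed aggregation that `process_trades` performs can be illustrated with a minimal sketch (assumed `(price, size, timestamp_ms)` tuples and a plain dict of bars; the real processor also throttles upserts and tracks depth):

```python
def aggregate_ohlc(trades: list[tuple[float, float, int]],
                   window_seconds: int = 60) -> dict[int, dict]:
    """Bucket (price, size, timestamp_ms) trades into per-window OHLC bars."""
    bars: dict[int, dict] = {}
    for price, size, ts_ms in trades:
        # Window start in seconds, floored to the window boundary
        window = (ts_ms // 1000) // window_seconds * window_seconds
        bar = bars.get(window)
        if bar is None:
            bars[window] = {"open": price, "high": price, "low": price,
                            "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # trades arrive in timestamp order
            bar["volume"] += size

    return bars

bars = aggregate_ohlc([(100.0, 1.0, 0), (105.0, 2.0, 30_000), (99.0, 1.0, 61_000)])
print(bars[0])  # open=100.0, high=105.0, low=100.0, close=105.0, volume=3.0
```

The third trade lands in the next 60 s window, so it opens a new bar instead of extending the first.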
## Inter-Process Communication: viz_io.py

### Files
- `ohlc_data.json`: rolling array of OHLC bars (max 1000).
- `depth_data.json`: latest depth snapshot (bids/asks), top-N per side.
- `metrics_data.json`: rolling array of OBI OHLC bars (max 1000).

### Functions
```python
def add_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ...

def upsert_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ...

def clear_data() -> None: ...

def add_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ...

def upsert_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ...

def clear_metrics() -> None: ...
```

- Atomic writes via temp-file replace to prevent partial reads.
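
The atomic-write pattern referred to above is typically write-to-temp-then-rename: `os.replace` swaps the file in a single step, so a reader never observes a half-written file. A minimal sketch (the helper name is illustrative, not the module's actual API):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, payload: object) -> None:
    """Write JSON to a temp file in the same directory, then atomically replace."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        os.replace(tmp_name, path)  # atomic swap: readers see old or new, never partial
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file on any failure
        raise

atomic_write_json(Path("ohlc_data.json"), [[0, 100.0, 105.0, 99.0, 101.0, 3.5]])
```

The temp file must live on the same filesystem as the target for the rename to stay atomic, which is why it is created in `path.parent` rather than the system temp directory.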
## Visualization: app.py (Dash)

- Three visuals: OHLC+Volume and Depth (cumulative) with Plotly dark theme, plus an OBI candlestick subplot beneath Volume.
- Polling interval: 500 ms. Tolerates JSON decode races using cached last values.

### Callback Contract
```python
@app.callback(
    [Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')],
    [Input('interval-update', 'n_intervals')]
)
```
- Reads `ohlc_data.json` (list of `[ts, open, high, low, close, volume]`).
- Reads `depth_data.json` (`{"bids": [[price, size], ...], "asks": [[price, size], ...]}`).
- Reads `metrics_data.json` (list of `[ts, obi_o, obi_h, obi_l, obi_c]`).
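
The "cached last values" tolerance can be sketched as a read helper that falls back to the previous good payload when a write race leaves the file momentarily unparsable (illustrative helper, not the app's actual code):

```python
import json
from pathlib import Path

_cache: dict[str, object] = {}  # last successfully parsed payload per file

def read_json_tolerant(path: Path, default: object) -> object:
    """Return parsed JSON, or the last good value if the file is mid-write/missing."""
    key = str(path)
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return _cache.get(key, default)  # decode race or missing file: reuse cache
    _cache[key] = data
    return data

print(read_json_tolerant(Path("definitely_missing.json"), []))  # []
```

With atomic writes on the producer side this fallback rarely triggers, but it keeps the UI from flashing empty charts if a read ever races a write.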
## CLI Orchestration: main.py

### Typer Entry Point
```python
def main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None:
    """Stream DBs, process OHLC/depth, and launch Dash visualizer in a separate process."""
```

- Discovers databases under `../data/OKX` matching the instrument and date range.
- Launches UI: `uv run python app.py`.
## Usage Examples

### Complete Processing Workflow

```python
from pathlib import Path
from datetime import datetime
from storage import Storage
from strategies import DefaultStrategy
from visualizer import Visualizer

# Initialize components
storage = Storage("BTC-USDT")
strategy = DefaultStrategy("BTC-USDT")
visualizer = Visualizer(window_seconds=60, max_bars=500)

# Process database
db_path = Path("data/BTC-USDT-25-06-09.db")
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)

# Build book and calculate metrics
storage.build_booktick_from_db(db_path, datetime.now())

# Analyze metrics
strategy.on_booktick(storage.book)

# Update visualization
visualizer.update_from_book(storage.book)
visualizer.show()
```

### Run processing + UI
```bash
uv run python main.py BTC-USDT 2025-07-01 2025-08-01 --window-seconds 60
# Open http://localhost:8050
```

### Metrics Analysis

```python
# Load and analyze stored metrics
strategy = DefaultStrategy("BTC-USDT")
strategy.set_db_path(Path("data.db"))

# Get metrics for specific time range
metrics = strategy.load_stored_metrics(1640995200, 1640998800)

# Analyze metrics
summary = strategy.get_metrics_summary(metrics)
print(f"OBI Range: {summary['obi_min']:.3f} to {summary['obi_max']:.3f}")
print(f"CVD Change: {summary['cvd_change']:.1f}")

# Find significant imbalances
significant_obi = [m for m in metrics if abs(m.obi) > 0.2]
print(f"Found {len(significant_obi)} snapshots with >20% imbalance")
```

### Process trades and update depth in a loop (conceptual)
```python
from db_interpreter import DBInterpreter
from ohlc_processor import OHLCProcessor

processor = OHLCProcessor(window_seconds=60)
for ob_update, trades in DBInterpreter(db_path).stream():
    processor.process_trades(trades)
    processor.update_orderbook(ob_update)
processor.finalize()
```

### Custom Metric Calculations

```python
from models import MetricCalculator

# Calculate metrics for single snapshot
obi = MetricCalculator.calculate_obi(snapshot)
best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)

# Calculate CVD over time
cvd = 0.0
for trades in trades_by_timestamp.values():
    volume_delta = MetricCalculator.calculate_volume_delta(trades)
    cvd = MetricCalculator.calculate_cvd(cvd, volume_delta)
print(f"CVD: {cvd:.1f}")
```
## Error Handling
- Reader/writer coordination via atomic JSON prevents partial reads.
- Visualizer caches last valid data if JSON decoding fails mid-write; logs warnings.
- Visualizer start failures do not stop processing; logs error and continues.

### Common Error Scenarios

#### Database Connection Issues
```python
try:
    repo = SQLiteOrderflowRepository(db_path)
    with repo.connect() as conn:
        metrics = repo.load_metrics_by_timerange(conn, start, end)
except sqlite3.Error as e:
    logging.error(f"Database error: {e}")
    metrics = []  # Fallback to empty list
```

#### Missing Metrics Table
```python
repo = SQLiteOrderflowRepository(db_path)
with repo.connect() as conn:
    if not repo.table_exists(conn, "metrics"):
        repo.create_metrics_table(conn)
        logging.info("Created metrics table")
```

#### Empty Data Handling
```python
# All methods handle empty data gracefully
obi = MetricCalculator.calculate_obi(empty_snapshot)  # Returns 0.0
vd = MetricCalculator.calculate_volume_delta([])      # Returns 0.0
summary = strategy.get_metrics_summary([])            # Returns {}
```
---

This API documentation provides complete coverage of the public interfaces for the Orderflow Backtest System. For implementation details and architecture information, see the additional documentation in the `docs/` directory.
## Notes
- Metrics computation includes a simplified OBI (Order Book Imbalance) calculated as bid_total - ask_total. Repository/storage layers and strategy APIs are intentionally kept minimal.
@@ -5,42 +5,52 @@ All notable changes to the Orderflow Backtest System are documented in this file
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.0.0] - 2024-Current
## [Unreleased]

### Added
- **OBI Metrics Calculation**: Order Book Imbalance calculation with formula `(Vb - Va) / (Vb + Va)`
- **CVD Metrics Calculation**: Cumulative Volume Delta with incremental calculation and reset functionality
- **Persistent Metrics Storage**: SQLite-based storage for calculated metrics to avoid recalculation
- **Memory Optimization**: >70% reduction in peak memory usage through streaming processing
- **Enhanced Visualization**: Multi-subplot charts with OHLC, Volume, OBI, and CVD displays
- **MetricCalculator Class**: Static methods for financial metrics computation
- **Batch Processing**: High-performance batch inserts (1000 records per operation)
- **Time-Range Queries**: Efficient metrics retrieval for specified time periods
- **Strategy Enhancement**: Metrics analysis capabilities in `DefaultStrategy`
- **Comprehensive Testing**: 27 tests across 6 test files with full integration coverage
- Comprehensive documentation structure with module-specific guides
- Architecture Decision Records (ADRs) for major technical decisions
- CONTRIBUTING.md with development guidelines and standards
- Enhanced module documentation in `docs/modules/` directory
- Dependency documentation with security and performance considerations

### Changed
- **Storage Architecture**: Modified `Storage.build_booktick_from_db()` to integrate metrics calculation
- **Visualization Separation**: Moved visualization from strategy to main application for better separation of concerns
- **Strategy Interface**: Simplified `DefaultStrategy` constructor (removed `enable_visualization` parameter)
- **Main Application Flow**: Enhanced orchestration with per-database visualization updates
- **Database Schema**: Auto-creation of metrics table with proper indexes and foreign key constraints
- **Memory Management**: Stream processing instead of keeping full snapshot history
- Documentation structure reorganized to follow documentation standards
- Improved code documentation requirements with examples
- Enhanced testing guidelines with coverage requirements

### Improved
- **Performance**: Batch database operations and optimized SQLite PRAGMAs
- **Scalability**: Support for months to years of high-frequency trading data
- **Code Quality**: All functions <50 lines, all files <250 lines
- **Documentation**: Comprehensive module and API documentation
- **Error Handling**: Graceful degradation and comprehensive logging
- **Type Safety**: Full type annotations throughout codebase
## [2.0.0] - 2024-12-Present

### Added
- **Simplified Pipeline Architecture**: Streamlined SQLite → OHLC/Depth → JSON → Dash pipeline
- **JSON-based IPC**: Atomic file-based communication between processor and visualizer
- **Real-time Visualization**: Dash web application with 500ms polling updates
- **OHLC Aggregation**: Configurable time window aggregation with throttled updates
- **Orderbook Depth**: Real-time depth snapshots with top-N level management
- **OBI Metrics**: Order Book Imbalance calculation with candlestick visualization
- **Atomic JSON Operations**: Race-condition-free data exchange via temp files
- **CLI Orchestration**: Typer-based command interface with process management
- **Performance Optimizations**: Batch reading with optimized SQLite PRAGMA settings

### Changed
- **Architecture Simplification**: Removed complex repository/storage layers
- **Data Flow**: Direct streaming from database to visualization via JSON
- **Error Handling**: Graceful degradation with cached data fallbacks
- **Process Management**: Separate visualization process launched automatically
- **Memory Efficiency**: Bounded datasets prevent unlimited memory growth

### Technical Details
- **New Tables**: `metrics` table with indexes on timestamp and snapshot_id
- **New Models**: `Metric` dataclass for calculated values
- **Processing Pipeline**: Snapshot → Calculate → Store → Discard workflow
- **Query Interface**: Time-range based metrics retrieval
- **Visualization Layout**: 4-subplot layout with shared time axis
- **Database Access**: Read-only SQLite with immutable mode and mmap optimization
- **Batch Sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal performance
- **JSON Formats**: Standardized schemas for OHLC, depth, and metrics data
- **Chart Architecture**: Multi-subplot layout with shared time axis
- **IPC Files**: `ohlc_data.json`, `depth_data.json`, `metrics_data.json`

### Removed
- Complex metrics storage and repository patterns
- Strategy framework components
- In-memory snapshot retention
- Multi-database orchestration complexity

## [1.0.0] - Previous Version
186 docs/CONTEXT.md
@@ -2,162 +2,52 @@
## Current State
|
||||
|
||||
The Orderflow Backtest System has successfully implemented a comprehensive OBI (Order Book Imbalance) and CVD (Cumulative Volume Delta) metrics calculation and visualization system. The project is in a production-ready state with full feature completion.
|
||||
The project implements a modular, efficient orderflow processing pipeline:
|
||||
- Stream orderflow from SQLite (`DBInterpreter.stream`).
|
||||
- Process trades and orderbook updates through modular `OHLCProcessor` architecture.
|
||||
- Exchange data with the UI via atomic JSON files (`viz_io`).
|
||||
- Render OHLC+Volume, Depth, and Metrics charts with a Dash app (`app.py`).

## Recent Achievements

The system features a clean composition-based architecture with specialized modules for different concerns, providing OBI/CVD metrics alongside OHLC data.

### ✅ Completed Features (Latest Implementation)

- **Metrics Calculation Engine**: Complete OBI and CVD calculation with per-snapshot granularity
- **Persistent Storage**: Metrics stored in the SQLite database to avoid recalculation
- **Memory Optimization**: >70% reduction in memory usage through efficient data management
- **Visualization System**: Multi-subplot charts (OHLC, Volume, OBI, CVD) with a shared time axis
- **Strategy Framework**: Enhanced trading strategy system with metrics analysis
- **Clean Architecture**: Proper separation of concerns between data, analysis, and visualization

## Recent Work

### 📊 System Metrics

- **Performance**: Batch processing of 1000 records per operation
- **Memory**: >70% reduction in peak memory usage
- **Test Coverage**: 27 comprehensive tests across 6 test files
- **Code Quality**: All functions <50 lines, all files <250 lines
- **Modular Refactoring**: Extracted `ohlc_processor.py` into focused modules:
  - `level_parser.py`: Orderbook level parsing utilities (85 lines)
  - `orderbook_manager.py`: In-memory orderbook state management (90 lines)
  - `metrics_calculator.py`: OBI and CVD metrics calculation (112 lines)
- **Architecture Compliance**: Reduced the main processor from 440 to 248 lines (meeting the 250-line target) while maintaining full backward compatibility and functionality
- Implemented read-only, batched SQLite streaming with PRAGMA tuning.
- Added robust JSON IPC with atomic writes and tolerant UI reads.
- Built a responsive Dash visualization polling at 500 ms.
- Unified the CLI using Typer, with UV for process management.
## Architecture Decisions

### Key Design Patterns

1. **Repository Pattern**: Clean separation between data access and business logic
2. **Dataclass Models**: Lightweight, type-safe data structures with slots optimization
3. **Batch Processing**: High-performance database operations for large datasets
4. **Separation of Concerns**: Strategy, Storage, and Visualization as independent components

## Conventions

- Python 3.12+, UV for dependency management and command execution.
- **Modular Architecture**: Composition over inheritance, single-responsibility modules
- **File Size Limits**: ≤250 lines per file, ≤50 lines per function (enforced)
- Type hints throughout; concise, focused functions and classes.
- Error handling with meaningful logs; avoid bare exceptions.
- Prefer explicit JSON structures for IPC; keep payloads small and bounded.

### Technology Stack

- **Language**: Python 3.12+ with type hints
- **Database**: SQLite with optimized PRAGMAs for performance
- **Package Management**: UV for fast dependency resolution
- **Testing**: Pytest with comprehensive unit and integration tests
- **Visualization**: Matplotlib with Qt5Agg backend
## Current Development Priorities

- Improve configurability: database path discovery, CLI flags for paths and UI options.
- Add tests for `DBInterpreter.stream` and `OHLCProcessor` (run with `uv run pytest`).
- Performance tuning for large DBs while keeping the UI responsive.
- Keep documentation in sync with code; ensure the architecture docs reflect the current design.

### ✅ Completed (Production Ready)

1. **Core Metrics System**: OBI and CVD calculation infrastructure
2. **Database Integration**: Persistent storage and retrieval system
3. **Visualization Framework**: Multi-chart display with proper time alignment
4. **Memory Optimization**: Efficient processing of large datasets
5. **Code Quality**: Comprehensive testing and documentation
## Roadmap (Future Work)

### 🔄 Maintenance Phase

- **Documentation**: Comprehensive docs completed
- **Testing**: Full test coverage maintained
- **Performance**: Monitoring and optimization as needed
- **Bug Fixes**: Address any issues discovered in production use
- Enhance OBI metrics with additional derived calculations (e.g., normalized OBI).
- Optional repository layer abstraction and a storage orchestrator.
- Extend visualization with additional subplots and interactivity.
- Strategy module for analytics and alerting on derived metrics.
## Known Patterns and Conventions

### Code Style

- **Functions**: Maximum 50 lines, single responsibility
- **Files**: Maximum 250 lines, clear module boundaries
- **Naming**: Descriptive names; no abbreviations except domain terms (OBI, CVD)
- **Error Handling**: Comprehensive try/except with logging and graceful degradation

### Database Patterns

- **Parameterized Queries**: All SQL uses proper parameterization for security
- **Batch Operations**: Process records in batches of 1000 for performance
- **Indexing**: Strategic indexes on timestamp and foreign key columns
- **Transactions**: Proper transaction boundaries for data consistency

### Testing Patterns

- **Unit Tests**: Each module has comprehensive unit test coverage
- **Integration Tests**: End-to-end workflow testing
- **Mock Objects**: External dependencies mocked for isolated testing
- **Test Data**: Temporary databases with realistic test data
## Integration Points

### External Dependencies

- **SQLite**: Primary data storage (read and write operations)
- **Matplotlib**: Chart rendering and visualization
- **Qt5Agg**: GUI backend for interactive charts
- **Pytest**: Testing framework

### Internal Module Dependencies

```
main.py → storage.py    → repositories/ → models.py
        → strategies.py → models.py
        → visualizer.py → repositories/
```
## Performance Characteristics

### Optimizations Implemented

- **Memory Management**: Metrics storage instead of full snapshot retention
- **Database Performance**: Optimized SQLite PRAGMAs and batch processing
- **Query Efficiency**: Indexed queries with proper WHERE clauses
- **Cache Usage**: Price caching in the orderbook parser for repeated calculations

### Scalability Notes

- **Dataset Size**: Tested with 600K+ snapshots and 300K+ trades per day
- **Time Range**: Supports months to years of historical data
- **Processing Speed**: ~1000 rows/second with full metrics calculation
- **Storage Overhead**: Metrics table adds <20% to the original database size
## Security Considerations

### Implemented Safeguards

- **SQL Injection Prevention**: All queries use parameterized statements
- **Input Validation**: Database paths and table names are validated
- **Error Information**: No sensitive data exposed in error messages
- **Access Control**: Database file permissions respected
## Future Considerations

### Potential Enhancements

- **Real-time Processing**: Streaming data support for live trading
- **Additional Metrics**: Volume Profile, Delta Flow, liquidity metrics
- **Export Capabilities**: CSV/JSON export for external analysis
- **Interactive Charts**: Enhanced user interaction with visualization
- **Configuration System**: Configurable batch sizes and processing parameters

### Scalability Options

- **Database Upgrade**: PostgreSQL for larger datasets if needed
- **Parallel Processing**: Multi-threading for CPU-intensive calculations
- **Caching Layer**: Redis for frequently accessed metrics
- **API Interface**: REST API for external system integration
## Development Environment

### Requirements

- Python 3.12+
- UV package manager
- SQLite database files with the required schema
- Qt5 for visualization (Linux/macOS)

### Setup Commands

```bash
# Install dependencies
uv sync

# Run full test suite
uv run pytest

# Process sample data
uv run python main.py BTC-USDT 2025-07-01 2025-08-01
```
## Documentation Status

### ✅ Complete Documentation

- README.md with a comprehensive overview
- Module-level documentation for all components
- API documentation with examples
- Architecture decision records
- Code-level documentation with docstrings

### 📊 Quality Metrics

- **Test Suite**: 27 tests across 6 test files
- **Documentation Coverage**: All public interfaces documented
- **Example Coverage**: Working examples for all major features
- **Error Documentation**: All error conditions documented

## Tooling

- Package management and commands: UV (e.g., `uv sync`, `uv run ...`).
- Visualization server: Dash on `http://localhost:8050`.
- Testing: Pytest (e.g., `uv run pytest`).

---

*Last Updated: Current as of OBI/CVD metrics system completion*
*Next Review: As needed for maintenance or feature additions*
---

## Overview

This directory contains documentation for the current Orderflow Backtest System, which streams historical orderflow from SQLite, aggregates OHLC bars, maintains a lightweight depth snapshot, and renders charts via a Dash web application.

## Documentation Structure

- `architecture.md`: System architecture, component relationships, and data flow (SQLite → Streaming → OHLC/Depth → JSON → Dash)
- `API.md`: Public interfaces for DB streaming, OHLC/depth processing, JSON IPC, Dash visualization, and the CLI
- `CONTEXT.md`: Project state, conventions, and development priorities
- `decisions/`: Architecture decision records
## Quick Navigation

| Topic | Documentation |
|-------|---------------|
| Getting Started | [README.md](../README.md) and the usage examples in `API.md` |
| System Architecture | `architecture.md` |
| Database Schema | `API.md#input-database-schema-required` |
| Development Setup | Project root `README` and `pyproject.toml` |
| API Reference | `API.md` |
## Documentation Standards

This documentation follows the project's documentation standards defined in `.cursor/rules/documentation.mdc`. All documentation includes:

- Clear purpose and scope
- Code examples with working implementations
- API documentation with request/response formats
- Error handling and edge cases
- Dependencies and requirements

## Notes

- Metrics (OBI/CVD), repository/storage layers, and strategy components have been removed from the current codebase and are planned as future enhancements.
- Use UV for package management and running commands, e.g. `uv run python main.py ...`.
---

## Overview

The current system is a streamlined, high-performance pipeline that streams orderflow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and the visualizer uses atomic JSON files for simplicity and robustness.

## High-Level Architecture

```
┌──────────────┐   ┌────────────────┐   ┌──────────────┐   ┌─────────────────┐
│ SQLite Files │ → │ DB Interpreter │ → │  OHLC/Depth  │ → │ Dash Visualizer │
│ (book,trades)│   │ (stream rows)  │   │  Processor   │   │    (app.py)     │
└──────────────┘   └────────────────┘   └──────┬───────┘   └────────▲───┬────┘
                                               │                    │   │
                                               │  Atomic JSON (IPC) │   ▼
                                               ▼                    │ Browser UI
                              ohlc_data.json, depth_data.json,      │
                              metrics_data.json ────────────────────┘
```
## Components

### Data Access (`db_interpreter.py`)

- `OrderbookLevel`: dataclass representing one price level.
- `OrderbookUpdate`: container for a book row window with `bids`, `asks`, `timestamp`, and `end_timestamp`.
- `DBInterpreter`:
  - `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]` streams the book table with lookahead and the trades table in timestamp order.
  - Efficient read-only connection with PRAGMA tuning: immutable mode, `query_only`, `temp_store=MEMORY`, `mmap_size`, `cache_size`.
  - Batching constants: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
  - Each yielded `trades` element is a tuple `(id, trade_id, price, size, side, timestamp_ms)` that falls within `[book.timestamp, next_book.timestamp)`.
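The read-only connection and batched iteration described above can be sketched with the standard library. This is a minimal illustration, not the actual `DBInterpreter` code; the helper names and the `mmap_size`/`cache_size` values are assumptions.

```python
import sqlite3

BOOK_BATCH = 2048  # batch size constant described above


def open_readonly(db_path: str) -> sqlite3.Connection:
    # Immutable read-only URI: SQLite skips locking and change detection.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro&immutable=1", uri=True)
    conn.execute("PRAGMA query_only = 1")         # reject any write statement
    conn.execute("PRAGMA temp_store = MEMORY")    # keep temp structures off disk
    conn.execute("PRAGMA mmap_size = 268435456")  # memory-mapped I/O (value assumed)
    conn.execute("PRAGMA cache_size = -65536")    # ~64 MiB page cache (value assumed)
    return conn


def iter_book_rows(conn: sqlite3.Connection):
    # fetchmany() in batches keeps per-row Python overhead low on large tables.
    cur = conn.execute("SELECT id, bids, asks, timestamp FROM book ORDER BY timestamp")
    while rows := cur.fetchmany(BOOK_BATCH):
        yield from rows
```

The immutable URI is appropriate here because backtest databases are never written while being read.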
### Processing (Modular Architecture)

#### Main Coordinator (`ohlc_processor.py`)

- `OHLCProcessor(window_seconds=60, depth_levels_per_side=50)`: orchestrates trade processing using composition.
- `process_trades(trades)`: aggregates trades into OHLC bars and delegates CVD updates.
- `update_orderbook(ob_update)`: coordinates orderbook updates and OBI metric calculation.
- `finalize()`: finalizes both OHLC bars and metrics data.
- `cvd_cumulative` (property): provides access to the cumulative volume delta.

#### Orderbook Management (`orderbook_manager.py`)

- `OrderbookManager`: handles in-memory orderbook state with partial updates.
  - Maintains separate bid/ask price→size dictionaries.
  - Supports deletions via zero-size updates.
  - Provides sorted top-N level extraction for visualization.
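The partial-update behavior described above can be sketched as follows. The class and method names are illustrative, not the actual `OrderbookManager` API.

```python
class OrderbookManagerSketch:
    """Minimal sketch: price->size dictionaries with zero-size deletions."""

    def __init__(self) -> None:
        self.bids: dict[float, float] = {}
        self.asks: dict[float, float] = {}

    @staticmethod
    def apply(side: dict[float, float], levels: list[tuple[float, float]]) -> None:
        for price, size in levels:
            if size <= 0:
                side.pop(price, None)  # a zero-size update deletes the level
            else:
                side[price] = size     # otherwise upsert the level

    def top_n(self, n: int) -> tuple[list, list]:
        # Best bids are the highest prices; best asks the lowest.
        bids = sorted(self.bids.items(), key=lambda kv: -kv[0])[:n]
        asks = sorted(self.asks.items(), key=lambda kv: kv[0])[:n]
        return bids, asks
```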
#### Metrics Calculation (`metrics_calculator.py`)

- `MetricsCalculator`: manages OBI and CVD metrics with windowed aggregation.
  - Tracks CVD from trade flow (buy vs. sell volume delta).
  - Calculates OBI from orderbook volume imbalance.
  - Provides throttled updates and OHLC-style metric bars.

#### Level Parsing (`level_parser.py`)

- Utility functions for normalizing orderbook level data:
  - `normalize_levels()`: parses levels, filtering zero/negative sizes.
  - `parse_levels_including_zeros()`: preserves zeros for deletion operations.
- Supports JSON and Python literal formats with robust error handling.
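For illustration, the conventional forms of these two metrics look like this: OBI as the normalized bid/ask volume difference in [-1, 1], and CVD as a running buy-minus-sell volume. The exact implementation in `MetricsCalculator` (depth truncation, throttling) may differ; this sketch assumes the trade tuple layout described above.

```python
def obi(bids: list[tuple[float, float]], asks: list[tuple[float, float]]) -> float:
    # Order Book Imbalance: (bid_vol - ask_vol) / (bid_vol + ask_vol), in [-1, 1].
    bid_vol = sum(size for _price, size in bids)
    ask_vol = sum(size for _price, size in asks)
    total = bid_vol + ask_vol
    return 0.0 if total == 0 else (bid_vol - ask_vol) / total


def update_cvd(cvd: float, trades) -> float:
    # Cumulative Volume Delta: add buy sizes, subtract sell sizes.
    for _id, _trade_id, _price, size, side, _ts in trades:
        cvd += size if side == "buy" else -size
    return cvd
```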
### Inter-Process Communication (`viz_io.py`)

- File paths (relative to the project root):
  - `ohlc_data.json`: rolling list of OHLC bars (max 1000).
  - `depth_data.json`: latest depth snapshot (bids/asks).
  - `metrics_data.json`: rolling list of OBI/TOT OHLC bars (max 1000).
- Atomic writes via temp files prevent partial reads by the Dash app.
- API:
  - `add_ohlc_bar(...)`: append a new bar; trim to the last 1000.
  - `upsert_ohlc_bar(...)`: replace the last bar if its timestamp matches; otherwise append; trim.
  - `clear_data()`: reset OHLC data to an empty list.
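The atomic-write pattern behind these files can be sketched with the standard library: write to a temp file in the same directory, then swap it into place with `os.replace`, so readers see either the old file or the new one, never a partial write. The function name is illustrative.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, payload) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file must live on the same filesystem for the replace to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp, path)     # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```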
### Visualization (`app.py`)

- Dash application with two graphs plus an OBI subplot:
  - OHLC + Volume subplot with a shared x-axis.
  - OBI candlestick subplot (blue tones) sharing the x-axis.
  - Depth (cumulative) chart for bids and asks.
- A polling-interval (500 ms) callback reads the JSON files and updates figures resiliently:
  - Caches last good values to tolerate in-flight writes and decoding errors.
  - Builds figures with the Plotly dark theme.
- Exposed on `http://localhost:8050` by default (`host=0.0.0.0`).
### CLI Orchestration (`main.py`)

- Typer CLI entrypoint:
  - Arguments: `instrument`, `start_date`, `end_date` (UTC, `YYYY-MM-DD`); options: `--window-seconds`.
  - Discovers SQLite files under `../data/OKX` matching the instrument.
  - Launches the Dash visualizer as a separate process: `uv run python app.py`.
  - Streams databases sequentially: for each book row, processes trades and updates the orderbook.
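The discovery and process-launch steps can be sketched with the standard library. The glob pattern and helper names are assumptions; the real entrypoint wires equivalents of these into a Typer command.

```python
import subprocess
from pathlib import Path


def discover_dbs(instrument: str, data_dir: str = "../data/OKX") -> list[Path]:
    # Find SQLite files for the instrument; sorting by name streams the
    # databases in chronological order when files are date-stamped.
    return sorted(Path(data_dir).glob(f"*{instrument}*.db"))


def launch_visualizer() -> subprocess.Popen:
    # The Dash app runs as a separate process; if it dies, processing continues.
    return subprocess.Popen(["uv", "run", "python", "app.py"])
```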
## Data Flow

1. Discover and open the SQLite database(s) for the requested instrument.
2. Stream `book` rows with one-row lookahead to form time windows.
3. Stream `trades` in timestamp order and bucket them into the active window.
4. For each window:
   - Aggregate trades into OHLC using `OHLCProcessor.process_trades`.
   - Apply partial depth updates via `OHLCProcessor.update_orderbook` and emit periodic snapshots.
5. Persist current OHLC bar(s) and depth snapshots to JSON via atomic writes.
6. The Dash app polls the JSON files and renders the charts.
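The windowed aggregation in step 4 can be illustrated with a self-contained sketch. This is not the actual `OHLCProcessor`; it assumes the trade tuple layout `(id, trade_id, price, size, side, timestamp_ms)` and the `[ts, open, high, low, close, volume]` bar shape documented below.

```python
def bucket_start(ts_ms: int, window_seconds: int) -> int:
    # Align a millisecond timestamp to the start of its aggregation window.
    w = window_seconds * 1000
    return ts_ms - ts_ms % w


def aggregate_windows(trades, window_seconds: int = 60) -> list[list]:
    bars: dict[int, list] = {}
    for _id, _trade_id, price, size, _side, ts in trades:
        start = bucket_start(ts, window_seconds)
        bar = bars.get(start)
        if bar is None:
            bars[start] = [start, price, price, price, price, size]  # ts,o,h,l,c,v
        else:
            bar[2] = max(bar[2], price)  # high
            bar[3] = min(bar[3], price)  # low
            bar[4] = price               # close tracks the latest trade
            bar[5] += size               # volume accumulates
    return [bars[k] for k in sorted(bars)]
```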
## IPC JSON Schemas

- OHLC (`ohlc_data.json`): array of bars; each bar is `[ts, open, high, low, close, volume]`.
- Depth (`depth_data.json`): object with bids/asks arrays: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`.
- Metrics (`metrics_data.json`): array of bars; each bar is `[ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close]`.

## Input Database Schema (Required)

```sql
-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,        -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,        -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,        -- "buy" or "sell"
    timestamp TEXT
);
```
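A tiny fixture database matching this schema can be built with the standard library, which is useful for the integration tests described later. The helper name and sample values are illustrative.

```python
import json
import sqlite3

BOOK_DDL = """CREATE TABLE book (
    id INTEGER PRIMARY KEY, instrument TEXT,
    bids TEXT, asks TEXT, timestamp TEXT)"""
TRADES_DDL = """CREATE TABLE trades (
    id INTEGER PRIMARY KEY, instrument TEXT, trade_id TEXT,
    price REAL, size REAL, side TEXT, timestamp TEXT)"""


def make_fixture(path: str) -> None:
    conn = sqlite3.connect(path)
    conn.execute(BOOK_DDL)
    conn.execute(TRADES_DDL)
    # Levels are stored as JSON arrays of [price, size, liq_count, order_count].
    bids = json.dumps([[100.0, 2.0, 0, 3]])
    asks = json.dumps([[100.5, 1.5, 0, 2]])
    conn.execute(
        "INSERT INTO book (instrument, bids, asks, timestamp) VALUES (?, ?, ?, ?)",
        ("BTC-USDT", bids, asks, "1640995200000"),
    )
    conn.execute(
        "INSERT INTO trades (instrument, trade_id, price, size, side, timestamp) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        ("BTC-USDT", "123456", 100.2, 0.5, "buy", "1640995200100"),
    )
    conn.commit()
    conn.close()
```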
## Configuration

- `OHLCProcessor(window_seconds, depth_levels_per_side)` controls aggregation granularity and depth snapshot size.
- The visualizer polling interval (500 ms) balances UI responsiveness and CPU usage.
- Paths: JSON files (`ohlc_data.json`, `depth_data.json`) are colocated with the code and written atomically.
- CLI parameters select the instrument and time range; databases are expected under `../data/OKX`.
## Performance Characteristics

- Read-only SQLite tuned for fast sequential scans: immutable URI, `query_only`, large mmap and cache.
- Batching minimizes cursor churn and Python overhead.
- JSON IPC uses atomic replace to avoid contention; the OHLC list is bounded to 1000 entries.
- The processor throttles intra-window OHLC upserts and depth emissions to reduce I/O.
## Error Handling

- The visualizer tolerates JSON decode races by reusing last good values and logging warnings.
- The processor guards depth parsing and writes; it logs at debug/info levels.
- Visualizer startup is wrapped; if it fails, processing continues without the UI.
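The tolerant-read behavior can be sketched as follows. This is an illustrative helper, not the actual callback code: on a decode error or missing file (e.g. a write in flight), it falls back to the last successfully parsed value.

```python
import json

_last_good: dict[str, object] = {}  # path -> last successfully parsed payload


def read_json_tolerant(path: str, default):
    try:
        with open(path) as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # A write may be in flight; reuse the cached value instead of failing.
        return _last_good.get(path, default)
    _last_good[path] = data
    return data
```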
## Security Considerations

- SQLite connections are read-only and immutable; no write queries are executed.
- File writes are confined to the project directory; no paths are derived from untrusted input.
- Logs avoid sensitive data; only operational metadata is recorded.
## Testing Guidance

- Unit tests (run with `uv run pytest`):
  - `OHLCProcessor`: window boundary handling, high/low tracking, volume accumulation, upsert behavior.
  - Depth maintenance: deletions (`size == 0`), top-N sorting, throttling.
  - `DBInterpreter.stream`: correct trade-window assignment and end-of-stream handling.
- Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server.
## Roadmap (Optional Enhancements)

- Metrics: add OBI/CVD computation and persist metrics to a dedicated table.
- Repository Pattern: extract DB access into a repository module with typed methods.
- Orchestrator: introduce a `Storage` pipeline module coordinating batch processing and persistence.
- Strategy Layer: compute signals/alerts on stored metrics.
- Visualization: add OBI/CVD subplots and richer interactions.
---

This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a clear foundation for incremental enhancements.
@@ -1,120 +0,0 @@
|
||||
# ADR-001: Persistent Metrics Storage
|
||||

## Status
Accepted

## Context
The original orderflow backtest system kept all orderbook snapshots in memory during processing, leading to excessive memory usage (>1GB for typical datasets). With the addition of OBI and CVD metrics calculation, we needed to decide how to handle the computed metrics and manage memory efficiently.

## Decision
We will implement persistent storage of calculated metrics in the SQLite database with the following approach:

1. **Metrics Table**: Create a dedicated `metrics` table to store OBI, CVD, and related data
2. **Streaming Processing**: Process snapshots one by one, calculate metrics, store results, then discard snapshots
3. **Batch Operations**: Use batch inserts (1000 records) for optimal database performance
4. **Query Interface**: Provide time-range queries for metrics retrieval and analysis
## Consequences

### Positive
- **Memory Reduction**: >70% reduction in peak memory usage during processing
- **Avoid Recalculation**: Metrics calculated once and reused for multiple analysis runs
- **Scalability**: Can process months/years of data without memory constraints
- **Performance**: Batch database operations provide high throughput
- **Persistence**: Metrics survive between application runs
- **Analysis Ready**: Stored metrics enable complex time-series analysis

### Negative
- **Storage Overhead**: Metrics table adds ~20% to database size
- **Complexity**: Additional database schema and management code
- **Dependencies**: Tighter coupling between processing and database layer
- **Migration**: Existing databases need schema updates for the metrics table
## Alternatives Considered

### Option 1: Keep All Snapshots in Memory
**Rejected**: Unsustainable memory usage for large datasets. Would limit analysis to small time ranges.

### Option 2: Calculate Metrics On-Demand
**Rejected**: Recalculating metrics for every analysis run is computationally expensive and time-consuming.

### Option 3: External Metrics Database
**Rejected**: Adds deployment complexity. SQLite co-location provides better performance and simpler management.

### Option 4: Compressed In-Memory Cache
**Rejected**: Still faces fundamental memory scaling issues. Compression/decompression adds CPU overhead.
## Implementation Details

### Database Schema
```sql
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL,
    timestamp TEXT NOT NULL,
    obi REAL NOT NULL,
    cvd REAL NOT NULL,
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```
### Processing Pipeline
1. Create the metrics table if it does not exist
2. Stream through orderbook snapshots
3. For each snapshot:
   - Calculate OBI and CVD metrics
   - Batch store metrics (1000 records per commit)
   - Discard the snapshot from memory
4. Provide a query interface for time-range retrieval

### Memory Management
- **Before**: Store all snapshots → calculate on demand → high memory usage
- **After**: Stream snapshots → calculate immediately → store metrics → low memory usage
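The streaming pipeline above can be sketched as follows. This is a minimal illustration assuming the `book` schema described elsewhere in these docs; `compute_obi` is a placeholder definition, and CVD is omitted for brevity:

```python
import json
import sqlite3

def compute_obi(bids, asks):
    """Order Book Imbalance: (bid_size - ask_size) / (bid_size + ask_size)."""
    bid_sz = sum(s for _, s in bids)
    ask_sz = sum(s for _, s in asks)
    total = bid_sz + ask_sz
    return (bid_sz - ask_sz) / total if total else 0.0

def stream_metrics(conn, batch_size=1000):
    """Stream snapshots, compute metrics, and persist them in batches."""
    conn.execute("""CREATE TABLE IF NOT EXISTS metrics (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        snapshot_id INTEGER NOT NULL,
        timestamp TEXT NOT NULL,
        obi REAL NOT NULL)""")
    batch = []
    for snap_id, ts, bids, asks in conn.execute(
            "SELECT id, timestamp, bids, asks FROM book ORDER BY timestamp"):
        obi = compute_obi(json.loads(bids), json.loads(asks))
        batch.append((snap_id, ts, obi))  # snapshot is discarded after this
        if len(batch) >= batch_size:      # batch commit for throughput
            conn.executemany(
                "INSERT INTO metrics (snapshot_id, timestamp, obi) VALUES (?, ?, ?)",
                batch)
            conn.commit()
            batch.clear()
    if batch:                             # flush the final partial batch
        conn.executemany(
            "INSERT INTO metrics (snapshot_id, timestamp, obi) VALUES (?, ?, ?)",
            batch)
        conn.commit()
```
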

## Migration Strategy

### Backward Compatibility
- Existing databases continue to work without the metrics table
- System auto-creates the metrics table on the first processing run
- Fallback to real-time calculation if metrics are unavailable

### Performance Impact
- **Processing Time**: Slight increase due to database writes (~10%)
- **Query Performance**: Significant improvement for repeated analysis
- **Overall**: Net positive performance for typical usage patterns
## Monitoring and Validation

### Success Metrics
- **Memory Usage**: Target >70% reduction in peak memory usage
- **Processing Speed**: Maintain >500 snapshots/second processing rate
- **Storage Efficiency**: Metrics table <25% of total database size
- **Query Performance**: <1 second retrieval for typical time ranges

### Validation Methods
- Memory profiling during large dataset processing
- Performance benchmarks vs. the original system
- Storage overhead analysis across different dataset sizes
- Query performance testing with various time ranges
## Future Considerations

### Potential Enhancements
- **Compression**: Consider compression for metrics storage if overhead becomes significant
- **Partitioning**: Time-based partitioning for very large datasets
- **Caching**: In-memory cache for frequently accessed metrics
- **Export**: Direct export capabilities for external analysis tools

### Scalability Options
- **Database Upgrade**: PostgreSQL if SQLite becomes the limiting factor
- **Parallel Processing**: Multi-threaded metrics calculation
- **Distributed Storage**: For institutional-scale datasets

---

This decision provides a solid foundation for efficient, scalable metrics processing while maintaining simplicity and performance characteristics suitable for the target use cases.
docs/decisions/ADR-001-sqlite-database-choice.md
@@ -0,0 +1,122 @@
# ADR-001: SQLite Database Choice

## Status
Accepted

## Context
The orderflow backtest system needs to efficiently store and stream large volumes of historical orderbook and trade data. Key requirements include:

- Fast sequential read access for time-series data
- Minimal setup and maintenance overhead
- Support for concurrent reads from the visualization layer
- Ability to handle databases ranging from 100MB to 10GB+
- No network dependencies for data access

## Decision
We will use SQLite as the primary database for storing historical orderbook and trade data.
## Consequences

### Positive
- **Zero configuration**: No database server setup or administration required
- **Excellent read performance**: Optimized for sequential scans with proper PRAGMA settings
- **Built-in Python support**: No external dependencies or connection libraries needed
- **File portability**: Database files can be easily shared and archived
- **ACID compliance**: Ensures data integrity during writes (for data ingestion)
- **Small footprint**: Minimal memory and storage overhead
- **Fast startup**: No connection pooling or server initialization delays

### Negative
- **Single-writer limitation**: Cannot handle concurrent writes (acceptable for a read-only backtest)
- **Limited scalability**: Not suitable for high-concurrency production trading systems
- **No network access**: Cannot query databases remotely (acceptable for local analysis)
- **File locking**: Potential issues with file system sharing (mitigated by read-only access)
## Implementation Details

### Schema Design
```sql
-- Orderbook snapshots with timestamp windows
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT NOT NULL,       -- JSON array of [price, size] pairs
    asks TEXT NOT NULL,       -- JSON array of [price, size] pairs
    timestamp TEXT NOT NULL
);

-- Individual trade records
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL NOT NULL,
    size REAL NOT NULL,
    side TEXT NOT NULL,       -- "buy" or "sell"
    timestamp TEXT NOT NULL
);

-- Indexes for efficient time-based queries
CREATE INDEX idx_book_timestamp ON book(timestamp);
CREATE INDEX idx_trades_timestamp ON trades(timestamp);
```
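A typical time-range read against this schema might look like the following; this is an assumed usage sketch, not code from the repository:

```python
import sqlite3

def read_snapshots(conn: sqlite3.Connection, start_ts: str, end_ts: str):
    """Yield (id, bids, asks, timestamp) rows inside [start_ts, end_ts).

    The range predicate on `timestamp` lets SQLite use idx_book_timestamp,
    so the scan stays an index lookup even on multi-gigabyte files.
    """
    query = (
        "SELECT id, bids, asks, timestamp FROM book "
        "WHERE timestamp >= ? AND timestamp < ? "
        "ORDER BY timestamp"
    )
    yield from conn.execute(query, (start_ts, end_ts))
```
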

### Performance Optimizations
```python
# Read-only connection with optimized PRAGMA settings
connection_uri = f"file:{db_path}?immutable=1&mode=ro"
conn = sqlite3.connect(connection_uri, uri=True)
conn.execute("PRAGMA query_only = 1")
conn.execute("PRAGMA temp_store = MEMORY")
conn.execute("PRAGMA mmap_size = 268435456")  # 256MB
conn.execute("PRAGMA cache_size = 10000")
```
## Alternatives Considered

### PostgreSQL
- **Rejected**: Requires server setup and maintenance
- **Pros**: Better concurrent access, richer query features
- **Cons**: Overkill for a read-only use case, deployment complexity

### Parquet Files
- **Rejected**: Limited query capabilities for time-series data
- **Pros**: Excellent compression, columnar format
- **Cons**: No indexes, complex range queries, requires additional libraries

### MongoDB
- **Rejected**: Document structure not optimal for time-series data
- **Pros**: Flexible schema, good aggregation pipeline
- **Cons**: Requires a server, higher memory usage, learning curve

### CSV Files
- **Rejected**: Poor query performance for large datasets
- **Pros**: Simple format, universal compatibility
- **Cons**: No indexing, slow filtering, type conversion overhead

### InfluxDB
- **Rejected**: Overkill for historical data analysis
- **Pros**: Optimized for time series, good compression
- **Cons**: Additional service dependency, learning curve
## Migration Path
If scalability becomes an issue in the future:

1. **Phase 1**: Implement a database abstraction layer in `db_interpreter`
2. **Phase 2**: Add a PostgreSQL adapter for production workloads
3. **Phase 3**: Implement data partitioning for very large datasets
4. **Phase 4**: Consider distributed storage for multi-terabyte datasets

## Monitoring
Track the following metrics to validate this decision:
- Database file sizes and growth rates
- Query performance for different date ranges
- Memory usage during streaming operations
- Time to process complete backtests

## Review Date
This decision should be reviewed if:
- Database files consistently exceed 50GB
- Query performance degrades below 1000 rows/second
- Concurrent access requirements change
- Network-based data sharing becomes necessary
docs/decisions/ADR-002-json-ipc-communication.md
@@ -0,0 +1,162 @@
# ADR-002: JSON File-Based Inter-Process Communication

## Status
Accepted

## Context
The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include:

- Real-time data updates from processor to visualization
- Tolerance for timing mismatches between writer and reader
- Simple implementation without external dependencies
- Support for different update frequencies (OHLC bars vs. orderbook depth)
- Graceful handling of process crashes or restarts

## Decision
We will use JSON files with atomic write operations for inter-process communication between the data processor and the Dash visualization frontend.
## Consequences

### Positive
- **Simplicity**: No message queues, sockets, or complex protocols
- **Fault tolerance**: File-based communication survives process restarts
- **Debugging friendly**: Data files can be inspected manually
- **No dependencies**: Built-in JSON support, no external libraries
- **Atomic operations**: Temp file + rename prevents partial reads
- **Language agnostic**: Any process can read/write JSON files
- **Bounded memory**: Rolling data windows prevent unlimited growth

### Negative
- **File I/O overhead**: Disk writes may be slower than in-memory communication
- **Polling required**: Reader must poll for updates (500ms interval)
- **Limited throughput**: Not suitable for high-frequency (microsecond) updates
- **No acknowledgments**: Writer cannot confirm the reader has processed data
- **File system dependency**: Performance varies by storage type
## Implementation Details

### File Structure
```
ohlc_data.json     # Rolling array of OHLC bars (max 1000)
depth_data.json    # Current orderbook depth snapshot
metrics_data.json  # Rolling array of OBI/CVD metrics (max 1000)
```
### Atomic Write Pattern
```python
import json
import os
from pathlib import Path
from typing import Any

def atomic_write(file_path: Path, data: Any) -> None:
    """Write data atomically to prevent partial reads."""
    temp_path = file_path.with_suffix('.tmp')
    with open(temp_path, 'w') as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    temp_path.replace(file_path)  # Atomic on POSIX systems
```
### Data Formats
```python
# OHLC format: [timestamp_ms, open, high, low, close, volume]
ohlc_data = [
    [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
    [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]

# Depth format: top-N levels per side
depth_data = {
    "bids": [[49990.0, 1.5], [49985.0, 2.1]],
    "asks": [[50010.0, 1.2], [50015.0, 1.8]]
}

# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close]
metrics_data = [
    [1640995200000, 0.15, 0.22, 0.08, 0.18],
    [1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
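Combining the rolling window with the atomic-write pattern, a publisher for the OHLC file might look like this sketch; `publish_ohlc` and the `MAX_BARS` constant are illustrative assumptions, not the project's actual API:

```python
import json
import os
from pathlib import Path

MAX_BARS = 1000  # rolling-window bound from the file-structure notes above

def publish_ohlc(path: Path, bars: list, new_bar: list) -> list:
    """Append a bar, trim to the rolling window, and write atomically."""
    bars = (bars + [new_bar])[-MAX_BARS:]  # keep only the most recent bars
    tmp = path.with_suffix(".tmp")
    with open(tmp, "w") as f:
        json.dump(bars, f)
        f.flush()
        os.fsync(f.fileno())
    tmp.replace(path)  # atomic rename: readers never see a partial file
    return bars
```
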

### Error Handling
```python
# Reader pattern with graceful fallback
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # Cache successful read
except (FileNotFoundError, json.JSONDecodeError) as e:
    logging.warning(f"Using cached data: {e}")
    new_data = _LAST_DATA  # Use cached data
```
## Performance Characteristics

### Write Performance
- **Small files**: < 1MB typical; writes complete in < 10ms
- **Atomic operations**: Add ~2-5ms overhead for temp file creation
- **Throttling**: Updates limited to prevent excessive I/O

### Read Performance
- **Parse time**: < 5ms for typical JSON file sizes
- **Polling overhead**: 500ms interval balances responsiveness and CPU usage
- **Error recovery**: Cached data eliminates visual glitches

### Memory Usage
- **Bounded datasets**: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file
- **JSON overhead**: ~2x memory during parsing
- **Total footprint**: < 500KB for all IPC data
## Alternatives Considered

### Redis Pub/Sub
- **Rejected**: Additional service dependency, overkill for a simple use case
- **Pros**: True real-time updates, built-in data structures
- **Cons**: External dependency, memory overhead, configuration complexity

### ZeroMQ
- **Rejected**: Additional library dependency, more complex than needed
- **Pros**: High performance, flexible patterns
- **Cons**: Learning curve, binary dependency, networking complexity

### Named Pipes/Unix Sockets
- **Rejected**: Platform-specific, more complex error handling
- **Pros**: Better performance, no file I/O
- **Cons**: Platform limitations, harder debugging, process lifetime coupling

### SQLite as Message Queue
- **Rejected**: Overkill for simple data exchange
- **Pros**: ACID transactions, complex queries possible
- **Cons**: Schema management, locking considerations, overhead

### HTTP API
- **Rejected**: Too much overhead for local communication
- **Pros**: Standard protocol, language agnostic
- **Cons**: Network stack overhead, port management, authentication
## Future Considerations

### Scalability Limits
The current approach is suitable for:
- Update frequencies: 1-10 Hz
- Data volumes: < 10MB total
- Process counts: 1 writer, few readers

### Migration Path
If performance becomes insufficient:
1. **Phase 1**: Add compression (gzip) to reduce I/O
2. **Phase 2**: Implement shared memory for high-frequency data
3. **Phase 3**: Consider a message queue for complex routing
4. **Phase 4**: Migrate to a streaming protocol for real-time requirements

## Monitoring
Track these metrics to validate the approach:
- File write latency and frequency
- JSON parse times in the visualization
- Error rates for partial reads
- Memory usage growth over time

## Review Triggers
Reconsider this decision if:
- Update frequency requirements exceed 10 Hz
- File I/O becomes a performance bottleneck
- Multiple visualization clients need the same data
- Complex message routing becomes necessary
- Platform portability becomes a concern
@@ -1,217 +0,0 @@
# ADR-002: Separation of Visualization from Strategy
## Status
Accepted

## Context
The original system embedded visualization functionality within the `DefaultStrategy` class, creating tight coupling between trading analysis logic and chart rendering. This design had several issues:

1. **Mixed Responsibilities**: Strategy classes handled both trading logic and GUI operations
2. **Testing Complexity**: Strategy tests required mocking GUI components
3. **Deployment Flexibility**: Strategies couldn't run in headless environments
4. **Timing Control**: Visualization timing was tied to strategy execution rather than application flow

The user specifically requested displaying visualizations after processing each database file, requiring better control over visualization timing.

## Decision
We will separate visualization from strategy components with the following architecture:

1. **Remove Visualization from Strategy**: Strategy classes focus solely on trading analysis
2. **Main Application Control**: `main.py` orchestrates visualization timing and updates
3. **Independent Configuration**: Strategy and Visualizer get database paths independently
4. **Clean Interfaces**: No direct dependencies between strategy and visualization components
## Consequences

### Positive
- **Single Responsibility**: Strategy focuses on trading logic, Visualizer on charts
- **Better Testability**: Strategy tests run without GUI dependencies
- **Flexible Deployment**: Strategies can run in headless/server environments
- **Timing Control**: Visualization updates precisely when needed (after each DB)
- **Maintainability**: Changes to visualization don't affect strategy logic
- **Performance**: No GUI overhead during strategy analysis

### Negative
- **Increased Complexity**: Main application handles more orchestration logic
- **Coordination Required**: Must ensure strategy and visualizer get the same database path
- **Breaking Change**: Existing strategy initialization code needs updates
## Alternatives Considered

### Option 1: Keep Visualization in Strategy
**Rejected**: Violates the single-responsibility principle. Makes testing difficult and deployment inflexible.

### Option 2: Observer Pattern
**Rejected**: Adds unnecessary complexity for this use case. Direct control in `main.py` is simpler and more explicit.

### Option 3: Visualization Service
**Rejected**: Over-engineering for current requirements. May be considered for future multi-strategy scenarios.
## Implementation Details

### Before (Coupled Design)
```python
class DefaultStrategy:
    def __init__(self, instrument: str, enable_visualization: bool = True):
        self.visualizer = Visualizer(...) if enable_visualization else None

    def on_booktick(self, book: Book):
        # Trading analysis
        # ...
        # Visualization update
        if self.visualizer:
            self.visualizer.update_from_book(book)
```
### After (Separated Design)
```python
# Strategy focuses on analysis only
class DefaultStrategy:
    def __init__(self, instrument: str):
        ...  # No visualization dependencies

    def on_booktick(self, book: Book):
        ...  # Pure trading analysis, no visualization code

# Main application orchestrates both
def main():
    strategy = DefaultStrategy(instrument)
    visualizer = Visualizer(...)

    for db_path in db_paths:
        strategy.set_db_path(db_path)
        visualizer.set_db_path(db_path)

        # Process data
        storage.build_booktick_from_db(db_path, db_date)

        # Analysis
        strategy.on_booktick(storage.book)

        # Visualization (controlled timing)
        visualizer.update_from_book(storage.book)

    # Final display
    visualizer.show()
```
### Interface Changes

#### Strategy Interface (Simplified)
```python
class DefaultStrategy:
    def __init__(self, instrument: str)            # Removed visualization param
    def set_db_path(self, db_path: Path) -> None   # No visualizer.set_db_path()
    def on_booktick(self, book: Book) -> None      # No visualization calls
```

#### Main Application (Enhanced)
```python
def main():
    # Separate initialization
    strategy = DefaultStrategy(instrument)
    visualizer = Visualizer(window_seconds=60, max_bars=500)

    # Independent configuration
    for db_path in db_paths:
        strategy.set_db_path(db_path)
        visualizer.set_db_path(db_path)

        # Controlled execution
        strategy.on_booktick(storage.book)         # Analysis
        visualizer.update_from_book(storage.book)  # Visualization
```
## Migration Strategy

### Code Changes Required
1. **Strategy Classes**: Remove visualization initialization and calls
2. **Main Application**: Add visualizer creation and orchestration
3. **Tests**: Update strategy tests to remove visualization mocking
4. **Configuration**: Remove visualization parameters from strategy constructors

### Backward Compatibility
- **API Breaking**: Strategy constructor signature changes
- **Functionality Preserved**: All visualization features remain available
- **Test Updates**: Strategy tests become simpler (no GUI mocking needed)

### Migration Steps
1. Update `DefaultStrategy` to remove visualization dependencies
2. Modify `main.py` to create and manage the `Visualizer` instance
3. Update all strategy constructor calls to remove `enable_visualization`
4. Update tests to reflect the new interfaces
5. Verify visualization timing meets requirements
## Benefits Achieved

### Clean Architecture
- **Strategy**: Pure trading analysis logic
- **Visualizer**: Pure chart rendering logic
- **Main**: Application flow and component coordination

### Improved Testing
```python
# Before: Complex mocking required
def test_strategy():
    with patch('visualizer.Visualizer') as mock_viz:
        strategy = DefaultStrategy("BTC", enable_visualization=True)
        # Complex mock setup...

# After: Simple, direct testing
def test_strategy():
    strategy = DefaultStrategy("BTC")
    # Direct testing of analysis logic
```

### Flexible Deployment
```python
# Headless server deployment
strategy = DefaultStrategy("BTC")
# No GUI dependencies, can run anywhere

# Development with visualization
strategy = DefaultStrategy("BTC")
visualizer = Visualizer(...)
# Full GUI functionality when needed
```

### Precise Timing Control
```python
# Visualization updates exactly when requested
for db_file in database_files:
    process_database(db_file)          # Data processing
    strategy.analyze(book)             # Trading analysis
    visualizer.update_from_book(book)  # Chart update after each DB
```
## Monitoring and Validation

### Success Criteria
- **Test Simplification**: Strategy tests run without GUI mocking
- **Timing Accuracy**: Visualization updates after each database as requested
- **Performance**: No GUI overhead during pure analysis operations
- **Maintainability**: Visualization changes don't affect strategy code

### Validation Methods
- Run strategy tests in a headless environment
- Verify visualization timing matches requirements
- Performance comparison of analysis-only vs. GUI operations
- Code complexity metrics for strategy vs. visualization modules

## Future Considerations

### Potential Enhancements
- **Multiple Visualizers**: Support different chart types or windows
- **Visualization Plugins**: Pluggable chart renderers for different outputs
- **Remote Visualization**: Web-based charts for server deployments
- **Batch Visualization**: Process multiple databases before chart updates

### Extensibility
- **Strategy Plugins**: Easy to add strategies without visualization concerns
- **Visualization Backends**: Swap chart libraries without affecting strategies
- **Analysis Pipeline**: Clear separation enables complex analysis workflows

---

This separation provides a clean, maintainable architecture that supports the requested visualization timing while improving code quality and testability.
docs/decisions/ADR-003-dash-visualization-framework.md
@@ -0,0 +1,204 @@
# ADR-003: Dash Web Framework for Visualization

## Status
Accepted

## Context
The orderflow backtest system requires a user interface for visualizing OHLC candlestick charts, volume data, orderbook depth, and derived metrics. Key requirements include:

- Real-time chart updates with minimal latency
- Professional financial data visualization capabilities
- Support for multiple chart types (candlesticks, bars, line charts)
- Interactive features (zooming, panning, hover details)
- Dark theme suitable for trading applications
- Python-native solution to avoid JavaScript development

## Decision
We will use Dash (by Plotly) as the web framework for building the visualization frontend, with Plotly.js for chart rendering.
## Consequences

### Positive
- **Python-native**: No JavaScript development required
- **Plotly integration**: Best-in-class financial charting capabilities
- **Reactive architecture**: Automatic UI updates via the callback system
- **Professional appearance**: High-quality charts suitable for trading applications
- **Interactive features**: Built-in zooming, panning, hover tooltips
- **Responsive design**: Bootstrap integration for modern layouts
- **Development speed**: Rapid prototyping and iteration
- **WebGL acceleration**: Smooth performance for large datasets

### Negative
- **Performance overhead**: Heavier than custom JavaScript solutions
- **Limited customization**: Constrained by the Dash component ecosystem
- **Single-page limitation**: Not suitable for complex multi-page applications
- **Memory usage**: Can be heavy for resource-constrained environments
- **Learning curve**: Callback patterns require understanding of reactive programming
## Implementation Details

### Application Structure
```python
import dash
import dash_bootstrap_components as dbc
from dash import dcc

# Main application with Bootstrap theme
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY])

# Responsive layout with a 9:3 ratio for charts:depth
app.layout = dbc.Container([
    dbc.Row([
        dbc.Col([  # OHLC + Volume + Metrics
            dcc.Graph(id='ohlc-chart', style={'height': '100vh'})
        ], width=9),
        dbc.Col([  # Orderbook Depth
            dcc.Graph(id='depth-chart', style={'height': '100vh'})
        ], width=3)
    ]),
    dcc.Interval(id='interval-update', interval=500, n_intervals=0)
])
```
### Chart Architecture
```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Multi-subplot chart with a shared x-axis
fig = make_subplots(
    rows=3, cols=1,
    row_heights=[0.6, 0.2, 0.2],  # OHLC, Volume, Metrics
    vertical_spacing=0.02,
    shared_xaxes=True,
    subplot_titles=['Price', 'Volume', 'OBI Metrics']
)

# Candlestick chart with dark theme colors
fig.add_trace(go.Candlestick(
    x=timestamps, open=opens, high=highs, low=lows, close=closes,
    increasing_line_color='#00ff00', decreasing_line_color='#ff0000'
), row=1, col=1)
```
### Real-time Updates
```python
@app.callback(
    [Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')],
    [Input('interval-update', 'n_intervals')]
)
def update_charts(n_intervals):
    # Read data from JSON files with error handling
    # Build and return updated figures
    return ohlc_fig, depth_fig
```
## Performance Characteristics

### Update Latency
- **Polling interval**: 500ms for near real-time updates
- **Chart render time**: 50-200ms depending on data size
- **Memory usage**: ~100MB for typical chart configurations
- **Browser requirements**: Modern browser with WebGL support

### Scalability Limits
- **Data points**: Up to 10,000 candlesticks without performance issues
- **Update frequency**: Optimal at 1-2 Hz, maximum ~10 Hz
- **Concurrent users**: Single-user design (development server)
- **Memory growth**: Linear with data history size
## Alternatives Considered

### Streamlit
- **Rejected**: Less interactive, slower updates, limited charting
- **Pros**: Simpler programming model, good for prototypes
- **Cons**: Poor real-time performance, limited financial chart types

### Flask + Custom JavaScript
- **Rejected**: Requires JavaScript development, more complex
- **Pros**: Complete control, potentially better performance
- **Cons**: Significant development overhead, maintenance burden

### Jupyter Notebooks
- **Rejected**: Not suitable for production deployment
- **Pros**: Great for exploration and analysis
- **Cons**: No real-time updates, not web-deployable

### Bokeh
- **Rejected**: Less mature ecosystem, fewer financial chart types
- **Pros**: Good performance, Python-native
- **Cons**: Smaller community, limited examples for financial data

### Custom React Application
- **Rejected**: Requires a separate frontend team, complex deployment
- **Pros**: Maximum flexibility, best performance potential
- **Cons**: High development cost, maintenance overhead

### Desktop GUI (Tkinter/PyQt)
- **Rejected**: Not web-accessible, limited styling options
- **Pros**: No browser dependency, good performance
- **Cons**: Deployment complexity, poor mobile support

## Configuration Options

### Theme and Styling

```python
# Dark theme configuration
dark_theme = {
    'plot_bgcolor': '#000000',
    'paper_bgcolor': '#000000',
    'font_color': '#ffffff',
    'grid_color': '#333333'
}
```
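
The theme dict can be translated into Plotly layout keyword arguments in one place, so every figure builder applies it consistently. A sketch of that mapping (the helper name is an assumption; the key names follow the dict above):

```python
# Dark theme configuration (same dict as above)
dark_theme = {
    'plot_bgcolor': '#000000',
    'paper_bgcolor': '#000000',
    'font_color': '#ffffff',
    'grid_color': '#333333',
}

def layout_kwargs(theme: dict) -> dict:
    """Map the theme dict onto Plotly `update_layout` keyword arguments."""
    return {
        "plot_bgcolor": theme["plot_bgcolor"],
        "paper_bgcolor": theme["paper_bgcolor"],
        "font": {"color": theme["font_color"]},
        "xaxis": {"gridcolor": theme["grid_color"]},
        "yaxis": {"gridcolor": theme["grid_color"]},
    }

print(layout_kwargs(dark_theme)["font"])  # {'color': '#ffffff'}
```

A figure builder would then call `fig.update_layout(**layout_kwargs(dark_theme))` once instead of repeating color literals per chart.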

### Chart Types
- **Candlestick charts**: OHLC price data with volume
- **Bar charts**: Volume and metrics visualization
- **Line charts**: Cumulative depth and trend analysis
- **Scatter plots**: Trade-by-trade analysis (future)

### Interactive Features
- **Zoom and pan**: Time-based navigation
- **Hover tooltips**: Detailed data on mouse over
- **Crosshairs**: Precise value reading
- **Range selector**: Quick time period selection

## Future Enhancements

### Short-term (1-3 months)
- Add a range selector for time navigation
- Implement chart annotations for significant events
- Add export functionality for charts and data

### Medium-term (3-6 months)
- Multi-instrument support with tabs
- Advanced indicators and overlays
- User preference persistence

### Long-term (6+ months)
- Real-time alerts and notifications
- Strategy backtesting visualization
- Portfolio-level analytics

## Monitoring and Metrics

### Performance Monitoring
- Chart render times and update frequencies
- Memory usage growth over time
- Browser compatibility and error rates
- User interaction patterns

### Quality Metrics
- Chart accuracy compared to source data
- Visual responsiveness during heavy updates
- Error recovery from data corruption

## Review Triggers

Reconsider this decision if:
- Update frequency requirements consistently exceed 10 Hz
- Memory usage becomes prohibitive (> 1 GB)
- Custom visualization requirements cannot be met
- Multi-user deployment becomes necessary
- Mobile responsiveness becomes a priority
- Integration with external charting libraries is needed

## Migration Path

If replacement becomes necessary:
1. **Phase 1**: Abstract chart-building logic from Dash specifics
2. **Phase 2**: Implement an alternative frontend while maintaining data formats
3. **Phase 3**: A/B test performance and usability
4. **Phase 4**: Complete migration with feature parity

165 docs/modules/app.md Normal file
@@ -0,0 +1,165 @@

# Module: app

## Purpose

The `app` module provides a real-time Dash web application for visualizing OHLC candlestick charts, volume data, Order Book Imbalance (OBI) metrics, and orderbook depth. It implements a polling-based architecture that reads JSON data files and renders interactive charts with a dark theme.

## Public Interface

### Functions
- `build_empty_ohlc_fig() -> go.Figure`: Create an empty OHLC chart with proper styling
- `build_empty_depth_fig() -> go.Figure`: Create an empty depth chart with proper styling
- `build_ohlc_fig(data: List[list], metrics: List[list]) -> go.Figure`: Build the complete OHLC + volume + OBI chart
- `build_depth_fig(depth_data: dict) -> go.Figure`: Build the orderbook depth visualization

### Global Variables
- `_LAST_DATA`: Cached OHLC data for error recovery
- `_LAST_DEPTH`: Cached depth data for error recovery
- `_LAST_METRICS`: Cached metrics data for error recovery

### Dash Application
- `app`: Main Dash application instance with Bootstrap theme
- Layout with responsive grid (9:3 ratio for OHLC:depth charts)
- 500ms polling interval for real-time updates

## Usage Examples

### Running the Application

```bash
# Start the Dash server
uv run python app.py

# Access the web interface at http://localhost:8050
```

### Programmatic Usage

```python
from app import build_ohlc_fig, build_depth_fig

# Build charts with sample data
ohlc_data = [[1640995200000, 50000, 50100, 49900, 50050, 125.5]]
metrics_data = [[1640995200000, 0.15, 0.22, 0.08, 0.18]]
depth_data = {
    "bids": [[49990, 1.5], [49985, 2.1]],
    "asks": [[50010, 1.2], [50015, 1.8]]
}

ohlc_fig = build_ohlc_fig(ohlc_data, metrics_data)
depth_fig = build_depth_fig(depth_data)
```

## Dependencies

### Internal
- `viz_io`: Data file paths and JSON reading
  - `viz_io.DATA_FILE`: OHLC data source
  - `viz_io.DEPTH_FILE`: Depth data source
  - `viz_io.METRICS_FILE`: Metrics data source

### External
- `dash`: Web application framework
- `dash.html`, `dash.dcc`: HTML and core components
- `dash_bootstrap_components`: Bootstrap styling
- `plotly.graph_objs`: Chart objects
- `plotly.subplots`: Multiple-subplot support
- `pandas`: Data manipulation (minimal usage)
- `json`: JSON file parsing
- `logging`: Error and debug logging
- `pathlib`: File path handling

## Chart Architecture

### OHLC Chart (Left Panel, 9/12 width)
- **Main subplot**: Candlestick chart with OHLC data
- **Volume subplot**: Bar chart sharing the x-axis with the main chart
- **OBI subplot**: Order Book Imbalance candlestick chart in blue tones
- **Shared x-axis**: Synchronized zooming and panning across subplots

### Depth Chart (Right Panel, 3/12 width)
- **Cumulative depth**: Stepped line chart showing bid/ask liquidity
- **Color coding**: Green for bids, red for asks
- **Real-time updates**: Reflects the current orderbook state
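
The stepped depth lines are built from running sums outward from the spread: bids accumulate downward in price, asks upward. A pure-Python sketch of that aggregation (the `[[price, size], ...]` layout follows the depth data examples earlier; the helper name is an assumption):

```python
def cumulative_depth(levels, reverse=False):
    """Sort levels by price and accumulate size outward from the mid.

    Bids are sorted descending (reverse=True), asks ascending, so the
    running total grows away from the spread on both sides.
    """
    ordered = sorted(levels, key=lambda lv: lv[0], reverse=reverse)
    total = 0.0
    out = []
    for price, size in ordered:
        total += size
        out.append((price, total))
    return out

bids = cumulative_depth([[49990, 1.5], [49985, 2.1]], reverse=True)
asks = cumulative_depth([[50010, 1.2], [50015, 1.8]])
print(bids)  # [(49990, 1.5), (49985, 3.6)]
print(asks)  # [(50010, 1.2), (50015, 3.0)]
```

The resulting `(price, cumulative_size)` pairs feed directly into a `go.Scatter(line_shape="hv")` trace per side.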

## Styling and Theme

### Dark Theme Configuration
- Background: Black (`#000000`)
- Text: White (`#ffffff`)
- Grid: Dark gray with transparency
- Candlesticks: Green (up) / Red (down)
- Volume: Gray bars
- OBI: Blue tones for candlesticks
- Depth: Green (bids) / Red (asks)

### Responsive Design
- Bootstrap grid system for layout
- Fluid container for full-width usage
- 100vh height for full viewport coverage
- Configurable chart display modes

## Data Polling and Error Handling

### Polling Strategy
- **Interval**: 500ms for near real-time updates
- **Graceful degradation**: Uses cached data on JSON read errors
- **Atomic reads**: Tolerates partial writes during file updates
- **Logging**: Warnings for data inconsistencies

### Error Recovery

```python
# Pseudocode for the error-handling pattern
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # Cache the successful read
except (FileNotFoundError, json.JSONDecodeError):
    logging.warning("Using cached data due to read error")
    new_data = _LAST_DATA  # Fall back to cached data
```
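
The "atomic reads" tolerance works best when the writer side replaces files atomically, so a reader only ever sees a complete old or new file. The writer lives outside this module, but the standard pattern it would use (with `tempfile` plus `os.replace`, both listed under the project's stdlib usage) looks like this sketch; the function name is an assumption:

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, payload) -> None:
    """Write JSON to a temp file in the same directory, then rename.

    os.replace is atomic on POSIX and Windows, so a concurrent reader
    sees either the old file or the new one, never a partial write.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise

# Demo against a throwaway directory.
target = Path(tempfile.mkdtemp()) / "ohlc.json"
atomic_write_json(target, [[1640995200000, 50000, 50100, 49900, 50050, 125.5]])
print(json.loads(target.read_text()))
```

The temp file must live in the same directory as the target: `os.replace` across filesystems is not atomic.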

## Performance Characteristics

- **Client-side rendering**: Plotly.js handles chart rendering
- **Efficient updates**: Only redraws when data changes
- **Memory bounded**: Limited by the maximum number of bars in the data files (1000)
- **Network efficient**: Local file polling (no external API calls)

## Testing

Run application tests:

```bash
uv run pytest test_app.py -v
```

Test coverage includes:
- Chart-building functions
- Data loading and caching
- Error-handling scenarios
- Layout rendering
- Callback functionality

## Configuration Options

### Server Configuration
- **Host**: `0.0.0.0` (accessible from the network)
- **Port**: `8050` (default Dash port)
- **Debug mode**: Disabled in production

### Chart Configuration
- **Update interval**: 500ms (configurable via `dcc.Interval`)
- **Display mode bar**: Enabled for user interaction
- **Logo display**: Disabled for a clean interface

## Known Issues

- High CPU usage during rapid data updates
- Memory usage grows with chart history
- No authentication or access control
- Limited mobile responsiveness for complex charts

## Development Notes

- Uses the Flask development server (not suitable for production)
- Callback exceptions suppressed for partial-data scenarios
- Bootstrap CSS loaded from a CDN
- Chart configurations optimized for financial data visualization

83 docs/modules/db_interpreter.md Normal file
@@ -0,0 +1,83 @@

# Module: db_interpreter

## Purpose

The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data-structure normalization for downstream processing.

## Public Interface

### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp

### Functions
- `DBInterpreter(db_path: Path)`: Constructor that initializes a read-only SQLite connection with optimized PRAGMA settings

### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface that yields orderbook updates with their associated trades in temporal windows

## Usage Examples

```python
from pathlib import Path
from db_interpreter import DBInterpreter

# Initialize the interpreter
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)

# Stream orderbook and trade data
for ob_update, trades in interpreter.stream():
    # Process the orderbook update
    print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
    print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")

    # Process trades in this window
    for trade in trades:
        trade_id, price, size, side, timestamp_ms = trade[1:6]
        print(f"Trade: {side} {size} @ {price}")
```

## Dependencies

### Internal
- None (standalone module)

### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging

## Performance Characteristics

- **Batch sizes**: `BOOK_BATCH=2048`, `TRADE_BATCH=4096` for optimal memory usage
- **SQLite optimizations**: Read-only, immutable mode, large mmap and cache sizes
- **Memory efficient**: Streaming iterator pattern avoids loading the entire dataset
- **Temporal windowing**: One-row lookahead for precise time-boundary calculation
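
A read-only, immutable SQLite connection with this kind of tuning can be opened as in the following sketch; the specific mmap and cache values here are illustrative, not the module's actual settings:

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: Path) -> sqlite3.Connection:
    # immutable=1 promises SQLite the file cannot change, which
    # disables locking and allows aggressive caching.
    uri = f"file:{db_path}?mode=ro&immutable=1"
    conn = sqlite3.connect(uri, uri=True)
    conn.execute("PRAGMA mmap_size = 1073741824")  # 1 GiB memory map
    conn.execute("PRAGMA cache_size = -65536")     # 64 MiB page cache
    return conn

# Demo against a throwaway database.
import tempfile
tmp = Path(tempfile.mkdtemp()) / "demo.db"
rw = sqlite3.connect(tmp)
rw.execute("CREATE TABLE trades (id INTEGER, price REAL)")
rw.execute("INSERT INTO trades VALUES (1, 50000.0)")
rw.commit()
rw.close()

ro = open_readonly(tmp)
print(ro.execute("SELECT price FROM trades").fetchone())  # (50000.0,)
```

Because `immutable=1` asserts that nothing else writes the file, it is only safe for finished historical databases, which matches this module's use case.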

## Testing

Run module tests:

```bash
uv run pytest test_db_interpreter.py -v
```

Test coverage includes:
- Batch-reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data

## Known Issues

- Requires a specific database schema (book and trades tables)
- Python-literal string parsing assumes well-formed input
- Large databases may require memory monitoring during streaming

## Configuration

- `BOOK_BATCH`: Number of orderbook rows fetched per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows fetched per query (default: 4096)
- SQLite PRAGMA settings optimized for read-only sequential access

162 docs/modules/dependencies.md Normal file
@@ -0,0 +1,162 @@

# External Dependencies

## Overview

This document describes all external dependencies used in the orderflow backtest system: their purposes, versions, and the justification for including each one.

## Production Dependencies

### Core Framework Dependencies

#### Dash (^2.18.2)
- **Purpose**: Web application framework for interactive visualizations
- **Usage**: Real-time chart rendering and user interface
- **Justification**: Mature Python-based framework with excellent Plotly integration
- **Key Features**: Reactive components, built-in server, callback system

#### Dash Bootstrap Components (^1.6.0)
- **Purpose**: Bootstrap CSS framework integration for Dash
- **Usage**: Responsive layout grid and modern UI styling
- **Justification**: Provides a professional appearance with minimal custom CSS

#### Plotly (^5.24.1)
- **Purpose**: Interactive charting and visualization library
- **Usage**: OHLC candlesticks, volume bars, depth charts, OBI metrics
- **Justification**: Industry standard for financial data visualization
- **Key Features**: WebGL acceleration, zooming/panning, dark themes

### Data Processing Dependencies

#### Pandas (^2.2.3)
- **Purpose**: Data manipulation and analysis library
- **Usage**: Minimal usage for data-structure conversions in visualization
- **Justification**: Standard tool for financial data handling
- **Note**: Usage kept minimal to maintain performance

#### Typer (^0.13.1)
- **Purpose**: Modern CLI framework
- **Usage**: Command-line argument parsing and help generation
- **Justification**: Type-safe, auto-generated help, better UX than argparse
- **Key Features**: Type-hints integration, automatic validation

### Data Storage Dependencies

#### SQLite3 (Built-in)
- **Purpose**: Database connectivity for historical data
- **Usage**: Read-only access to orderbook and trade data
- **Justification**: Built into Python, no external dependencies, excellent performance
- **Configuration**: Optimized with immutable mode and mmap

## Development and Testing Dependencies

#### Pytest (^8.3.4)
- **Purpose**: Testing framework
- **Usage**: Unit tests, integration tests, test discovery
- **Justification**: Standard Python testing tool with an excellent plugin ecosystem

#### Coverage (^7.6.9)
- **Purpose**: Code coverage measurement
- **Usage**: Test coverage reporting and quality metrics
- **Justification**: Essential for maintaining code quality

## Build and Package Management

#### UV (Package Manager)
- **Purpose**: Fast Python package manager and task runner
- **Usage**: Dependency management, virtual environments, script execution
- **Justification**: Significantly faster than pip/poetry, better lock-file format
- **Commands**: `uv sync`, `uv run`, `uv add`

## Python Standard Library Usage

### Core Libraries
- **sqlite3**: Database connectivity
- **json**: JSON serialization for IPC
- **pathlib**: Modern file-path handling
- **subprocess**: Process management for visualization
- **logging**: Structured logging throughout the application
- **datetime**: Date/time parsing and manipulation
- **dataclasses**: Structured data types
- **typing**: Type annotations and hints
- **tempfile**: Atomic file operations
- **ast**: Safe evaluation of Python literals

### Performance Libraries
- **itertools**: Efficient iteration patterns
- **functools**: Function decoration and caching
- **collections**: Specialized data structures

## Dependency Justifications

### Why Dash Over Alternatives?
- **vs. Streamlit**: Better real-time updates, more control over layout
- **vs. Flask + custom JS**: Integrated Plotly support, faster development
- **vs. Jupyter**: Better for production deployment, process isolation

### Why SQLite Over Alternatives?
- **vs. PostgreSQL**: No server setup required, excellent read performance
- **vs. Parquet**: Better for time-series queries, built-in indexing
- **vs. CSV**: Proper data types, much faster queries, atomic transactions

### Why UV Over Poetry/Pip?
- **vs. Poetry**: Significantly faster dependency resolution and installation
- **vs. Pip**: Better dependency locking, integrated task runner
- **vs. Pipenv**: More active development, better performance

## Version Pinning Strategy

### Patch Version Pinning
- Core dependencies (Dash, Plotly) pinned to patch versions
- Prevents breaking changes while allowing security updates

### Range Pinning
- Development tools use caret (`^`) ranges for flexibility
- Testing tools can update more freely

### Lock File Management
- `uv.lock` ensures reproducible builds across environments
- Regular updates scheduled monthly for security patches

## Security Considerations

### Dependency Scanning
- Regular audits of dependencies for known vulnerabilities
- Automated updates for security patches
- Minimal dependency tree to reduce attack surface

### Data Isolation
- Read-only database access prevents data modification
- No external network connections required for core functionality
- All file operations contained within the project directory

## Performance Impact

### Bundle Size
- Core runtime: ~50MB with all dependencies
- Dash frontend: additional ~10MB of JavaScript assets
- SQLite: zero overhead (built-in)

### Startup Time
- Cold start: ~2-3 seconds for the full application
- UV virtual environment activation: ~100ms
- Database connection: ~50ms per file

### Memory Usage
- Base application: ~100MB
- Per 1000 OHLC bars: ~5MB additional
- Plotly charts: ~20MB for complex visualizations

## Maintenance Schedule

### Monthly
- Security update review and application
- Dependency version-bump evaluation

### Quarterly
- Major version update consideration
- Performance impact assessment
- Alternative technology evaluation

### Annually
- Complete dependency audit
- Technology stack review
- Migration planning for deprecated packages

101 docs/modules/level_parser.md Normal file
@@ -0,0 +1,101 @@

# Module: level_parser

## Purpose

The `level_parser` module provides utilities for parsing and normalizing orderbook level data from various string formats. It handles JSON and Python literal representations, converting them into standardized numeric tuples for processing.

## Public Interface

### Functions
- `normalize_levels(levels: Any) -> List[List[float]]`: Parse levels into `[[price, size], ...]` format, filtering out zero/negative sizes
- `parse_levels_including_zeros(levels: Any) -> List[Tuple[float, float]]`: Parse levels preserving zero sizes for deletion operations

### Private Functions
- `_parse_string_to_list(levels: Any) -> List[Any]`: Core parsing logic, trying JSON first, then `literal_eval`
- `_extract_price_size(item: Any) -> Tuple[Any, Any]`: Extract price/size from dict or list/tuple formats

## Usage Examples

```python
from level_parser import normalize_levels, parse_levels_including_zeros

# Parse standard levels (filters zeros)
levels = normalize_levels('[[50000.0, 1.5], [49999.0, 2.0]]')
# Returns: [[50000.0, 1.5], [49999.0, 2.0]]

# Parse with zero sizes preserved (for deletions)
updates = parse_levels_including_zeros('[[50000.0, 0.0], [49999.0, 1.5]]')
# Returns: [(50000.0, 0.0), (49999.0, 1.5)]

# Supports dict format
dict_levels = normalize_levels('[{"price": 50000.0, "size": 1.5}]')
# Returns: [[50000.0, 1.5]]

# Short-key format
short_levels = normalize_levels('[{"p": 50000.0, "s": 1.5}]')
# Returns: [[50000.0, 1.5]]
```

## Dependencies

### External
- `json`: Primary parsing method for level data
- `ast.literal_eval`: Fallback parsing for Python literal formats
- `logging`: Debug logging for parsing issues
- `typing`: Type annotations

## Input Formats Supported

### JSON Array Format

```json
[[50000.0, 1.5], [49999.0, 2.0]]
```

### Dict Format (Full Keys)

```json
[{"price": 50000.0, "size": 1.5}, {"price": 49999.0, "size": 2.0}]
```

### Dict Format (Short Keys)

```json
[{"p": 50000.0, "s": 1.5}, {"p": 49999.0, "s": 2.0}]
```

### Python Literal Format

```python
"[(50000.0, 1.5), (49999.0, 2.0)]"
```
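
The JSON-first / `literal_eval`-fallback behavior that covers all four formats can be sketched as follows. This is a simplified stand-in for `_parse_string_to_list`, not the module's exact code:

```python
import ast
import json
from typing import Any, List

def parse_levels(levels: Any) -> List[Any]:
    """Parse a levels payload: JSON first, Python literal as a fallback."""
    if isinstance(levels, list):
        return levels  # already parsed
    try:
        return json.loads(levels)
    except (TypeError, json.JSONDecodeError):
        pass  # not a JSON string; try a Python literal
    try:
        parsed = ast.literal_eval(levels)
        return list(parsed) if isinstance(parsed, (list, tuple)) else []
    except (ValueError, SyntaxError):
        return []  # graceful degradation on malformed input

print(parse_levels('[[50000.0, 1.5]]'))   # [[50000.0, 1.5]]
print(parse_levels("[(50000.0, 1.5)]"))   # [(50000.0, 1.5)]
print(parse_levels("not levels"))         # []
```

`ast.literal_eval` only evaluates literal expressions, so the fallback cannot execute arbitrary code from the database strings.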

## Error Handling

- **Graceful degradation**: Returns an empty list on parse failures
- **Data validation**: Filters out invalid price/size pairs
- **Type safety**: Converts all values to float before processing
- **Debug logging**: Logs warnings for malformed input without crashing

## Performance Characteristics

- **Fast path**: JSON parsing prioritized for performance
- **Fallback support**: `ast.literal_eval` as a backup for edge cases
- **Memory efficient**: Processes items iteratively rather than loading the entire dataset
- **Validation**: Minimal overhead with early filtering of invalid data

## Testing

```bash
uv run pytest test_level_parser.py -v
```

Test coverage includes:
- JSON format parsing accuracy
- Dict format (both key styles) parsing
- Python literal fallback parsing
- Zero-size preservation vs. filtering
- Error handling for malformed input
- Type-conversion edge cases

## Known Limitations

- Assumes well-formed numeric data (price/size as numbers)
- Does not validate economic constraints (e.g., positive prices)
- Limited to list/dict input formats
- No support for streaming/incremental parsing

168 docs/modules/main.md Normal file
@@ -0,0 +1,168 @@

# Module: main

## Purpose

The `main` module provides the command-line interface (CLI) orchestration for the orderflow backtest system. It handles database discovery and process management, and coordinates the streaming pipeline with the visualization frontend, using Typer for argument parsing.

## Public Interface

### Functions
- `main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None`: Primary CLI entrypoint
- `discover_databases(instrument: str, start_date: str, end_date: str) -> list[Path]`: Find matching database files
- `launch_visualizer() -> subprocess.Popen | None`: Start the Dash application in a separate process

### CLI Arguments
- `instrument`: Trading-pair identifier (e.g., "BTC-USDT")
- `start_date`: Start date in YYYY-MM-DD format (UTC)
- `end_date`: End date in YYYY-MM-DD format (UTC)
- `--window-seconds`: OHLC aggregation window size (default: 60)

## Usage Examples

### Command Line Usage

```bash
# Basic usage with default 60-second windows
uv run python main.py BTC-USDT 2025-01-01 2025-01-31

# Custom window size
uv run python main.py ETH-USDT 2025-02-01 2025-02-28 --window-seconds 30

# Single-day processing
uv run python main.py SOL-USDT 2025-03-15 2025-03-15
```

### Programmatic Usage

```python
from main import main, discover_databases

# Run the processing pipeline
main("BTC-USDT", "2025-01-01", "2025-01-31", window_seconds=120)

# Discover available databases
db_files = discover_databases("ETH-USDT", "2025-02-01", "2025-02-28")
print(f"Found {len(db_files)} database files")
```

## Dependencies

### Internal
- `db_interpreter.DBInterpreter`: Database streaming
- `ohlc_processor.OHLCProcessor`: Trade aggregation and orderbook processing
- `viz_io`: Data-clearing functions

### External
- `typer`: CLI framework and argument parsing
- `subprocess`: Process management for visualization
- `pathlib`: File and directory operations
- `datetime`: Date parsing and validation
- `logging`: Operational logging
- `sys`: Exit-code management

## Database Discovery Logic

### File Pattern Matching

```python
# Expected directory structure
../data/OKX/{instrument}/{date}/

# Example paths
../data/OKX/BTC-USDT/2025-01-01/trades.db
../data/OKX/ETH-USDT/2025-02-15/trades.db
```

### Discovery Algorithm
1. Parse the start and end dates to datetime objects
2. Iterate through the date range (inclusive)
3. Construct the expected path for each date
4. Verify file existence and readability
5. Return a sorted list of valid database paths
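
The steps above can be sketched as follows. The directory layout and the `trades.db` filename follow the example paths earlier; the root-path parameter is an addition for testability, so treat the exact signature as an assumption:

```python
from datetime import date, timedelta
from pathlib import Path

def discover_databases(root: Path, instrument: str,
                       start: date, end: date) -> list[Path]:
    """Return existing database paths for each date in [start, end]."""
    found = []
    day = start
    while day <= end:  # inclusive range, step 2 of the algorithm
        candidate = root / instrument / day.isoformat() / "trades.db"
        if candidate.is_file():  # step 4: skip missing days
            found.append(candidate)
        day += timedelta(days=1)
    return sorted(found)

# Demo against a throwaway directory tree with one missing day.
import tempfile
root = Path(tempfile.mkdtemp())
for d in ("2025-01-01", "2025-01-03"):
    day_dir = root / "BTC-USDT" / d
    day_dir.mkdir(parents=True)
    (day_dir / "trades.db").touch()

dbs = discover_databases(root, "BTC-USDT", date(2025, 1, 1), date(2025, 1, 3))
print([p.parent.name for p in dbs])  # ['2025-01-01', '2025-01-03']
```

Missing days are silently skipped here; the real module additionally logs a warning per missing database, per its error-handling rules below.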

## Process Orchestration

### Visualization Process Management

```python
# Launch the Dash app in a separate process
viz_process = subprocess.Popen(
    ["uv", "run", "python", "app.py"],
    cwd=project_root,
)

# Process management
try:
    # Main processing loop
    process_databases(db_files)
finally:
    # Clean up the visualization process
    if viz_process:
        viz_process.terminate()
        viz_process.wait(timeout=5)
```

### Data Processing Pipeline
1. **Initialize**: Clear existing data files
2. **Launch**: Start the visualization process
3. **Stream**: Process each database sequentially
4. **Aggregate**: Generate OHLC bars and depth snapshots
5. **Cleanup**: Terminate visualization and finalize

## Error Handling

### Database Access Errors
- **File not found**: Log a warning and skip missing databases
- **Permission denied**: Log an error and exit with status code 1
- **Corruption**: Log an error for the affected database and continue with the next

### Process Management Errors
- **Visualization startup failure**: Log the error but continue processing
- **Process termination**: Graceful shutdown with a timeout
- **Resource cleanup**: Ensure child processes are terminated

### Date Validation
- **Invalid format**: Clear error message with the expected format
- **Invalid range**: End date must be >= start date
- **Future dates**: Warning for dates beyond data availability
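
The format and ordering checks above can be sketched as a small validation helper; the function name and error messages here are illustrative, not the module's:

```python
from datetime import datetime

def parse_date_range(start_date: str, end_date: str):
    """Validate YYYY-MM-DD inputs and the ordering constraint."""
    try:
        start = datetime.strptime(start_date, "%Y-%m-%d").date()
        end = datetime.strptime(end_date, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError("dates must use YYYY-MM-DD format")
    if end < start:
        raise ValueError("end date must be >= start date")
    return start, end

start, end = parse_date_range("2025-01-01", "2025-01-31")
print(start, end)  # 2025-01-01 2025-01-31
```

Failing fast here keeps bad inputs out of the discovery loop, so every downstream path can assume well-ordered `date` objects.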

## Performance Characteristics

- **Sequential processing**: Databases are processed one at a time
- **Memory efficient**: The streaming approach avoids loading entire datasets
- **Process isolation**: Visualization runs independently
- **Resource cleanup**: Automatic process termination on exit

## Testing

Run module tests:

```bash
uv run pytest test_main.py -v
```

Test coverage includes:
- Database discovery logic
- Date parsing and validation
- Process management
- Error-handling scenarios
- CLI argument validation

## Configuration

### Default Settings
- **Data directory**: `../data/OKX` (relative to the project root)
- **Visualization command**: `uv run python app.py`
- **Window size**: 60 seconds
- **Process timeout**: 5 seconds for termination

### Environment Variables
- **DATA_PATH**: Override the default data directory
- **VISUALIZATION_PORT**: Override the Dash port (requires app.py modification)

## Known Issues

- Assumes a specific directory structure under `../data/OKX`
- No validation of database-schema compatibility
- Limited error recovery for process management
- No progress indication for large datasets

## Development Notes

- Uses Typer for a modern CLI interface
- Subprocess management compatible with Unix/Windows
- Logging configured for both development and production use
- Exit codes follow Unix conventions (0 = success, 1 = error)
@@ -1,302 +0,0 @@

# Module: Metrics Calculation System

## Purpose

The metrics calculation system provides high-performance computation of Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) indicators for cryptocurrency trading analysis. It processes orderbook snapshots and trade data to generate financial metrics with per-snapshot granularity.

## Public Interface

### Classes

#### `Metric` (dataclass)
Represents calculated metrics for a single orderbook snapshot.

```python
@dataclass(slots=True)
class Metric:
    snapshot_id: int        # Reference to the source snapshot
    timestamp: int          # Unix timestamp
    obi: float              # Order Book Imbalance [-1, 1]
    cvd: float              # Cumulative Volume Delta
    best_bid: float | None  # Best bid price
    best_ask: float | None  # Best ask price
```

#### `MetricCalculator` (static class)
Provides calculation methods for financial metrics.

```python
class MetricCalculator:
    @staticmethod
    def calculate_obi(snapshot: BookSnapshot) -> float: ...

    @staticmethod
    def calculate_volume_delta(trades: List[Trade]) -> float: ...

    @staticmethod
    def calculate_cvd(previous_cvd: float, volume_delta: float) -> float: ...

    @staticmethod
    def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]: ...
```

### Functions

#### Order Book Imbalance (OBI) Calculation

```python
def calculate_obi(snapshot: BookSnapshot) -> float:
    """
    Calculate Order Book Imbalance using the standard formula.

    Formula: OBI = (Vb - Va) / (Vb + Va)
    Where:
        Vb = Total volume on the bid side
        Va = Total volume on the ask side

    Args:
        snapshot: BookSnapshot containing bids and asks data

    Returns:
        float: OBI value between -1 and 1, or 0.0 if there is no volume

    Example:
        >>> snapshot = BookSnapshot(bids={50000.0: OrderbookLevel(...)}, ...)
        >>> obi = MetricCalculator.calculate_obi(snapshot)
        >>> print(f"OBI: {obi:.3f}")
        OBI: 0.333
    """
```
|
||||
#### Volume Delta Calculation
|
||||
```python
|
||||
def calculate_volume_delta(trades: List[Trade]) -> float:
|
||||
"""
|
||||
Calculate Volume Delta for a list of trades.
|
||||
|
||||
Volume Delta = Buy Volume - Sell Volume
|
||||
- Buy trades (side = "buy"): positive contribution
|
||||
- Sell trades (side = "sell"): negative contribution
|
||||
|
||||
Args:
|
||||
trades: List of Trade objects for specific timestamp
|
||||
|
||||
Returns:
|
||||
float: Net volume delta (positive = buy pressure, negative = sell pressure)
|
||||
|
||||
Example:
|
||||
>>> trades = [
|
||||
... Trade(side="buy", size=10.0, ...),
|
||||
... Trade(side="sell", size=3.0, ...)
|
||||
... ]
|
||||
>>> vd = MetricCalculator.calculate_volume_delta(trades)
|
||||
>>> print(f"Volume Delta: {vd}")
|
||||
Volume Delta: 7.0
|
||||
"""
|
||||
```
|
||||
|
||||
#### Cumulative Volume Delta (CVD) Calculation
|
||||
```python
|
||||
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
|
||||
"""
|
||||
Calculate Cumulative Volume Delta with incremental support.
|
||||
|
||||
Formula: CVD_t = CVD_{t-1} + Volume_Delta_t
|
||||
|
||||
Args:
|
||||
previous_cvd: Previous CVD value (use 0.0 for reset)
|
||||
volume_delta: Current volume delta to add
|
||||
|
||||
Returns:
|
||||
float: New cumulative volume delta value
|
||||
|
||||
Example:
|
||||
>>> cvd = 0.0 # Starting value
|
||||
>>> cvd = MetricCalculator.calculate_cvd(cvd, 10.0) # First trade
|
||||
>>> cvd = MetricCalculator.calculate_cvd(cvd, -5.0) # Second trade
|
||||
>>> print(f"CVD: {cvd}")
|
||||
CVD: 5.0
|
||||
"""
|
||||
```
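For reference, the three formulas above fit in a few lines each. The following is a minimal sketch using simplified stand-in types (`Level`, `Snapshot`, `T` are illustration-only substitutes for the module's `OrderbookLevel`, `BookSnapshot`, and `Trade` models):

```python
from dataclasses import dataclass

@dataclass
class Level:
    price: float
    size: float

@dataclass
class Snapshot:
    bids: dict  # price -> Level
    asks: dict  # price -> Level

@dataclass
class T:
    side: str
    size: float

def calculate_obi(snapshot) -> float:
    """OBI = (Vb - Va) / (Vb + Va); returns 0.0 when the book has no volume."""
    vb = sum(level.size for level in snapshot.bids.values())
    va = sum(level.size for level in snapshot.asks.values())
    total = vb + va
    return (vb - va) / total if total > 0 else 0.0

def calculate_volume_delta(trades) -> float:
    """Buy volume minus sell volume."""
    return sum(t.size if t.side == "buy" else -t.size for t in trades)

def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
    """Incremental accumulation: CVD_t = CVD_{t-1} + delta_t."""
    return previous_cvd + volume_delta

snap = Snapshot(
    bids={50000.0: Level(50000.0, 10.0), 49999.0: Level(49999.0, 5.0)},
    asks={50001.0: Level(50001.0, 3.0), 50002.0: Level(50002.0, 2.0)},
)
print(calculate_obi(snap))  # (15 - 5) / (15 + 5) = 0.5
print(calculate_volume_delta([T("buy", 8.0), T("sell", 3.0)]))  # 5.0
```

Note how the zero-volume guard in `calculate_obi` handles the empty-book and zero-size edge cases documented later in this file.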
## Usage Examples

### Basic OBI Calculation
```python
from models import MetricCalculator, BookSnapshot, OrderbookLevel

# Create sample orderbook snapshot
snapshot = BookSnapshot(
    id=1,
    timestamp=1640995200,
    bids={
        50000.0: OrderbookLevel(price=50000.0, size=10.0, liquidation_count=0, order_count=1),
        49999.0: OrderbookLevel(price=49999.0, size=5.0, liquidation_count=0, order_count=1),
    },
    asks={
        50001.0: OrderbookLevel(price=50001.0, size=3.0, liquidation_count=0, order_count=1),
        50002.0: OrderbookLevel(price=50002.0, size=2.0, liquidation_count=0, order_count=1),
    },
)

# Calculate OBI
obi = MetricCalculator.calculate_obi(snapshot)
print(f"OBI: {obi:.3f}")  # Output: OBI: 0.500
# Explanation: (15 - 5) / (15 + 5) = 10/20 = 0.5
```

### CVD Calculation with Reset
```python
from models import MetricCalculator, Trade

# Simulate trading session
cvd = 0.0  # Reset CVD at session start

# Process trades for first timestamp
trades_t1 = [
    Trade(id=1, trade_id=1.0, price=50000.0, size=8.0, side="buy", timestamp=1000),
    Trade(id=2, trade_id=2.0, price=50001.0, size=3.0, side="sell", timestamp=1000),
]

vd_t1 = MetricCalculator.calculate_volume_delta(trades_t1)  # 8.0 - 3.0 = 5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t1)            # 0.0 + 5.0 = 5.0

# Process trades for second timestamp
trades_t2 = [
    Trade(id=3, trade_id=3.0, price=49999.0, size=2.0, side="buy", timestamp=1001),
    Trade(id=4, trade_id=4.0, price=50000.0, size=7.0, side="sell", timestamp=1001),
]

vd_t2 = MetricCalculator.calculate_volume_delta(trades_t2)  # 2.0 - 7.0 = -5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t2)            # 5.0 + (-5.0) = 0.0

print(f"Final CVD: {cvd}")  # Output: Final CVD: 0.0
```

### Complete Metrics Processing
```python
from models import MetricCalculator, Metric

def process_snapshot_metrics(snapshot, trades, previous_cvd=0.0):
    """Process complete metrics for a single snapshot."""
    # Calculate OBI
    obi = MetricCalculator.calculate_obi(snapshot)

    # Calculate volume delta and CVD
    volume_delta = MetricCalculator.calculate_volume_delta(trades)
    cvd = MetricCalculator.calculate_cvd(previous_cvd, volume_delta)

    # Extract best bid/ask
    best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)

    # Create metric record
    metric = Metric(
        snapshot_id=snapshot.id,
        timestamp=snapshot.timestamp,
        obi=obi,
        cvd=cvd,
        best_bid=best_bid,
        best_ask=best_ask,
    )

    return metric, cvd

# Usage in processing loop
current_cvd = 0.0
for snapshot, trades in snapshot_trade_pairs:
    metric, current_cvd = process_snapshot_metrics(snapshot, trades, current_cvd)
    # Store metric to database...
```

## Dependencies

### Internal
- `models.BookSnapshot`: Orderbook state data
- `models.Trade`: Individual trade execution data
- `models.OrderbookLevel`: Price level information

### External
- **Python Standard Library**: `typing` for type hints
- **No external packages required**

## Performance Characteristics

### Computational Complexity
- **OBI Calculation**: O(n), where n = number of price levels
- **Volume Delta**: O(m), where m = number of trades
- **CVD Calculation**: O(1), a single addition
- **Best Bid/Ask**: O(n) for min/max operations

### Memory Usage
- **Static Methods**: No instance state, minimal memory overhead
- **Calculations**: Process data in place without copying
- **Results**: Lightweight `Metric` objects with slots optimization

### Typical Performance
```
# Benchmark results (approximate)
Snapshot with 50 price levels:  ~0.1ms per OBI calculation
Timestamp with 20 trades:       ~0.05ms per volume delta
CVD update:                     ~0.001ms per calculation
Complete metric processing:     ~0.2ms per snapshot
```

## Error Handling

### Edge Cases Handled
```python
# Empty orderbook
empty_snapshot = BookSnapshot(bids={}, asks={})
obi = MetricCalculator.calculate_obi(empty_snapshot)  # Returns 0.0

# No trades
empty_trades = []
vd = MetricCalculator.calculate_volume_delta(empty_trades)  # Returns 0.0

# Zero-volume scenario
zero_vol_snapshot = BookSnapshot(
    bids={50000.0: OrderbookLevel(price=50000.0, size=0.0, ...)},
    asks={50001.0: OrderbookLevel(price=50001.0, size=0.0, ...)},
)
obi = MetricCalculator.calculate_obi(zero_vol_snapshot)  # Returns 0.0
```

### Validation
- **OBI Range**: Results automatically bounded to [-1, 1]
- **Division by Zero**: Handled gracefully with a 0.0 return
- **Invalid Data**: Empty collections handled without errors

## Testing

### Test Coverage
- **Unit Tests**: `tests/test_metric_calculator.py`
- **Integration Tests**: Included in storage and strategy tests
- **Edge Cases**: Empty data, zero volume, boundary conditions

### Running Tests
```bash
# Run metric calculator tests specifically
uv run pytest tests/test_metric_calculator.py -v

# Run all metrics-related tests
uv run pytest -k "metric" -v

# Performance tests
uv run pytest tests/test_metric_calculator.py::test_calculate_obi_performance
```

## Known Issues

### Current Limitations
- **Precision**: Floating-point arithmetic limitations for very small numbers
- **Scale**: No optimization for extremely large orderbooks (>10k levels)
- **Currency**: No multi-currency support (assumes a single denomination)

### Planned Enhancements
- **Decimal Precision**: Consider `decimal.Decimal` for high-precision calculations
- **Vectorization**: NumPy integration for batch calculations
- **Additional Metrics**: Volume Profile, liquidity metrics, delta flow

---

The metrics calculation system provides a robust foundation for financial analysis with clean interfaces, comprehensive error handling, and strong performance on high-frequency trading data.
147
docs/modules/metrics_calculator.md
Normal file
@@ -0,0 +1,147 @@
# Module: metrics_calculator

## Purpose
The `metrics_calculator` module handles calculation and management of trading metrics, including Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD). It provides windowed aggregation with throttled updates for real-time visualization.

## Public Interface

### Classes
- `MetricsCalculator(window_seconds: int = 60, emit_every_n_updates: int = 25)`: Main metrics calculation engine

### Methods
- `update_cvd_from_trade(side: str, size: float) -> None`: Update CVD from individual trade data
- `update_obi_metrics(timestamp: str, total_bids: float, total_asks: float) -> None`: Update OBI metrics from orderbook volumes
- `finalize_metrics() -> None`: Emit the final metrics bar at the end of processing

### Properties
- `cvd_cumulative: float`: Current cumulative volume delta value

### Private Methods
- `_emit_metrics_bar() -> None`: Emit current metrics to the visualization layer

## Usage Examples

```python
from metrics_calculator import MetricsCalculator

# Initialize calculator
calc = MetricsCalculator(window_seconds=60, emit_every_n_updates=25)

# Update CVD from trades
calc.update_cvd_from_trade("buy", 1.5)   # +1.5 CVD
calc.update_cvd_from_trade("sell", 1.0)  # -1.0 CVD, net +0.5

# Update OBI from orderbook
total_bids, total_asks = 150.0, 120.0
calc.update_obi_metrics("1640995200000", total_bids, total_asks)

# Access current CVD
current_cvd = calc.cvd_cumulative  # 0.5

# Finalize at end of processing
calc.finalize_metrics()
```

## Metrics Definitions

### Cumulative Volume Delta (CVD)
- **Formula**: CVD = Σ(buy_volume - sell_volume)
- **Interpretation**: Positive = more buying pressure; negative = more selling pressure
- **Accumulation**: Running total across all processed trades
- **Update Frequency**: Every trade

### Order Book Imbalance (OBI)
- **Formula**: OBI = total_bid_volume - total_ask_volume
- **Interpretation**: Positive = more bid liquidity; negative = more ask liquidity
- **Aggregation**: OHLC-style bars per time window (open, high, low, close)
- **Update Frequency**: Throttled, per orderbook update

## Dependencies

### Internal
- `viz_io.upsert_metric_bar`: Output interface for visualization

### External
- `logging`: Warning messages for unknown trade sides
- `typing`: Type annotations

## Windowed Aggregation

### OBI Windows
- **Window Size**: Configurable via `window_seconds` (default: 60)
- **Window Alignment**: Aligned to epoch time boundaries
- **OHLC Tracking**: Maintains open, high, low, close values per window
- **Rollover**: Automatic window transitions with final bar emission

### Throttling Mechanism
- **Purpose**: Reduce I/O overhead during high-frequency updates
- **Trigger**: Every N updates (configurable via `emit_every_n_updates`)
- **Behavior**: Emits intermediate updates for real-time visualization
- **Final Emission**: Guaranteed on window rollover and finalization
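The window alignment and throttling rules above can be sketched as a small stand-alone class. This is illustrative only, not the module's actual implementation; `WindowedOBI` and its `emitted` list (standing in for `viz_io.upsert_metric_bar`) are hypothetical names:

```python
class WindowedOBI:
    """Sketch: epoch-aligned OBI windows with throttled emission."""

    def __init__(self, window_seconds=60, emit_every_n_updates=25):
        self.window_seconds = window_seconds
        self.emit_every = emit_every_n_updates
        self.window_start = None
        self.bar = None            # {'open', 'high', 'low', 'close'}
        self._since_emit = 0
        self.emitted = []          # stand-in for viz_io.upsert_metric_bar

    def update(self, ts_seconds: int, obi: float) -> None:
        # Align the window start to an absolute epoch boundary.
        start = ts_seconds - (ts_seconds % self.window_seconds)
        if start != self.window_start:
            if self.bar is not None:
                self._emit()       # final bar emission on rollover
            self.window_start = start
            self.bar = {'open': obi, 'high': obi, 'low': obi, 'close': obi}
            self._since_emit = 0
        else:
            self.bar['high'] = max(self.bar['high'], obi)
            self.bar['low'] = min(self.bar['low'], obi)
            self.bar['close'] = obi
        self._since_emit += 1
        if self._since_emit >= self.emit_every:
            self._emit()           # throttled intermediate emission
            self._since_emit = 0

    def _emit(self) -> None:
        self.emitted.append((self.window_start, dict(self.bar)))

w = WindowedOBI(window_seconds=60, emit_every_n_updates=2)
w.update(0, 1.0)
w.update(30, 3.0)   # second update in window -> throttled emit
w.update(61, 2.0)   # crosses the 60s boundary -> rollover emit, new window
```

The `ts - (ts % window_seconds)` expression is what aligns bars to absolute timestamps rather than to the first observation, matching the "no sliding windows" limitation noted below.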
## State Management

### CVD State
- `cvd_cumulative: float`: Running total across all trades
- **Persistence**: Maintained throughout the processor lifetime
- **Updates**: Incremental addition/subtraction per trade

### OBI State
- `metrics_window_start: int`: Current window start timestamp
- `metrics_bar: dict`: Current OBI OHLC values
- `_metrics_since_last_emit: int`: Throttling counter

## Output Format

### Metrics Bar Structure
```python
{
    'obi_open': float,   # First OBI value in window
    'obi_high': float,   # Maximum OBI in window
    'obi_low': float,    # Minimum OBI in window
    'obi_close': float,  # Latest OBI value
}
```

### Visualization Integration
- Emitted via `viz_io.upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value)`
- Compatible with the existing OHLC visualization infrastructure
- Real-time updates during active processing

## Performance Characteristics

- **Low Memory**: Maintains only the current window state
- **Throttled I/O**: Configurable update frequency prevents excessive writes
- **Efficient Updates**: O(1) operations for trade and OBI updates
- **Window Management**: Automatic transitions without manual intervention

## Configuration

### Constructor Parameters
- `window_seconds: int`: Time window for OBI aggregation (default: 60)
- `emit_every_n_updates: int`: Throttling factor for intermediate updates (default: 25)

### Tuning Guidelines
- **Higher throttling**: Reduces I/O load but delays real-time updates
- **Lower throttling**: More responsive visualization but higher I/O overhead
- **Window size**: Affects granularity of OBI trends (shorter = more detail)

## Testing

```bash
uv run pytest test_metrics_calculator.py -v
```

Test coverage includes:
- CVD accumulation accuracy across multiple trades
- OBI window rollover and OHLC tracking
- Throttling behavior verification
- Edge cases (unknown trade sides, empty windows)
- Integration with visualization output

## Known Limitations

- CVD calculation assumes binary buy/sell classification
- No support for partial fills or complex order types
- OBI calculation treats all liquidity equally (no price weighting)
- Window boundaries are aligned to absolute timestamps (no sliding windows)
122
docs/modules/ohlc_processor.md
Normal file
@@ -0,0 +1,122 @@
# Module: ohlc_processor

## Purpose
The `ohlc_processor` module serves as the main coordinator for trade data processing, orchestrating OHLC aggregation, orderbook management, and metrics calculation. It has been refactored into a modular architecture that composes specialized helper modules.

## Public Interface

### Classes
- `OHLCProcessor(window_seconds: int = 60, depth_levels_per_side: int = 50)`: Main orchestrator class that coordinates trade processing using composition

### Methods
- `process_trades(trades: list[tuple]) -> None`: Aggregate trades into OHLC bars and update CVD metrics
- `update_orderbook(ob_update: OrderbookUpdate) -> None`: Apply orderbook updates and calculate OBI metrics
- `finalize() -> None`: Emit the final OHLC bar and metrics data
- `cvd_cumulative` (property): Access to the cumulative volume delta value

### Composed Modules
- `OrderbookManager`: Handles in-memory orderbook state and depth snapshots
- `MetricsCalculator`: Manages OBI and CVD metric calculations
- `level_parser` functions: Parse and normalize orderbook level data

## Usage Examples

```python
from ohlc_processor import OHLCProcessor
from db_interpreter import DBInterpreter

# Initialize processor with 1-minute windows and 50 depth levels
processor = OHLCProcessor(window_seconds=60, depth_levels_per_side=50)

# Process streaming data
for ob_update, trades in DBInterpreter(db_path).stream():
    # Aggregate trades into OHLC bars
    processor.process_trades(trades)

    # Update orderbook and emit depth snapshots
    processor.update_orderbook(ob_update)

# Finalize processing
processor.finalize()
```

### Advanced Configuration
```python
# Custom window size and depth levels
processor = OHLCProcessor(
    window_seconds=30,         # 30-second bars
    depth_levels_per_side=25,  # Top 25 levels per side
)
```

## Dependencies

### Internal Modules
- `orderbook_manager.OrderbookManager`: In-memory orderbook state management
- `metrics_calculator.MetricsCalculator`: OBI and CVD metrics calculation
- `level_parser`: Orderbook level parsing utilities
- `viz_io`: JSON output for visualization
- `db_interpreter.OrderbookUpdate`: Input data structures

### External
- `typing`: Type annotations
- `logging`: Debug and operational logging

## Modular Architecture

The processor follows a clean composition pattern:

1. **Main Coordinator** (`OHLCProcessor`):
   - Orchestrates trade and orderbook processing
   - Maintains OHLC bar state and window management
   - Delegates specialized tasks to composed modules

2. **Orderbook Management** (`OrderbookManager`):
   - Maintains in-memory price→size mappings
   - Applies partial updates and handles deletions
   - Provides sorted top-N level extraction

3. **Metrics Calculation** (`MetricsCalculator`):
   - Tracks CVD from trade flow (buy/sell volume delta)
   - Calculates OBI from orderbook volume imbalance
   - Manages windowed metrics aggregation with throttling

4. **Level Parsing** (`level_parser` module):
   - Normalizes JSON and Python literal level representations
   - Handles zero-size levels for orderbook deletions
   - Provides robust error handling for malformed data
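A minimal skeleton of this composition pattern might look like the following. The stub helper classes and the `(price, size, side)` trade-tuple shape are assumptions for illustration only, not the module's actual code:

```python
class OrderbookManager:
    """Stand-in for orderbook_manager.OrderbookManager."""
    def __init__(self, depth_levels_per_side):
        self.depth_levels_per_side = depth_levels_per_side

class MetricsCalculator:
    """Stand-in for metrics_calculator.MetricsCalculator."""
    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.cvd_cumulative = 0.0

    def update_cvd_from_trade(self, side, size):
        self.cvd_cumulative += size if side == "buy" else -size

class OHLCProcessor:
    """The coordinator owns no book/metric logic itself; it delegates."""
    def __init__(self, window_seconds=60, depth_levels_per_side=50):
        self.orderbook = OrderbookManager(depth_levels_per_side)
        self.metrics = MetricsCalculator(window_seconds)

    def process_trades(self, trades):
        # Assumed tuple shape (price, size, side) for illustration.
        for price, size, side in trades:
            self.metrics.update_cvd_from_trade(side, size)

    @property
    def cvd_cumulative(self):
        # The public property simply forwards to the composed module.
        return self.metrics.cvd_cumulative

p = OHLCProcessor()
p.process_trades([(50000.0, 1.5, "buy"), (50001.0, 1.0, "sell")])
print(p.cvd_cumulative)  # 0.5
```

The point of the pattern is that the coordinator stays thin: each composed module can be tested and replaced independently.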
## Performance Characteristics

- **Throttled Updates**: Prevents excessive I/O during high-frequency periods
- **Memory Efficient**: Maintains only the current window and top-N depth levels
- **Incremental Processing**: Applies only changed orderbook levels
- **Atomic Operations**: Thread-safe updates to shared data structures

## Testing

Run module tests:
```bash
uv run pytest test_ohlc_processor.py -v
```

Test coverage includes:
- OHLC calculation accuracy across window boundaries
- Volume accumulation correctness
- High/low price tracking
- Orderbook update application
- Depth snapshot generation
- OBI metric calculation

## Known Issues

- Orderbook level parsing assumes well-formed JSON or Python literals
- Memory usage scales with the number of active price levels
- Clock skew between trades and orderbook updates is not handled

## Configuration Options

- `window_seconds`: Time window size for OHLC aggregation (default: 60)
- `depth_levels_per_side`: Number of top price levels to maintain (default: 50)
- `UPSERT_THROTTLE_MS`: Minimum interval between upsert operations (internal)
- `DEPTH_EMIT_THROTTLE_MS`: Minimum interval between depth emissions (internal)
121
docs/modules/orderbook_manager.md
Normal file
@@ -0,0 +1,121 @@
# Module: orderbook_manager

## Purpose
The `orderbook_manager` module provides in-memory orderbook state management with partial update capabilities. It maintains separate bid and ask sides and supports efficient top-level extraction for visualization.

## Public Interface

### Classes
- `OrderbookManager(depth_levels_per_side: int = 50)`: Main orderbook state manager

### Methods
- `apply_updates(bids_updates: List[Tuple[float, float]], asks_updates: List[Tuple[float, float]]) -> None`: Apply partial updates to both sides
- `get_total_volume() -> Tuple[float, float]`: Get total bid and ask volumes
- `get_top_levels() -> Tuple[List[List[float]], List[List[float]]]`: Get sorted top levels for both sides

### Private Methods
- `_apply_partial_updates(side_map: Dict[float, float], updates: List[Tuple[float, float]]) -> None`: Apply updates to one side
- `_build_top_levels(side_map: Dict[float, float], limit: int, reverse: bool) -> List[List[float]]`: Extract sorted top levels

## Usage Examples

```python
from orderbook_manager import OrderbookManager

# Initialize manager
manager = OrderbookManager(depth_levels_per_side=25)

# Apply orderbook updates
bids = [(50000.0, 1.5), (49999.0, 2.0)]
asks = [(50001.0, 1.2), (50002.0, 0.8)]
manager.apply_updates(bids, asks)

# Get volume totals for OBI calculation
total_bids, total_asks = manager.get_total_volume()
obi = total_bids - total_asks

# Get top levels for depth visualization
bids_sorted, asks_sorted = manager.get_top_levels()

# Handle deletions (size = 0)
deletions = [(50000.0, 0.0)]  # Remove price level
manager.apply_updates(deletions, [])
```

## Dependencies

### External
- `typing`: Type annotations for Dict, List, Tuple

## State Management

### Internal State
- `_book_bids: Dict[float, float]`: Price → size mapping for the bid side
- `_book_asks: Dict[float, float]`: Price → size mapping for the ask side
- `depth_levels_per_side: int`: Configuration for top-N extraction

### Update Semantics
- **Size = 0**: Remove price level (deletion)
- **Size > 0**: Upsert price level with new size
- **Size < 0**: Ignored (invalid update)

### Sorting Behavior
- **Bids**: Descending by price (highest price first)
- **Asks**: Ascending by price (lowest price first)
- **Top-N**: Limited by the `depth_levels_per_side` parameter
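Taken together, the update and sorting semantics above can be sketched as two plain functions (hypothetical names; the real module implements these as the private methods listed earlier):

```python
def apply_partial_updates(side_map: dict, updates: list) -> None:
    """Upsert on size > 0, delete on size == 0, ignore size < 0."""
    for price, size in updates:
        if size > 0:
            side_map[price] = size       # upsert price level
        elif size == 0:
            side_map.pop(price, None)    # deletion
        # size < 0: ignored as an invalid update

def build_top_levels(side_map: dict, limit: int, reverse: bool) -> list:
    """Sort only on extraction; bids use reverse=True, asks reverse=False."""
    prices = sorted(side_map, reverse=reverse)[:limit]
    return [[p, side_map[p]] for p in prices]

bids = {}
apply_partial_updates(bids, [(50000.0, 1.5), (49999.0, 2.0), (50000.0, 0.0)])
print(build_top_levels(bids, 50, reverse=True))  # [[49999.0, 2.0]]
```

Deferring the sort to `build_top_levels` is what keeps per-update cost O(1): the dict is only sorted when a depth snapshot is actually extracted.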
## Performance Characteristics

- **Memory Efficient**: Only stores non-zero price levels
- **Fast Updates**: O(1) upsert/delete operations using dicts
- **Efficient Sorting**: Only sorts when extracting top levels
- **Bounded Output**: Limits result size for visualization performance

## Use Cases

### OBI Calculation
```python
total_bids, total_asks = manager.get_total_volume()
order_book_imbalance = total_bids - total_asks
```

### Depth Visualization
```python
bids, asks = manager.get_top_levels()
depth_payload = {"bids": bids, "asks": asks}
```

### Incremental Updates
```python
# Typical orderbook update cycle
updates = parse_orderbook_changes(raw_data)
manager.apply_updates(updates['bids'], updates['asks'])
```

## Testing

```bash
uv run pytest test_orderbook_manager.py -v
```

Test coverage includes:
- Partial update application correctness
- Deletion handling (size = 0)
- Volume calculation accuracy
- Top-level sorting and limiting
- Edge cases (empty books, single levels)
- Performance with large orderbooks

## Configuration

- `depth_levels_per_side`: Controls output size for visualization (default: 50)
  - Affects memory usage and sorting performance
  - Higher values provide more market depth detail
  - Lower values improve processing speed

## Known Limitations

- No built-in validation of price/size values
- Memory usage scales with the number of unique price levels
- No historical state tracking (current snapshot only)
- No support for spread calculation or market data statistics
155
docs/modules/viz_io.md
Normal file
@@ -0,0 +1,155 @@
# Module: viz_io

## Purpose
The `viz_io` module provides atomic inter-process communication (IPC) between the data processing pipeline and the visualization frontend. It manages JSON file-based data exchange with atomic writes to prevent race conditions and data corruption.

## Public Interface

### Functions
- `add_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Append a new OHLC bar to the rolling dataset
- `upsert_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Update an existing bar or append a new one
- `clear_data()`: Reset the OHLC dataset to an empty state
- `add_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Append an OBI metric bar
- `upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Update an existing OBI bar or append a new one
- `clear_metrics()`: Reset the metrics dataset to an empty state
- `set_depth_data(bids, asks)`: Update the current orderbook depth snapshot

### Constants
- `DATA_FILE`: Path to the OHLC data JSON file
- `DEPTH_FILE`: Path to the depth data JSON file
- `METRICS_FILE`: Path to the metrics data JSON file
- `MAX_BARS`: Maximum number of bars to retain (1000)

## Usage Examples

### Basic OHLC Operations
```python
import viz_io

# Add a new OHLC bar
viz_io.add_ohlc_bar(
    timestamp=1640995200000,  # Unix timestamp in milliseconds
    open_price=50000.0,
    high_price=50100.0,
    low_price=49900.0,
    close_price=50050.0,
    volume=125.5,
)

# Update the current bar (if the timestamp matches) or add a new one
viz_io.upsert_ohlc_bar(
    timestamp=1640995200000,
    open_price=50000.0,
    high_price=50150.0,   # Updated high
    low_price=49850.0,    # Updated low
    close_price=50075.0,  # Updated close
    volume=130.2,         # Updated volume
)
```

### Orderbook Depth Management
```python
# Set current depth snapshot
bids = [[49990.0, 1.5], [49985.0, 2.1], [49980.0, 0.8]]
asks = [[50010.0, 1.2], [50015.0, 1.8], [50020.0, 2.5]]

viz_io.set_depth_data(bids, asks)
```

### Metrics Operations
```python
# Add Order Book Imbalance metrics
viz_io.add_metric_bar(
    timestamp=1640995200000,
    obi_open=0.15,
    obi_high=0.22,
    obi_low=0.08,
    obi_close=0.18,
)
```

## Dependencies

### Internal
- None (standalone utility module)

### External
- `json`: JSON serialization/deserialization
- `pathlib`: File path handling
- `typing`: Type annotations
- `tempfile`: Atomic write operations

## Data Formats

### OHLC Data (`ohlc_data.json`)
```json
[
    [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
    [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]
```
Format: `[timestamp, open, high, low, close, volume]`
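The upsert-plus-rolling-window behavior on this dataset could be sketched as follows; `upsert_bar` is a hypothetical stand-in for the module's internal logic, not its actual code:

```python
MAX_BARS = 1000  # matches the module constant described above

def upsert_bar(bars: list, row: list) -> None:
    """Replace the bar whose timestamp matches row[0], else append; then trim."""
    for i, existing in enumerate(bars):
        if existing[0] == row[0]:
            bars[i] = row          # update in place on timestamp match
            break
    else:
        bars.append(row)           # no match: append a new bar
    del bars[:-MAX_BARS]           # keep only the newest MAX_BARS rows

bars = [[1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5]]
upsert_bar(bars, [1640995200000, 50000.0, 50150.0, 49850.0, 50075.0, 130.2])
upsert_bar(bars, [1640995260000, 50075.0, 50200.0, 50000.0, 50150.0, 98.3])
print(len(bars))  # 2
```

The linear scan is the "list scanning" cost mentioned under Performance Characteristics: acceptable at 1000 rows, but the reason the dataset is bounded.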
### Depth Data (`depth_data.json`)
```json
{
    "bids": [[49990.0, 1.5], [49985.0, 2.1]],
    "asks": [[50010.0, 1.2], [50015.0, 1.8]]
}
```
Format: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`

### Metrics Data (`metrics_data.json`)
```json
[
    [1640995200000, 0.15, 0.22, 0.08, 0.18],
    [1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
Format: `[timestamp, obi_open, obi_high, obi_low, obi_close]`

## Atomic Write Operations

All write operations use atomic file replacement to prevent partial reads:

1. Write data to a temporary file
2. Flush and sync to disk
3. Atomically rename the temporary file to the target file

This ensures the visualization frontend always reads complete, valid JSON data.
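A minimal sketch of this write-temp-then-rename pattern, using only the standard library (the function name and error handling are illustrative, not the module's actual code):

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, payload) -> None:
    """Write to a temp file in the same directory, then atomically replace."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes reach disk before the rename
        os.replace(tmp, path)      # atomic rename on both POSIX and Windows
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on failure
        raise

target = Path("depth_data.json")
atomic_write_json(target, {"bids": [[49990.0, 1.5]], "asks": [[50010.0, 1.2]]})
```

Placing the temp file in the same directory as the target matters: `os.replace` is only atomic when source and destination are on the same filesystem.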
## Performance Characteristics

- **Bounded Memory**: OHLC and metrics datasets are limited to 1000 bars
- **Atomic Operations**: No partial reads possible during writes
- **Rolling Window**: Automatic trimming of old data maintains constant memory usage
- **Fast Lookups**: Timestamp-based upsert operations use list scanning (acceptable for 1000 items)

## Testing

Run module tests:
```bash
uv run pytest test_viz_io.py -v
```

Test coverage includes:
- Atomic write operations
- Data format validation
- Rolling window behavior
- Upsert logic correctness
- File corruption prevention
- Concurrent read/write scenarios

## Known Issues

- File I/O may block briefly during atomic writes
- JSON parsing errors are not propagated to callers
- Limited to 1000 bars maximum (configurable via `MAX_BARS`)
- No compression for large datasets

## Thread Safety

All operations are thread-safe for single-writer, multiple-reader scenarios:
- Writer: data processing pipeline (single thread)
- Readers: visualization frontend (polling)
- Atomic file operations prevent corruption during concurrent access