WIP UI rework with qt6

2025-09-10 15:39:16 +08:00
parent 36385af6f3
commit ebf232317c
63 changed files with 4005 additions and 5221 deletions


@@ -1,550 +1,23 @@
# API Documentation
# API Documentation (Current Implementation)
## Overview
This document provides comprehensive API documentation for the Orderflow Backtest System, including public interfaces, data models, and usage examples.
This document describes the public interfaces of the current system: SQLite streaming, OHLC/depth aggregation, JSON-based IPC, and the Dash visualizer. Metrics (OBI/CVD), repository/storage layers, and strategy APIs are not part of the current implementation.
## Core Data Models
## Input Database Schema (Required)
### OrderbookLevel
Represents a single price level in the orderbook.
```python
@dataclass(slots=True)
class OrderbookLevel:
    price: float            # Price level
    size: float             # Total size at this price
    liquidation_count: int  # Number of liquidations
    order_count: int        # Number of resting orders
```
**Example:**
```python
level = OrderbookLevel(
    price=50000.0,
    size=10.5,
    liquidation_count=0,
    order_count=3
)
```
### Trade
Represents a single trade execution.
```python
@dataclass(slots=True)
class Trade:
    id: int          # Unique trade identifier
    trade_id: float  # Exchange trade ID
    price: float     # Execution price
    size: float      # Trade size
    side: str        # "buy" or "sell"
    timestamp: int   # Unix timestamp
```
**Example:**
```python
trade = Trade(
    id=1,
    trade_id=123456.0,
    price=50000.0,
    size=0.5,
    side="buy",
    timestamp=1640995200
)
```
### BookSnapshot
Complete orderbook state at a specific timestamp.
```python
@dataclass
class BookSnapshot:
    id: int                            # Snapshot identifier
    timestamp: int                     # Unix timestamp
    bids: Dict[float, OrderbookLevel]  # Bid side levels
    asks: Dict[float, OrderbookLevel]  # Ask side levels
    trades: List[Trade]                # Associated trades
```
**Example:**
```python
snapshot = BookSnapshot(
    id=1,
    timestamp=1640995200,
    bids={
        50000.0: OrderbookLevel(50000.0, 10.0, 0, 1),
        49999.0: OrderbookLevel(49999.0, 5.0, 0, 1)
    },
    asks={
        50001.0: OrderbookLevel(50001.0, 3.0, 0, 1),
        50002.0: OrderbookLevel(50002.0, 2.0, 0, 1)
    },
    trades=[]
)
```
### Metric
Calculated financial metrics for a snapshot.
```python
@dataclass(slots=True)
class Metric:
    snapshot_id: int        # Reference to source snapshot
    timestamp: int          # Unix timestamp
    obi: float              # Order Book Imbalance [-1, 1]
    cvd: float              # Cumulative Volume Delta
    best_bid: float | None  # Best bid price
    best_ask: float | None  # Best ask price
```
**Example:**
```python
metric = Metric(
    snapshot_id=1,
    timestamp=1640995200,
    obi=0.333,
    cvd=150.5,
    best_bid=50000.0,
    best_ask=50001.0
)
```
## MetricCalculator API
Static class providing financial metric calculations.
### calculate_obi()
```python
@staticmethod
def calculate_obi(snapshot: BookSnapshot) -> float:
    """
    Calculate Order Book Imbalance.

    Formula: OBI = (Vb - Va) / (Vb + Va)

    Args:
        snapshot: BookSnapshot with bids and asks

    Returns:
        float: OBI value between -1 and 1

    Example:
        >>> obi = MetricCalculator.calculate_obi(snapshot)
        >>> print(f"OBI: {obi:.3f}")
        OBI: 0.333
    """
```
### calculate_volume_delta()
```python
@staticmethod
def calculate_volume_delta(trades: List[Trade]) -> float:
    """
    Calculate Volume Delta for trades.

    Formula: VD = Buy Volume - Sell Volume

    Args:
        trades: List of Trade objects

    Returns:
        float: Net volume delta

    Example:
        >>> vd = MetricCalculator.calculate_volume_delta(trades)
        >>> print(f"Volume Delta: {vd}")
        Volume Delta: 7.5
    """
```
### calculate_cvd()
```python
@staticmethod
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
    """
    Calculate Cumulative Volume Delta.

    Formula: CVD_t = CVD_{t-1} + VD_t

    Args:
        previous_cvd: Previous CVD value
        volume_delta: Current volume delta

    Returns:
        float: New CVD value

    Example:
        >>> cvd = MetricCalculator.calculate_cvd(100.0, 7.5)
        >>> print(f"CVD: {cvd}")
        CVD: 107.5
    """
```
### get_best_bid_ask()
```python
@staticmethod
def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]:
    """
    Extract best bid and ask prices.

    Args:
        snapshot: BookSnapshot with bids and asks

    Returns:
        tuple: (best_bid, best_ask) or (None, None)

    Example:
        >>> best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)
        >>> print(f"Spread: {best_ask - best_bid}")
        Spread: 1.0
    """
```
## Repository APIs
### SQLiteOrderflowRepository
Repository for orderbook, trades data and metrics.
#### connect()
```python
def connect(self) -> sqlite3.Connection:
    """
    Create optimized SQLite connection.

    Returns:
        sqlite3.Connection: Configured database connection

    Example:
        >>> repo = SQLiteOrderflowRepository(db_path)
        >>> with repo.connect() as conn:
        ...     # Use connection
    """
```
#### load_trades_by_timestamp()
```python
def load_trades_by_timestamp(self, conn: sqlite3.Connection) -> Dict[int, List[Trade]]:
    """
    Load all trades grouped by timestamp.

    Args:
        conn: Active database connection

    Returns:
        Dict[int, List[Trade]]: Trades grouped by timestamp

    Example:
        >>> trades_by_ts = repo.load_trades_by_timestamp(conn)
        >>> trades_at_1000 = trades_by_ts.get(1000, [])
    """
```
#### iterate_book_rows()
```python
def iterate_book_rows(self, conn: sqlite3.Connection) -> Iterator[Tuple[int, str, str, int]]:
    """
    Memory-efficient iteration over orderbook rows.

    Args:
        conn: Active database connection

    Yields:
        Tuple[int, str, str, int]: (id, bids_text, asks_text, timestamp)

    Example:
        >>> for row_id, bids, asks, ts in repo.iterate_book_rows(conn):
        ...     # Process row
    """
```
#### create_metrics_table()
```python
def create_metrics_table(self, conn: sqlite3.Connection) -> None:
    """
    Create metrics table with indexes.

    Args:
        conn: Active database connection

    Raises:
        sqlite3.Error: If table creation fails

    Example:
        >>> repo.create_metrics_table(conn)
        >>> # Metrics table now available
    """
```
#### insert_metrics_batch()
```python
def insert_metrics_batch(self, conn: sqlite3.Connection, metrics: List[Metric]) -> None:
    """
    Insert metrics in batch for performance.

    Args:
        conn: Active database connection
        metrics: List of Metric objects to insert

    Example:
        >>> metrics = [Metric(...), Metric(...)]
        >>> repo.insert_metrics_batch(conn, metrics)
        >>> conn.commit()
    """
```
#### load_metrics_by_timerange()
```python
def load_metrics_by_timerange(
    self,
    conn: sqlite3.Connection,
    start_timestamp: int,
    end_timestamp: int
) -> List[Metric]:
    """
    Load metrics within time range.

    Args:
        conn: Active database connection
        start_timestamp: Start time (inclusive)
        end_timestamp: End time (inclusive)

    Returns:
        List[Metric]: Metrics ordered by timestamp

    Example:
        >>> metrics = repo.load_metrics_by_timerange(conn, 1000, 2000)
        >>> print(f"Loaded {len(metrics)} metrics")
    """
```
## Storage API
### Storage
High-level data processing orchestrator.
#### __init__()
```python
def __init__(self, instrument: str) -> None:
    """
    Initialize storage for specific instrument.

    Args:
        instrument: Trading pair identifier (e.g., "BTC-USDT")

    Example:
        >>> storage = Storage("BTC-USDT")
    """
```
#### build_booktick_from_db()
```python
def build_booktick_from_db(self, db_path: Path, db_date: datetime) -> None:
    """
    Process database and calculate metrics.

    This is the main processing pipeline that:
    1. Loads orderbook and trades data
    2. Calculates OBI and CVD metrics per snapshot
    3. Stores metrics in database
    4. Populates book with snapshots

    Args:
        db_path: Path to SQLite database file
        db_date: Date for this database (informational)

    Example:
        >>> storage.build_booktick_from_db(Path("data.db"), datetime.now())
        >>> print(f"Processed {len(storage.book.snapshots)} snapshots")
    """
```
## Strategy API
### DefaultStrategy
Trading strategy with metrics analysis capabilities.
#### __init__()
```python
def __init__(self, instrument: str) -> None:
    """
    Initialize strategy for instrument.

    Args:
        instrument: Trading pair identifier

    Example:
        >>> strategy = DefaultStrategy("BTC-USDT")
    """
```
#### set_db_path()
```python
def set_db_path(self, db_path: Path) -> None:
    """
    Configure database path for metrics access.

    Args:
        db_path: Path to database with metrics

    Example:
        >>> strategy.set_db_path(Path("data.db"))
    """
```
#### load_stored_metrics()
```python
def load_stored_metrics(self, start_timestamp: int, end_timestamp: int) -> List[Metric]:
    """
    Load stored metrics for analysis.

    Args:
        start_timestamp: Start of time range
        end_timestamp: End of time range

    Returns:
        List[Metric]: Metrics for specified range

    Example:
        >>> metrics = strategy.load_stored_metrics(1000, 2000)
        >>> latest_obi = metrics[-1].obi
    """
```
#### get_metrics_summary()
```python
def get_metrics_summary(self, metrics: List[Metric]) -> dict:
    """
    Generate statistical summary of metrics.

    Args:
        metrics: List of metrics to analyze

    Returns:
        dict: Statistical summary with keys:
            - obi_min, obi_max, obi_avg
            - cvd_start, cvd_end, cvd_change
            - total_snapshots

    Example:
        >>> summary = strategy.get_metrics_summary(metrics)
        >>> print(f"OBI range: {summary['obi_min']:.3f} to {summary['obi_max']:.3f}")
    """
```
## Visualizer API
### Visualizer
Multi-chart visualization system.
#### __init__()
```python
def __init__(self, window_seconds: int = 60, max_bars: int = 200) -> None:
    """
    Initialize visualizer with chart parameters.

    Args:
        window_seconds: OHLC aggregation window
        max_bars: Maximum bars to display

    Example:
        >>> visualizer = Visualizer(window_seconds=300, max_bars=1000)
    """
```
#### set_db_path()
```python
def set_db_path(self, db_path: Path) -> None:
    """
    Configure database path for metrics loading.

    Args:
        db_path: Path to database with metrics

    Example:
        >>> visualizer.set_db_path(Path("data.db"))
    """
```
#### update_from_book()
```python
def update_from_book(self, book: Book) -> None:
    """
    Update charts with book data and stored metrics.

    Creates 4-subplot layout:
    1. OHLC candlesticks
    2. Volume bars
    3. OBI line chart
    4. CVD line chart

    Args:
        book: Book with snapshots for OHLC calculation

    Example:
        >>> visualizer.update_from_book(storage.book)
        >>> # Charts updated with latest data
    """
```
#### show()
```python
def show(self) -> None:
    """
    Display interactive chart window.

    Example:
        >>> visualizer.show()
        >>> # Interactive Qt5 window opens
    """
```
## Database Schema
### Input Tables (Required)
These tables must exist in the SQLite database files:
#### book table
### book table
```sql
CREATE TABLE book (
id INTEGER PRIMARY KEY,
instrument TEXT,
bids TEXT NOT NULL, -- JSON array: [[price, size, liq_count, order_count], ...]
asks TEXT NOT NULL, -- JSON array: [[price, size, liq_count, order_count], ...]
bids TEXT NOT NULL, -- Python-literal: [[price, size, ...], ...]
asks TEXT NOT NULL, -- Python-literal: [[price, size, ...], ...]
timestamp TEXT NOT NULL
);
```
#### trades table
### trades table
```sql
CREATE TABLE trades (
id INTEGER PRIMARY KEY,
@@ -557,129 +30,122 @@ CREATE TABLE trades (
);
```
### Output Table (Auto-created)
## Data Access: db_interpreter.py
This table is automatically created by the system:
### Classes
- `OrderbookLevel` (dataclass): represents a price level.
- `OrderbookUpdate`: windowed book update with `bids`, `asks`, `timestamp`, `end_timestamp`.
#### metrics table
```sql
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL,
    timestamp TEXT NOT NULL,
    obi REAL NOT NULL,  -- Order Book Imbalance [-1, 1]
    cvd REAL NOT NULL,  -- Cumulative Volume Delta
    best_bid REAL,      -- Best bid price
    best_ask REAL,      -- Best ask price
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```
### DBInterpreter
```python
class DBInterpreter:
    def __init__(self, db_path: Path): ...

    def stream(self) -> Iterator[tuple[OrderbookUpdate, list[tuple]]]:
        """
        Stream orderbook rows with one-row lookahead and trades in timestamp order.

        Yields pairs of (OrderbookUpdate, trades_in_window), where each trade tuple is
        (id, trade_id, price, size, side, timestamp_ms) and timestamp_ms ∈ [timestamp, end_timestamp).
        """
```
- Read-only SQLite connection with PRAGMA tuning (immutable, query_only, mmap, cache).
- Batch sizes: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
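The connection setup can be sketched as follows (a minimal example; the PRAGMA values are illustrative, and the actual tuning lives in `db_interpreter.py`):

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: Path) -> sqlite3.Connection:
    # immutable=1 opens the file read-only without taking locks,
    # which is safe because backtest databases never change underneath us.
    conn = sqlite3.connect(f"file:{db_path}?immutable=1", uri=True)
    conn.execute("PRAGMA query_only = ON")       # belt-and-braces: reject writes
    conn.execute("PRAGMA mmap_size = 268435456") # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA cache_size = -65536")   # 64 MiB page cache
    return conn
```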
## Processing: ohlc_processor.py
### OHLCProcessor
```python
class OHLCProcessor:
def __init__(self, window_seconds: int = 60, depth_levels_per_side: int = 50): ...
def process_trades(self, trades: list[tuple]) -> None:
"""Aggregate trades into OHLC bars per window; throttled upserts for UI responsiveness."""
def update_orderbook(self, ob_update: OrderbookUpdate) -> None:
"""Maintain in-memory price→size maps, apply partial updates, and emit top-N depth snapshots periodically."""
def finalize(self) -> None:
"""Emit the last OHLC bar if present."""
```
- Internal helpers for parsing levels from JSON or Python-literal strings and for applying deletions (size==0).
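A hedged sketch of what such helpers could look like (`parse_levels` and `apply_update` are hypothetical names; the real helpers are internal to `ohlc_processor.py`):

```python
import ast
import json

def parse_levels(text: str) -> list:
    """Parse a bids/asks column that may be JSON or a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to Python-literal syntax, e.g. "[(50000.0, 1.5)]"
        return ast.literal_eval(text)

def apply_update(side: dict[float, float], levels: list) -> None:
    """Apply partial updates in place; size == 0 deletes the level."""
    for price, size, *_ in levels:
        if size == 0:
            side.pop(price, None)
        else:
            side[price] = size
```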
## Inter-Process Communication: viz_io.py
### Files
- `ohlc_data.json`: rolling array of OHLC bars (max 1000).
- `depth_data.json`: latest depth snapshot (bids/asks), top-N per side.
- `metrics_data.json`: rolling array of OBI OHLC bars (max 1000).
### Functions
```python
def add_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ...
def upsert_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ...
def clear_data() -> None: ...
def add_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ...
def upsert_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ...
def clear_metrics() -> None: ...
```
- Atomic writes via temp file replace to prevent partial reads.
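The temp-file-then-replace pattern can be sketched like this (function name hypothetical; `viz_io.py` implements the same idea):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, payload: object) -> None:
    """Write JSON atomically: dump to a temp file in the same directory,
    then os.replace() it over the target. Readers see either the old
    complete file or the new complete file, never a partial write."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```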
## Visualization: app.py (Dash)
- Three visuals: an OHLC+Volume chart and a cumulative Depth chart in the Plotly dark theme, plus an OBI candlestick subplot beneath Volume.
- Polling interval: 500 ms. Tolerates JSON decode races by falling back to cached last values.
### Callback Contract
```python
@app.callback(
    [Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')],
    [Input('interval-update', 'n_intervals')]
)
```
- Reads `ohlc_data.json` (list of `[ts, open, high, low, close, volume]`).
- Reads `depth_data.json` (`{"bids": [[price, size], ...], "asks": [[price, size], ...]}`).
- Reads `metrics_data.json` (list of `[ts, obi_o, obi_h, obi_l, obi_c]`).
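On the reader side, tolerance to decode races can be sketched as follows (names hypothetical; the Dash callbacks cache their own last-good figures):

```python
import json

_last_good: dict[str, object] = {}  # per-path cache of the last valid payload

def read_json_tolerant(path: str, default: object) -> object:
    """Read a JSON IPC file; on a missing file or a decode race
    (file replaced mid-read), fall back to the last good value."""
    try:
        with open(path) as f:
            data = json.load(f)
        _last_good[path] = data
        return data
    except (FileNotFoundError, json.JSONDecodeError):
        return _last_good.get(path, default)
```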
## CLI Orchestration: main.py
### Typer Entry Point
```python
def main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None:
    """Stream DBs, process OHLC/depth, and launch the Dash visualizer in a separate process."""
```
- Discovers databases under `../data/OKX` matching the instrument and date range.
- Launches UI: `uv run python app.py`.
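Discovery might be sketched like this, assuming per-day files named like `BTC-USDT-25-06-09.db` (YY-MM-DD); the actual logic in `main.py` may differ:

```python
from datetime import date, datetime
from pathlib import Path

def discover_databases(data_dir: Path, instrument: str,
                       start: date, end: date) -> list[Path]:
    """Collect <instrument>-YY-MM-DD.db files whose date falls in [start, end)."""
    matches = []
    for db_path in sorted(data_dir.glob(f"{instrument}-*.db")):
        date_part = db_path.stem.removeprefix(f"{instrument}-")
        try:
            file_date = datetime.strptime(date_part, "%y-%m-%d").date()
        except ValueError:
            continue  # skip files whose suffix is not a date
        if start <= file_date < end:
            matches.append(db_path)
    return matches
```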
## Usage Examples
### Complete Processing Workflow
```python
from pathlib import Path
from datetime import datetime
from storage import Storage
from strategies import DefaultStrategy
from visualizer import Visualizer
# Initialize components
storage = Storage("BTC-USDT")
strategy = DefaultStrategy("BTC-USDT")
visualizer = Visualizer(window_seconds=60, max_bars=500)
# Process database
db_path = Path("data/BTC-USDT-25-06-09.db")
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)
# Build book and calculate metrics
storage.build_booktick_from_db(db_path, datetime.now())
# Analyze metrics
strategy.on_booktick(storage.book)
# Update visualization
visualizer.update_from_book(storage.book)
visualizer.show()
```
### Run processing + UI
```bash
uv run python main.py BTC-USDT 2025-07-01 2025-08-01 --window-seconds 60
# Open http://localhost:8050
```
### Metrics Analysis
```python
# Load and analyze stored metrics
strategy = DefaultStrategy("BTC-USDT")
strategy.set_db_path(Path("data.db"))

# Get metrics for specific time range
metrics = strategy.load_stored_metrics(1640995200, 1640998800)

# Analyze metrics
summary = strategy.get_metrics_summary(metrics)
print(f"OBI Range: {summary['obi_min']:.3f} to {summary['obi_max']:.3f}")
print(f"CVD Change: {summary['cvd_change']:.1f}")

# Find significant imbalances
significant_obi = [m for m in metrics if abs(m.obi) > 0.2]
print(f"Found {len(significant_obi)} snapshots with >20% imbalance")
```
### Custom Metric Calculations
```python
from models import MetricCalculator

# Calculate metrics for single snapshot
obi = MetricCalculator.calculate_obi(snapshot)
best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)

# Calculate CVD over time
cvd = 0.0
for trades in trades_by_timestamp.values():
    volume_delta = MetricCalculator.calculate_volume_delta(trades)
    cvd = MetricCalculator.calculate_cvd(cvd, volume_delta)
print(f"CVD: {cvd:.1f}")
```
### Process trades and update depth in a loop (conceptual)
```python
from db_interpreter import DBInterpreter
from ohlc_processor import OHLCProcessor

processor = OHLCProcessor(window_seconds=60)
for ob_update, trades in DBInterpreter(db_path).stream():
    processor.process_trades(trades)
    processor.update_orderbook(ob_update)
processor.finalize()
```
## Error Handling
- Reader/writer coordination via atomic JSON writes prevents partial reads.
- The visualizer caches the last valid data if JSON decoding fails mid-write and logs a warning.
- Visualizer start failures do not stop processing; the error is logged and processing continues.
### Common Error Scenarios
#### Database Connection Issues
```python
try:
    repo = SQLiteOrderflowRepository(db_path)
    with repo.connect() as conn:
        metrics = repo.load_metrics_by_timerange(conn, start, end)
except sqlite3.Error as e:
    logging.error(f"Database error: {e}")
    metrics = []  # Fallback to empty list
```
#### Missing Metrics Table
```python
repo = SQLiteOrderflowRepository(db_path)
with repo.connect() as conn:
    if not repo.table_exists(conn, "metrics"):
        repo.create_metrics_table(conn)
        logging.info("Created metrics table")
```
#### Empty Data Handling
```python
# All methods handle empty data gracefully
obi = MetricCalculator.calculate_obi(empty_snapshot) # Returns 0.0
vd = MetricCalculator.calculate_volume_delta([]) # Returns 0.0
summary = strategy.get_metrics_summary([]) # Returns {}
```
---
This API documentation provides complete coverage of the public interfaces for the Orderflow Backtest System. For implementation details and architecture information, see the additional documentation in the `docs/` directory.
## Notes
- Metrics computation includes a simplified OBI (Order Book Imbalance) calculated as `bid_total - ask_total`. Repository/storage layers and strategy APIs are intentionally kept minimal.


@@ -5,42 +5,52 @@ All notable changes to the Orderflow Backtest System are documented in this file
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.0.0] - 2024-Current
## [Unreleased]
### Added
- **OBI Metrics Calculation**: Order Book Imbalance calculation with formula `(Vb - Va) / (Vb + Va)`
- **CVD Metrics Calculation**: Cumulative Volume Delta with incremental calculation and reset functionality
- **Persistent Metrics Storage**: SQLite-based storage for calculated metrics to avoid recalculation
- **Memory Optimization**: >70% reduction in peak memory usage through streaming processing
- **Enhanced Visualization**: Multi-subplot charts with OHLC, Volume, OBI, and CVD displays
- **MetricCalculator Class**: Static methods for financial metrics computation
- **Batch Processing**: High-performance batch inserts (1000 records per operation)
- **Time-Range Queries**: Efficient metrics retrieval for specified time periods
- **Strategy Enhancement**: Metrics analysis capabilities in `DefaultStrategy`
- **Comprehensive Testing**: 27 tests across 6 test files with full integration coverage
- Comprehensive documentation structure with module-specific guides
- Architecture Decision Records (ADRs) for major technical decisions
- CONTRIBUTING.md with development guidelines and standards
- Enhanced module documentation in `docs/modules/` directory
- Dependency documentation with security and performance considerations
### Changed
- **Storage Architecture**: Modified `Storage.build_booktick_from_db()` to integrate metrics calculation
- **Visualization Separation**: Moved visualization from strategy to main application for better separation of concerns
- **Strategy Interface**: Simplified `DefaultStrategy` constructor (removed `enable_visualization` parameter)
- **Main Application Flow**: Enhanced orchestration with per-database visualization updates
- **Database Schema**: Auto-creation of metrics table with proper indexes and foreign key constraints
- **Memory Management**: Stream processing instead of keeping full snapshot history
- Documentation structure reorganized to follow documentation standards
- Improved code documentation requirements with examples
- Enhanced testing guidelines with coverage requirements
### Improved
- **Performance**: Batch database operations and optimized SQLite PRAGMAs
- **Scalability**: Support for months to years of high-frequency trading data
- **Code Quality**: All functions <50 lines, all files <250 lines
- **Documentation**: Comprehensive module and API documentation
- **Error Handling**: Graceful degradation and comprehensive logging
- **Type Safety**: Full type annotations throughout codebase
## [2.0.0] - 2024-12-Present
### Added
- **Simplified Pipeline Architecture**: Streamlined SQLite → OHLC/Depth → JSON → Dash pipeline
- **JSON-based IPC**: Atomic file-based communication between processor and visualizer
- **Real-time Visualization**: Dash web application with 500ms polling updates
- **OHLC Aggregation**: Configurable time window aggregation with throttled updates
- **Orderbook Depth**: Real-time depth snapshots with top-N level management
- **OBI Metrics**: Order Book Imbalance calculation with candlestick visualization
- **Atomic JSON Operations**: Race-condition-free data exchange via temp files
- **CLI Orchestration**: Typer-based command interface with process management
- **Performance Optimizations**: Batch reading with optimized SQLite PRAGMA settings
### Changed
- **Architecture Simplification**: Removed complex repository/storage layers
- **Data Flow**: Direct streaming from database to visualization via JSON
- **Error Handling**: Graceful degradation with cached data fallbacks
- **Process Management**: Separate visualization process launched automatically
- **Memory Efficiency**: Bounded datasets prevent unlimited memory growth
### Technical Details
- **New Tables**: `metrics` table with indexes on timestamp and snapshot_id
- **New Models**: `Metric` dataclass for calculated values
- **Processing Pipeline**: Snapshot → Calculate → Store → Discard workflow
- **Query Interface**: Time-range based metrics retrieval
- **Visualization Layout**: 4-subplot layout with shared time axis
- **Database Access**: Read-only SQLite with immutable mode and mmap optimization
- **Batch Sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal performance
- **JSON Formats**: Standardized schemas for OHLC, depth, and metrics data
- **Chart Architecture**: Multi-subplot layout with shared time axis
- **IPC Files**: `ohlc_data.json`, `depth_data.json`, `metrics_data.json`
### Removed
- Complex metrics storage and repository patterns
- Strategy framework components
- In-memory snapshot retention
- Multi-database orchestration complexity
## [1.0.0] - Previous Version


@@ -2,162 +2,52 @@
## Current State
The Orderflow Backtest System has successfully implemented a comprehensive OBI (Order Book Imbalance) and CVD (Cumulative Volume Delta) metrics calculation and visualization system. The project is in a production-ready state with full feature completion.
The project implements a modular, efficient orderflow processing pipeline:
- Stream orderflow from SQLite (`DBInterpreter.stream`).
- Process trades and orderbook updates through modular `OHLCProcessor` architecture.
- Exchange data with the UI via atomic JSON files (`viz_io`).
- Render OHLC+Volume, Depth, and Metrics charts with a Dash app (`app.py`).
## Recent Achievements
The system features a clean composition-based architecture with specialized modules for different concerns, providing OBI/CVD metrics alongside OHLC data.
### ✅ Completed Features (Latest Implementation)
- **Metrics Calculation Engine**: Complete OBI and CVD calculation with per-snapshot granularity
- **Persistent Storage**: Metrics stored in SQLite database to avoid recalculation
- **Memory Optimization**: >70% memory usage reduction through efficient data management
- **Visualization System**: Multi-subplot charts (OHLC, Volume, OBI, CVD) with shared time axis
- **Strategy Framework**: Enhanced trading strategy system with metrics analysis
- **Clean Architecture**: Proper separation of concerns between data, analysis, and visualization
## Recent Work
### 📊 System Metrics
- **Performance**: Batch processing of 1000 records per operation
- **Memory**: >70% reduction in peak memory usage
- **Test Coverage**: 27 comprehensive tests across 6 test files
- **Code Quality**: All functions <50 lines, all files <250 lines
- **Modular Refactoring**: Extracted `ohlc_processor.py` into focused modules:
- `level_parser.py`: Orderbook level parsing utilities (85 lines)
- `orderbook_manager.py`: In-memory orderbook state management (90 lines)
- `metrics_calculator.py`: OBI and CVD metrics calculation (112 lines)
- **Architecture Compliance**: Reduced main processor from 440 to 248 lines (250-line target achieved)
- Maintained full backward compatibility and functionality
- Implemented read-only, batched SQLite streaming with PRAGMA tuning.
- Added robust JSON IPC with atomic writes and tolerant UI reads.
- Built a responsive Dash visualization polling at 500ms.
- Unified CLI using Typer, with UV for process management.
## Architecture Decisions
## Conventions
### Key Design Patterns
1. **Repository Pattern**: Clean separation between data access and business logic
2. **Dataclass Models**: Lightweight, type-safe data structures with slots optimization
3. **Batch Processing**: High-performance database operations for large datasets
4. **Separation of Concerns**: Strategy, Storage, and Visualization as independent components
- Python 3.12+, UV for dependency and command execution.
- **Modular Architecture**: Composition over inheritance, single-responsibility modules
- **File Size Limits**: ≤250 lines per file, ≤50 lines per function (enforced)
- Type hints throughout; concise, focused functions and classes.
- Error handling with meaningful logs; avoid bare exceptions.
- Prefer explicit JSON structures for IPC; keep payloads small and bounded.
### Technology Stack
- **Language**: Python 3.12+ with type hints
- **Database**: SQLite with optimized PRAGMAs for performance
- **Package Management**: UV for fast dependency resolution
- **Testing**: Pytest with comprehensive unit and integration tests
- **Visualization**: Matplotlib with Qt5Agg backend
## Priorities
## Current Development Priorities
- Improve configurability: database path discovery, CLI flags for paths and UI options.
- Add tests for `DBInterpreter.stream` and `OHLCProcessor` (run with `uv run pytest`).
- Performance tuning for large DBs while keeping UI responsive.
- Documentation kept in sync with code; architecture reflects current design.
### ✅ Completed (Production Ready)
1. **Core Metrics System**: OBI and CVD calculation infrastructure
2. **Database Integration**: Persistent storage and retrieval system
3. **Visualization Framework**: Multi-chart display with proper time alignment
4. **Memory Optimization**: Efficient processing of large datasets
5. **Code Quality**: Comprehensive testing and documentation
## Roadmap (Future Work)
### 🔄 Maintenance Phase
- **Documentation**: Comprehensive docs completed
- **Testing**: Full test coverage maintained
- **Performance**: Monitoring and optimization as needed
- **Bug Fixes**: Address any issues discovered in production use
- Enhance OBI metrics with additional derived calculations (e.g., normalized OBI).
- Optional repository layer abstraction and a storage orchestrator.
- Extend visualization with additional subplots and interactivity.
- Strategy module for analytics and alerting on derived metrics.
## Known Patterns and Conventions
## Tooling
### Code Style
- **Functions**: Maximum 50 lines, single responsibility
- **Files**: Maximum 250 lines, clear module boundaries
- **Naming**: Descriptive names, no abbreviations except domain terms (OBI, CVD)
- **Error Handling**: Comprehensive try-catch with logging, graceful degradation
### Database Patterns
- **Parameterized Queries**: All SQL uses proper parameterization for security
- **Batch Operations**: Process records in batches of 1000 for performance
- **Indexing**: Strategic indexes on timestamp and foreign key columns
- **Transactions**: Proper transaction boundaries for data consistency
### Testing Patterns
- **Unit Tests**: Each module has comprehensive unit test coverage
- **Integration Tests**: End-to-end workflow testing
- **Mock Objects**: External dependencies mocked for isolated testing
- **Test Data**: Temporary databases with realistic test data
## Integration Points
### External Dependencies
- **SQLite**: Primary data storage (read and write operations)
- **Matplotlib**: Chart rendering and visualization
- **Qt5Agg**: GUI backend for interactive charts
- **Pytest**: Testing framework
### Internal Module Dependencies
```
main.py → storage.py    → repositories/ → models.py
        → strategies.py → models.py
        → visualizer.py → repositories/
```
## Performance Characteristics
### Optimizations Implemented
- **Memory Management**: Metrics storage instead of full snapshot retention
- **Database Performance**: Optimized SQLite PRAGMAs and batch processing
- **Query Efficiency**: Indexed queries with proper WHERE clauses
- **Cache Usage**: Price caching in orderbook parser for repeated calculations
### Scalability Notes
- **Dataset Size**: Tested with 600K+ snapshots and 300K+ trades per day
- **Time Range**: Supports months to years of historical data
- **Processing Speed**: ~1000 rows/second with full metrics calculation
- **Storage Overhead**: Metrics table adds <20% to original database size
## Security Considerations
### Implemented Safeguards
- **SQL Injection Prevention**: All queries use parameterized statements
- **Input Validation**: Database paths and table names validated
- **Error Information**: No sensitive data exposed in error messages
- **Access Control**: Database file permissions respected
## Future Considerations
### Potential Enhancements
- **Real-time Processing**: Streaming data support for live trading
- **Additional Metrics**: Volume Profile, Delta Flow, Liquidity metrics
- **Export Capabilities**: CSV/JSON export for external analysis
- **Interactive Charts**: Enhanced user interaction with visualization
- **Configuration System**: Configurable batch sizes and processing parameters
### Scalability Options
- **Database Upgrade**: PostgreSQL for larger datasets if needed
- **Parallel Processing**: Multi-threading for CPU-intensive calculations
- **Caching Layer**: Redis for frequently accessed metrics
- **API Interface**: REST API for external system integration
## Development Environment
### Requirements
- Python 3.12+
- UV package manager
- SQLite database files with required schema
- Qt5 for visualization (Linux/macOS)
### Setup Commands
```bash
# Install dependencies
uv sync
# Run full test suite
uv run pytest
# Process sample data
uv run python main.py BTC-USDT 2025-07-01 2025-08-01
```
## Documentation Status
### ✅ Complete Documentation
- README.md with comprehensive overview
- Module-level documentation for all components
- API documentation with examples
- Architecture decision records
- Code-level documentation with docstrings
### 📊 Quality Metrics
- **Code Coverage**: 27 tests across 6 test files
- **Documentation Coverage**: All public interfaces documented
- **Example Coverage**: Working examples for all major features
- **Error Documentation**: All error conditions documented
---
*Last Updated: Current as of OBI/CVD metrics system completion*
*Next Review: As needed for maintenance or feature additions*
- Package management and commands: UV (e.g., `uv sync`, `uv run ...`).
- Visualization server: Dash on `http://localhost:8050`.
- Linting/testing: Pytest (e.g., `uv run pytest`).


@@ -2,50 +2,25 @@
## Overview
This directory contains documentation for the current Orderflow Backtest System, which streams historical orderflow from SQLite, aggregates OHLC bars, maintains a lightweight depth snapshot, and renders charts via a Dash web application.
## Documentation Structure
### 📚 Main Documentation
- **[CONTEXT.md](./CONTEXT.md)**: Current project state, architecture decisions, and development patterns
- **[architecture.md](./architecture.md)**: System architecture, component relationships, and data flow
- **[API.md](./API.md)**: Public interfaces, classes, and function documentation
### 📦 Module Documentation
- **[modules/metrics.md](./modules/metrics.md)**: OBI and CVD calculation system
- **[modules/storage.md](./modules/storage.md)**: Data processing and persistence layer
- **[modules/visualization.md](./modules/visualization.md)**: Chart rendering and display system
- **[modules/repositories.md](./modules/repositories.md)**: Database access and operations
### 🏗️ Architecture Decisions
- **[decisions/ADR-001-metrics-storage.md](./decisions/ADR-001-metrics-storage.md)**: Persistent metrics storage decision
- **[decisions/ADR-002-visualization-separation.md](./decisions/ADR-002-visualization-separation.md)**: Separation of concerns for visualization
### 📋 Development Guides
- **[CONTRIBUTING.md](./CONTRIBUTING.md)**: Development workflow and contribution guidelines
- **[CHANGELOG.md](./CHANGELOG.md)**: Version history and changes
- `architecture.md`: System architecture, component relationships, and data flow (SQLite → Streaming → OHLC/Depth → JSON → Dash)
- `API.md`: Public interfaces for DB streaming, OHLC/depth processing, JSON IPC, Dash visualization, and CLI
- `CONTEXT.md`: Project state, conventions, and development priorities
- `decisions/`: Architecture decision records
## Quick Navigation
| Topic | Documentation |
|-------|---------------|
| **Getting Started** | [README.md](../README.md) |
| **System Architecture** | [architecture.md](./architecture.md) |
| **Metrics Calculation** | [modules/metrics.md](./modules/metrics.md) |
| **Database Schema** | [API.md](./API.md#database-schema) |
| **Development Setup** | [CONTRIBUTING.md](./CONTRIBUTING.md) |
| **API Reference** | [API.md](./API.md) |
| Getting Started | See the usage examples in `API.md` |
| System Architecture | `architecture.md` |
| Database Schema | `API.md#input-database-schema-required` |
| Development Setup | Project root `README` and `pyproject.toml` |
## Documentation Standards
This documentation follows the project's documentation standards defined in `.cursor/rules/documentation.mdc`. All documentation includes:
- Clear purpose and scope
- Code examples with working implementations
- API documentation with request/response formats
- Error handling and edge cases
- Dependencies and requirements
## Maintenance
Documentation is updated with every significant code change and reviewed during the development process. See [CONTRIBUTING.md](./CONTRIBUTING.md) for details on documentation maintenance procedures.
## Notes
- Metrics (OBI/CVD), repository/storage layers, and strategy components have been removed from the current codebase and are planned as future enhancements.
- Use UV for package management and running commands. Example: `uv run python main.py ...`.


@@ -2,303 +2,155 @@
## Overview
The current system is a streamlined, high-performance pipeline that streams orderflow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and visualizer uses atomic JSON files for simplicity and robustness.
## High-Level Architecture
```
┌─────────────────┐   ┌────────────────┐   ┌────────────────┐   ┌─────────────────┐
│  SQLite Files   │ → │ DB Interpreter │ → │   OHLC/Depth   │ → │ Dash Visualizer │
│  (book, trades) │   │  (stream rows) │   │   Processor    │   │    (app.py)     │
└─────────────────┘   └────────────────┘   └───────┬────────┘   └────────▲────────┘
                                                   │                     │
                                                   ▼                     │
                                         Atomic JSON (IPC) ──────────────┘
                                         ohlc_data.json, depth_data.json,
                                         metrics_data.json

                                         Dash serves the charts to the Browser UI.
```
## Components
### Data Layer
### Data Access (`db_interpreter.py`)
#### Models (`models.py`)
**Purpose**: Core data structures and calculation logic
- `OrderbookLevel`: dataclass representing one price level.
- `OrderbookUpdate`: container for a book row window with `bids`, `asks`, `timestamp`, and `end_timestamp`.
- `DBInterpreter`:
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]` streams the book table with lookahead and the trades table in timestamp order.
- Efficient read-only connection with PRAGMA tuning: immutable mode, query_only, temp_store=MEMORY, mmap_size, cache_size.
- Batching constants: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
- Each yielded `trades` element is a tuple `(id, trade_id, price, size, side, timestamp_ms)` that falls within `[book.timestamp, next_book.timestamp)`.
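As a rough sketch of how a caller might consume the stream (the trade-tuple layout is the one documented above; the tallying logic and `consume` helper are purely illustrative):

```python
def consume(stream):
    """Walk (OrderbookUpdate, trades) pairs and tally buy/sell volume.

    Each trade is (id, trade_id, price, size, side, timestamp_ms), already
    bucketed into the window of the accompanying book update.
    """
    buy_vol = sell_vol = 0.0
    for ob_update, trades in stream:
        for _id, _trade_id, _price, size, side, _ts_ms in trades:
            if side == "buy":
                buy_vol += size
            else:
                sell_vol += size
    return buy_vol, sell_vol
```

In the real pipeline, `main.py` plays this role, forwarding each pair to the OHLC/depth processor instead of tallying.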
```python
# Core data models
OrderbookLevel # Single price level (price, size, order_count, liquidation_count)
Trade # Individual trade execution (price, size, side, timestamp)
BookSnapshot # Complete orderbook state at timestamp
Book # Container for snapshot sequence
Metric # Calculated OBI/CVD values
# Calculation engine
MetricCalculator # Static methods for OBI/CVD computation
```

### Processing (Modular Architecture)
#### Main Coordinator (`ohlc_processor.py`)
- `OHLCProcessor(window_seconds=60, depth_levels_per_side=50)`: Orchestrates trade processing using composition
- `process_trades(trades)`: aggregates trades into OHLC bars and delegates CVD updates
- `update_orderbook(ob_update)`: coordinates orderbook updates and OBI metric calculation
- `finalize()`: finalizes both OHLC bars and metrics data
- `cvd_cumulative` (property): provides access to cumulative volume delta
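A minimal stand-in for the windowed aggregation that `process_trades` performs (the real class also coordinates depth and CVD; this sketch only shows the bucketing, using the `[ts, open, high, low, close, volume]` bar layout from the IPC schema):

```python
def aggregate_ohlc(trades, window_seconds=60):
    """Bucket (id, trade_id, price, size, side, timestamp_ms) trades into
    [ts, open, high, low, close, volume] bars keyed by window start (ms)."""
    bars = {}
    for _id, _tid, price, size, _side, ts_ms in trades:
        bucket = (ts_ms // 1000 // window_seconds) * window_seconds * 1000
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = [bucket, price, price, price, price, size]
        else:
            bar[2] = max(bar[2], price)   # high
            bar[3] = min(bar[3], price)   # low
            bar[4] = price                # close (last trade wins)
            bar[5] += size                # volume
    return [bars[k] for k in sorted(bars)]
```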
**Relationships**:
- `Book` contains multiple `BookSnapshot` instances
- `BookSnapshot` contains dictionaries of `OrderbookLevel` and lists of `Trade`
- `Metric` stores calculated values for each `BookSnapshot`
- `MetricCalculator` operates on snapshots to produce metrics
#### Orderbook Management (`orderbook_manager.py`)
- `OrderbookManager`: Handles in-memory orderbook state with partial updates
- Maintains separate bid/ask price→size dictionaries
- Supports deletions via zero-size updates
- Provides sorted top-N level extraction for visualization
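The update semantics can be sketched as follows (illustrative helper names; the real class keeps separate bid/ask price→size dictionaries as described):

```python
def apply_updates(book_side, updates):
    """Apply [price, size] updates in place; size == 0 deletes the level."""
    for price, size in updates:
        if size == 0:
            book_side.pop(price, None)
        else:
            book_side[price] = size

def top_n(book_side, n, descending):
    """Return the best n [price, size] levels, best price first.

    Bids use descending=True (highest bid first), asks descending=False.
    """
    prices = sorted(book_side, reverse=descending)[:n]
    return [[p, book_side[p]] for p in prices]
```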
#### Repositories (`repositories/`)
**Purpose**: Database access and persistence layer
#### Metrics Calculation (`metrics_calculator.py`)
- `MetricsCalculator`: Manages OBI and CVD metrics with windowed aggregation
- Tracks CVD from trade flow (buy vs sell volume delta)
- Calculates OBI from orderbook volume imbalance
- Provides throttled updates and OHLC-style metric bars
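The two metrics follow their conventional definitions; this sketch omits the windowing and throttling the real class performs:

```python
def obi(bid_volume, ask_volume):
    """Order Book Imbalance in [-1, 1]; defined as 0 when both sides are empty."""
    total = bid_volume + ask_volume
    return 0.0 if total == 0 else (bid_volume - ask_volume) / total

def cvd_step(cvd, size, side):
    """Advance Cumulative Volume Delta by one trade: +size for buys, -size for sells."""
    return cvd + size if side == "buy" else cvd - size
```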
```python
# Repository
SQLiteOrderflowRepository:
- connect() # Optimized SQLite connection
- load_trades_by_timestamp() # Efficient trade loading
- iterate_book_rows() # Memory-efficient snapshot streaming
- count_rows() # Performance monitoring
- create_metrics_table() # Schema creation
- insert_metrics_batch() # High-performance batch inserts
- load_metrics_by_timerange() # Time-range queries
- table_exists() # Schema validation
```
#### Level Parsing (`level_parser.py`)
- Utility functions for normalizing orderbook level data:
- `normalize_levels()`: parses levels, filtering zero/negative sizes
- `parse_levels_including_zeros()`: preserves zeros for deletion operations
- Supports JSON and Python literal formats with robust error handling
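An illustrative version of the parsing behavior described above (the function body is a sketch, not the module's actual code):

```python
import ast
import json

def normalize_levels(raw):
    """Parse '[[price, size], ...]' text into (price, size) float tuples.

    Accepts JSON or Python-literal input, drops zero/negative sizes, and
    returns an empty list on unparseable input rather than raising.
    """
    try:
        levels = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        try:
            levels = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            return []
    out = []
    for lvl in levels:
        price, size = float(lvl[0]), float(lvl[1])
        if size > 0:
            out.append((price, size))
    return out
```

Indexing only `lvl[0]` and `lvl[1]` also tolerates the 4-element `[price, size, liq_count, order_count]` form used in the book table.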
**Design Patterns**:
- **Repository Pattern**: Clean separation between data access and business logic
- **Batch Processing**: Process 1000 records per database operation
- **Connection Management**: Caller manages connection lifecycle
- **Performance Optimization**: SQLite PRAGMAs for high-speed operations
### Inter-Process Communication (`viz_io.py`)
### Processing Layer
- File paths (relative to project root):
- `ohlc_data.json`: rolling list of OHLC bars (max 1000).
- `depth_data.json`: latest depth snapshot (bids/asks).
- `metrics_data.json`: rolling list of OBI/TOT OHLC bars (max 1000).
- Atomic writes via temp files prevent partial reads by the Dash app.
- API:
- `add_ohlc_bar(...)`: append a new bar; trim to last 1000.
- `upsert_ohlc_bar(...)`: replace last bar if timestamp matches; else append; trim.
- `clear_data()`: reset OHLC data to an empty list.
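The upsert-and-trim behavior can be sketched as follows (assumes the `[ts, open, high, low, close, volume]` bar layout and the 1000-bar cap; atomic file writing is shown separately in ADR-002):

```python
MAX_BARS = 1000  # rolling window size documented above

def upsert_ohlc_bar(bars, bar):
    """Replace the last bar when timestamps match, else append; keep last MAX_BARS."""
    if bars and bars[-1][0] == bar[0]:
        bars[-1] = bar
    else:
        bars.append(bar)
    del bars[:-MAX_BARS]  # no-op until the cap is exceeded
    return bars
```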
#### Storage (`storage.py`)
**Purpose**: Orchestrates data loading, processing, and metrics calculation
### Visualization (`app.py`)
```python
class Storage:
- build_booktick_from_db() # Main processing pipeline
- _create_snapshots_and_metrics() # Per-snapshot processing
- _snapshot_from_row() # Individual snapshot creation
```
- Dash application with two graphs plus OBI subplot:
- OHLC + Volume subplot with shared x-axis.
- OBI candlestick subplot (blue tones) sharing x-axis.
- Depth (cumulative) chart for bids and asks.
- Polling interval (500 ms) callback reads JSON files and updates figures resiliently:
- Caches last good values to tolerate in-flight writes/decoding errors.
- Builds figures with Plotly dark theme.
- Exposed on `http://localhost:8050` by default (`host=0.0.0.0`).
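The cumulative depth curves are typically derived from the snapshot like this (a sketch; the actual Plotly wiring in `app.py` is not shown):

```python
def cumulative_depth(levels, descending):
    """Turn [[price, size], ...] into (prices, cumulative_sizes), best price first.

    Bids use descending=True so size accumulates away from the best bid;
    asks use descending=False.
    """
    ordered = sorted(levels, key=lambda lvl: lvl[0], reverse=descending)
    prices, cumul, running = [], [], 0.0
    for price, size in ordered:
        running += size
        prices.append(price)
        cumul.append(running)
    return prices, cumul
```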
**Processing Pipeline**:
1. **Initialize**: Create metrics repository and table if needed
2. **Load Trades**: Group trades by timestamp for efficient access
3. **Stream Processing**: Process snapshots one-by-one to minimize memory
4. **Calculate Metrics**: OBI and CVD calculation per snapshot
5. **Batch Persistence**: Store metrics in batches of 1000
6. **Memory Management**: Discard full snapshots after metric extraction
### CLI Orchestration (`main.py`)
#### Strategy Framework (`strategies.py`)
**Purpose**: Trading analysis and signal generation
```python
class DefaultStrategy:
- set_db_path() # Configure database access
- compute_OBI() # Real-time OBI calculation (fallback)
- load_stored_metrics() # Retrieve persisted metrics
- get_metrics_summary() # Statistical analysis
- on_booktick() # Main analysis entry point
```
**Analysis Capabilities**:
- **Stored Metrics**: Primary analysis using persisted data
- **Real-time Fallback**: Live calculation for compatibility
- **Statistical Summaries**: Min/max/average OBI, CVD changes
- **Alert System**: Configurable thresholds for significant imbalances
### Presentation Layer
#### Visualization (`visualizer.py`)
**Purpose**: Multi-chart rendering and display
```python
class Visualizer:
- set_db_path() # Configure metrics access
- update_from_book() # Main rendering pipeline
- _load_stored_metrics() # Retrieve metrics for chart range
- _draw() # Multi-subplot rendering
- show() # Display interactive charts
```
**Chart Layout**:
```
┌─────────────────────────────────────┐
│ OHLC Candlesticks │ ← Price action
├─────────────────────────────────────┤
│ Volume Bars │ ← Trading volume
├─────────────────────────────────────┤
│ OBI Line Chart │ ← Order book imbalance
├─────────────────────────────────────┤
│ CVD Line Chart │ ← Cumulative volume delta
└─────────────────────────────────────┘
```
**Features**:
- **Shared Time Axis**: Synchronized X-axis across all subplots
- **Auto-scaling**: Y-axis optimization for each metric type
- **Performance**: Efficient rendering of large datasets
- **Interactive**: Qt5Agg backend for zooming and panning
- Typer CLI entrypoint:
- Arguments: `instrument`, `start_date`, `end_date` (UTC, `YYYY-MM-DD`), options: `--window-seconds`.
- Discovers SQLite files under `../data/OKX` matching the instrument.
- Launches Dash visualizer as a separate process: `uv run python app.py`.
- Streams databases sequentially: for each book row, processes trades and updates orderbook.
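The shape of such a Typer entrypoint, using the documented arguments and option (defaults, help text, and the echoed line are illustrative):

```python
import typer

app = typer.Typer()

@app.command()
def main(
    instrument: str = typer.Argument(..., help="e.g. BTC-USDT"),
    start_date: str = typer.Argument(..., help="UTC, YYYY-MM-DD"),
    end_date: str = typer.Argument(..., help="UTC, YYYY-MM-DD"),
    window_seconds: int = typer.Option(60, "--window-seconds"),
) -> None:
    """Discover databases under ../data/OKX and stream them sequentially."""
    typer.echo(f"{instrument} {start_date}..{end_date} window={window_seconds}s")
```

In a script, `app()` is invoked under a `if __name__ == "__main__":` guard.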
## Data Flow
### Processing Flow
```
1. SQLite DB → Repository → Raw Data
2. Raw Data → Storage → BookSnapshot
3. BookSnapshot → MetricCalculator → OBI/CVD
4. Metrics → Repository → Database Storage
5. Stored Metrics → Strategy → Analysis
6. Stored Metrics → Visualizer → Charts
```
1. Discover and open SQLite database(s) for the requested instrument.
2. Stream `book` rows with one-row lookahead to form time windows.
3. Stream `trades` in timestamp order and bucket into the active window.
4. For each window:
- Aggregate trades into OHLC using `OHLCProcessor.process_trades`.
- Apply partial depth updates via `OHLCProcessor.update_orderbook` and emit periodic snapshots.
5. Persist current OHLC bar(s) and depth snapshots to JSON via atomic writes.
6. Dash app polls JSON and renders charts.
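Steps 2 and 3 above (lookahead windowing) can be sketched with plain timestamp sequences (illustrative function; the real implementation streams batched rows from SQLite):

```python
def bucket_trades(book_timestamps, trade_timestamps):
    """Yield (book_ts, [trade_ts, ...]) windows from two sorted sequences.

    Each window covers [book_ts, next_book_ts); the final window is open-ended.
    """
    i = 0
    for idx, ts in enumerate(book_timestamps):
        end = book_timestamps[idx + 1] if idx + 1 < len(book_timestamps) else float("inf")
        window = []
        while i < len(trade_timestamps) and trade_timestamps[i] < end:
            if trade_timestamps[i] >= ts:
                window.append(trade_timestamps[i])
            i += 1
        yield ts, window
```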
### Memory Management Flow
```
Traditional: DB → All Snapshots in Memory → Analysis (High Memory)
Optimized: DB → Process Snapshot → Calculate Metrics → Store → Discard (Low Memory)
```
## IPC JSON Schemas
- OHLC (`ohlc_data.json`): array of bars; each bar is `[ts, open, high, low, close, volume]`.
- Depth (`depth_data.json`): object with bids/asks arrays: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`.
- Metrics (`metrics_data.json`): array of bars; each bar is `[ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close]`.

## Database Schema

### Input Schema (Required)

```sql
-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,       -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,       -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,       -- "buy" or "sell"
    timestamp TEXT
);
```
### Output Schema (Auto-created)
```sql
-- Calculated metrics
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER,
timestamp TEXT,
obi REAL, -- Order Book Imbalance [-1, 1]
cvd REAL, -- Cumulative Volume Delta
best_bid REAL,
best_ask REAL,
FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```

## Configuration
- `OHLCProcessor(window_seconds, depth_levels_per_side)` controls aggregation granularity and depth snapshot size.
- Visualizer interval (`500 ms`) balances UI responsiveness and CPU usage.
- Paths: JSON files (`ohlc_data.json`, `depth_data.json`) are colocated with the code and written atomically.
- CLI parameters select instrument and time range; databases expected under `../data/OKX`.
## Performance Characteristics
### Memory Optimization
- **Before**: Store all snapshots in memory (~1GB for 600K snapshots)
- **After**: Store only metrics data (~300MB for same dataset)
- **Reduction**: >70% memory usage decrease
- Read-only SQLite tuned for fast sequential scans: immutable URI, query_only, large mmap and cache.
- Batching minimizes cursor churn and Python overhead.
- JSON IPC uses atomic replace to avoid contention; OHLC list is bounded to 1000 entries.
- Processor throttles intra-window OHLC upserts and depth emissions to reduce I/O.
### Processing Performance
- **Batch Size**: 1000 records per database operation
- **Processing Speed**: ~1000 snapshots/second on modern hardware
- **Database Overhead**: <20% storage increase for metrics table
- **Query Performance**: Sub-second retrieval for typical time ranges
## Error Handling
### Scalability Limits
- **Single File**: 1M+ snapshots per database file
- **Time Range**: Months to years of historical data
- **Memory Peak**: <2GB for year-long datasets
- **Disk Space**: Original size + 20% for metrics
## Integration Points
### External Interfaces
```python
# Main application entry point
main.py:
- CLI argument parsing
- Database file discovery
- Component orchestration
- Progress monitoring
# Plugin interfaces
Strategy.on_booktick(book: Book) # Strategy integration point
Visualizer.update_from_book(book) # Visualization integration
```
### Internal Interfaces
```python
# Repository interfaces
Repository.connect() -> Connection
Repository.load_data() -> TypedData
Repository.store_data(data) -> None

# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) -> float
MetricCalculator.calculate_cvd(prev_cvd, trades) -> float
```
- Visualizer tolerates JSON decode races by reusing last good values and logging warnings.
- Processor guards depth parsing and writes; logs at debug/info levels.
- Visualizer startup is wrapped; if it fails, processing continues without UI.
## Security Considerations
### Data Protection
- **SQL Injection**: All queries use parameterized statements
- **File Access**: Validates database file paths and permissions
- **Error Handling**: No sensitive data in error messages
- **Input Validation**: Sanitizes all external inputs
- SQLite connections are read-only and immutable; no write queries executed.
- File writes are confined to project directory; no paths derived from untrusted input.
- Logs avoid sensitive data; only operational metadata.
### Access Control
- **Database**: Respects file system permissions
- **Memory**: No sensitive data persistence beyond processing
- **Logging**: Configurable log levels without data exposure
## Testing Guidance
- Unit tests (run with `uv run pytest`):
  - `OHLCProcessor`: window boundary handling, high/low tracking, volume accumulation, upsert behavior.
  - Depth maintenance: deletions (size==0), top-N sorting, throttling.
  - `DBInterpreter.stream`: correct trade-window assignment, end-of-stream handling.
- Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server.

## Configuration Management

### Performance Tuning
```python
# Storage configuration
BATCH_SIZE = 1000 # Records per database operation
LOG_FREQUENCY = 20 # Progress reports per processing run
# SQLite optimization
PRAGMA journal_mode = OFF   # Maximum write performance
PRAGMA synchronous = OFF    # Disable synchronous writes
PRAGMA cache_size = 100000  # Large memory cache
```

## Roadmap (Optional Enhancements)
### Visualization Settings
```python
# Chart configuration
WINDOW_SECONDS = 60 # OHLC aggregation window
MAX_BARS = 500 # Maximum bars displayed
FIGURE_SIZE = (12, 10) # Chart dimensions
```
## Error Handling Strategy
### Graceful Degradation
- **Database Errors**: Continue with reduced functionality
- **Calculation Errors**: Skip problematic snapshots with logging
- **Visualization Errors**: Display available data, note issues
- **Memory Pressure**: Adjust batch sizes automatically
### Recovery Mechanisms
- **Partial Processing**: Resume from last successful batch
- **Data Validation**: Verify metrics calculations before storage
- **Rollback Support**: Transaction boundaries for data consistency
- Metrics: add OBI/CVD computation and persist metrics to a dedicated table.
- Repository Pattern: extract DB access into a repository module with typed methods.
- Orchestrator: introduce a `Storage` pipeline module coordinating batch processing and persistence.
- Strategy Layer: compute signals/alerts on stored metrics.
- Visualization: add OBI/CVD subplots and richer interactions.
---
This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a robust foundation for incremental enhancements while maintaining clean separation of concerns.


@@ -1,120 +0,0 @@
# ADR-001: Persistent Metrics Storage
## Status
Accepted
## Context
The original orderflow backtest system kept all orderbook snapshots in memory during processing, leading to excessive memory usage (>1GB for typical datasets). With the addition of OBI and CVD metrics calculation, we needed to decide how to handle the computed metrics and manage memory efficiently.
## Decision
We will implement persistent storage of calculated metrics in the SQLite database with the following approach:
1. **Metrics Table**: Create a dedicated `metrics` table to store OBI, CVD, and related data
2. **Streaming Processing**: Process snapshots one-by-one, calculate metrics, store results, then discard snapshots
3. **Batch Operations**: Use batch inserts (1000 records) for optimal database performance
4. **Query Interface**: Provide time-range queries for metrics retrieval and analysis
## Consequences
### Positive
- **Memory Reduction**: >70% reduction in peak memory usage during processing
- **Avoid Recalculation**: Metrics calculated once and reused for multiple analysis runs
- **Scalability**: Can process months/years of data without memory constraints
- **Performance**: Batch database operations provide high throughput
- **Persistence**: Metrics survive between application runs
- **Analysis Ready**: Stored metrics enable complex time-series analysis
### Negative
- **Storage Overhead**: Metrics table adds ~20% to database size
- **Complexity**: Additional database schema and management code
- **Dependencies**: Tighter coupling between processing and database layer
- **Migration**: Existing databases need schema updates for metrics table
## Alternatives Considered
### Option 1: Keep All Snapshots in Memory
**Rejected**: Unsustainable memory usage for large datasets. Would limit analysis to small time ranges.
### Option 2: Calculate Metrics On-Demand
**Rejected**: Recalculating metrics for every analysis run is computationally expensive and time-consuming.
### Option 3: External Metrics Database
**Rejected**: Adds deployment complexity. SQLite co-location provides better performance and simpler management.
### Option 4: Compressed In-Memory Cache
**Rejected**: Still faces fundamental memory scaling issues. Compression/decompression adds CPU overhead.
## Implementation Details
### Database Schema
```sql
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL,
timestamp TEXT NOT NULL,
obi REAL NOT NULL,
cvd REAL NOT NULL,
best_bid REAL,
best_ask REAL,
FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```
### Processing Pipeline
1. Create metrics table if not exists
2. Stream through orderbook snapshots
3. For each snapshot:
- Calculate OBI and CVD metrics
- Batch store metrics (1000 records per commit)
- Discard snapshot from memory
4. Provide query interface for time-range retrieval
### Memory Management
- **Before**: Store all snapshots → Calculate on demand → High memory usage
- **After**: Stream snapshots → Calculate immediately → Store metrics → Low memory usage
## Migration Strategy
### Backward Compatibility
- Existing databases continue to work without metrics table
- System auto-creates metrics table on first processing run
- Fallback to real-time calculation if metrics unavailable
### Performance Impact
- **Processing Time**: Slight increase due to database writes (~10%)
- **Query Performance**: Significant improvement for repeated analysis
- **Overall**: Net positive performance for typical usage patterns
## Monitoring and Validation
### Success Metrics
- **Memory Usage**: Target >70% reduction in peak memory usage
- **Processing Speed**: Maintain >500 snapshots/second processing rate
- **Storage Efficiency**: Metrics table <25% of total database size
- **Query Performance**: <1 second retrieval for typical time ranges
### Validation Methods
- Memory profiling during large dataset processing
- Performance benchmarks vs. original system
- Storage overhead analysis across different dataset sizes
- Query performance testing with various time ranges
## Future Considerations
### Potential Enhancements
- **Compression**: Consider compression for metrics storage if overhead becomes significant
- **Partitioning**: Time-based partitioning for very large datasets
- **Caching**: In-memory cache for frequently accessed metrics
- **Export**: Direct export capabilities for external analysis tools
### Scalability Options
- **Database Upgrade**: PostgreSQL if SQLite becomes limiting factor
- **Parallel Processing**: Multi-threaded metrics calculation
- **Distributed Storage**: For institutional-scale datasets
---
This decision provides a solid foundation for efficient, scalable metrics processing while maintaining simplicity and performance characteristics suitable for the target use cases.


@@ -0,0 +1,122 @@
# ADR-001: SQLite Database Choice
## Status
Accepted
## Context
The orderflow backtest system needs to efficiently store and stream large volumes of historical orderbook and trade data. Key requirements include:
- Fast sequential read access for time-series data
- Minimal setup and maintenance overhead
- Support for concurrent reads from visualization layer
- Ability to handle databases ranging from 100MB to 10GB+
- No network dependencies for data access
## Decision
We will use SQLite as the primary database for storing historical orderbook and trade data.
## Consequences
### Positive
- **Zero configuration**: No database server setup or administration required
- **Excellent read performance**: Optimized for sequential scans with proper PRAGMA settings
- **Built-in Python support**: No external dependencies or connection libraries needed
- **File portability**: Database files can be easily shared and archived
- **ACID compliance**: Ensures data integrity during writes (for data ingestion)
- **Small footprint**: Minimal memory and storage overhead
- **Fast startup**: No connection pooling or server initialization delays
### Negative
- **Single writer limitation**: Cannot handle concurrent writes (acceptable for read-only backtest)
- **Limited scalability**: Not suitable for high-concurrency production trading systems
- **No network access**: Cannot query databases remotely (acceptable for local analysis)
- **File locking**: Potential issues with file system sharing (mitigated by read-only access)
## Implementation Details
### Schema Design
```sql
-- Orderbook snapshots with timestamp windows
CREATE TABLE book (
id INTEGER PRIMARY KEY,
instrument TEXT,
bids TEXT NOT NULL, -- JSON array of [price, size] pairs
asks TEXT NOT NULL, -- JSON array of [price, size] pairs
timestamp TEXT NOT NULL
);
-- Individual trade records
CREATE TABLE trades (
id INTEGER PRIMARY KEY,
instrument TEXT,
trade_id TEXT,
price REAL NOT NULL,
size REAL NOT NULL,
side TEXT NOT NULL, -- "buy" or "sell"
timestamp TEXT NOT NULL
);
-- Indexes for efficient time-based queries
CREATE INDEX idx_book_timestamp ON book(timestamp);
CREATE INDEX idx_trades_timestamp ON trades(timestamp);
```
### Performance Optimizations
```python
# Read-only connection with optimized PRAGMA settings
connection_uri = f"file:{db_path}?immutable=1&mode=ro"
conn = sqlite3.connect(connection_uri, uri=True)
conn.execute("PRAGMA query_only = 1")
conn.execute("PRAGMA temp_store = MEMORY")
conn.execute("PRAGMA mmap_size = 268435456") # 256MB
conn.execute("PRAGMA cache_size = 10000")
```
## Alternatives Considered
### PostgreSQL
- **Rejected**: Requires server setup and maintenance
- **Pros**: Better concurrent access, richer query features
- **Cons**: Overkill for read-only use case, deployment complexity
### Parquet Files
- **Rejected**: Limited query capabilities for time-series data
- **Pros**: Excellent compression, columnar format
- **Cons**: No indexes, complex range queries, requires additional libraries
### MongoDB
- **Rejected**: Document structure not optimal for time-series data
- **Pros**: Flexible schema, good aggregation pipeline
- **Cons**: Requires server, higher memory usage, learning curve
### CSV Files
- **Rejected**: Poor query performance for large datasets
- **Pros**: Simple format, universal compatibility
- **Cons**: No indexing, slow filtering, type conversion overhead
### InfluxDB
- **Rejected**: Overkill for historical data analysis
- **Pros**: Optimized for time-series, good compression
- **Cons**: Additional service dependency, learning curve
## Migration Path
If scalability becomes an issue in the future:
1. **Phase 1**: Implement database abstraction layer in `db_interpreter`
2. **Phase 2**: Add PostgreSQL adapter for production workloads
3. **Phase 3**: Implement data partitioning for very large datasets
4. **Phase 4**: Consider distributed storage for multi-terabyte datasets
## Monitoring
Track the following metrics to validate this decision:
- Database file sizes and growth rates
- Query performance for different date ranges
- Memory usage during streaming operations
- Time to process complete backtests
## Review Date
This decision should be reviewed if:
- Database files consistently exceed 50GB
- Query performance degrades below 1000 rows/second
- Concurrent access requirements change
- Network-based data sharing becomes necessary


@@ -0,0 +1,162 @@
# ADR-002: JSON File-Based Inter-Process Communication
## Status
Accepted
## Context
The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include:
- Real-time data updates from processor to visualization
- Tolerance for timing mismatches between writer and reader
- Simple implementation without external dependencies
- Support for different update frequencies (OHLC bars vs. orderbook depth)
- Graceful handling of process crashes or restarts
## Decision
We will use JSON files with atomic write operations for inter-process communication between the data processor and Dash visualization frontend.
## Consequences
### Positive
- **Simplicity**: No message queues, sockets, or complex protocols
- **Fault tolerance**: File-based communication survives process restarts
- **Debugging friendly**: Data files can be inspected manually
- **No dependencies**: Built-in JSON support, no external libraries
- **Atomic operations**: Temp file + rename prevents partial reads
- **Language agnostic**: Any process can read/write JSON files
- **Bounded memory**: Rolling data windows prevent unlimited growth
### Negative
- **File I/O overhead**: Disk writes may be slower than in-memory communication
- **Polling required**: Reader must poll for updates (500ms interval)
- **Limited throughput**: Not suitable for high-frequency (microsecond) updates
- **No acknowledgments**: Writer cannot confirm reader has processed data
- **File system dependency**: Performance varies by storage type
## Implementation Details
### File Structure
```
ohlc_data.json # Rolling array of OHLC bars (max 1000)
depth_data.json # Current orderbook depth snapshot
metrics_data.json # Rolling array of OBI/CVD metrics (max 1000)
```
### Atomic Write Pattern
```python
import json
import os
from pathlib import Path
from typing import Any

def atomic_write(file_path: Path, data: Any) -> None:
    """Write data atomically to prevent partial reads."""
    temp_path = file_path.with_suffix('.tmp')
    with open(temp_path, 'w') as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    temp_path.replace(file_path)  # Atomic on POSIX systems
```
### Data Formats
```python
# OHLC format: [timestamp_ms, open, high, low, close, volume]
ohlc_data = [
[1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
[1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]
# Depth format: top-N levels per side
depth_data = {
"bids": [[49990.0, 1.5], [49985.0, 2.1]],
"asks": [[50010.0, 1.2], [50015.0, 1.8]]
}
# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close]
metrics_data = [
[1640995200000, 0.15, 0.22, 0.08, 0.18],
[1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
### Error Handling
```python
import json
import logging

_LAST_DATA: list = []  # cache of the last successful read

# Reader pattern with graceful fallback
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # Cache successful read
except (FileNotFoundError, json.JSONDecodeError) as e:
    logging.warning(f"Using cached data: {e}")
    new_data = _LAST_DATA  # Fall back to cached data
```
## Performance Characteristics
### Write Performance
- **Small files**: < 1MB typical, writes complete in < 10ms
- **Atomic operations**: Add ~2-5ms overhead for temp file creation
- **Throttling**: Updates limited to prevent excessive I/O
### Read Performance
- **Parse time**: < 5ms for typical JSON file sizes
- **Polling overhead**: 500ms interval balances responsiveness and CPU usage
- **Error recovery**: Cached data eliminates visual glitches
### Memory Usage
- **Bounded datasets**: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file
- **JSON overhead**: ~2x memory during parsing
- **Total footprint**: < 500KB for all IPC data
## Alternatives Considered
### Redis Pub/Sub
- **Rejected**: Additional service dependency, overkill for simple use case
- **Pros**: True real-time updates, built-in data structures
- **Cons**: External dependency, memory overhead, configuration complexity
### ZeroMQ
- **Rejected**: Additional library dependency, more complex than needed
- **Pros**: High performance, flexible patterns
- **Cons**: Learning curve, binary dependency, networking complexity
### Named Pipes/Unix Sockets
- **Rejected**: Platform-specific, more complex error handling
- **Pros**: Better performance, no file I/O
- **Cons**: Platform limitations, harder debugging, process lifetime coupling
### SQLite as Message Queue
- **Rejected**: Overkill for simple data exchange
- **Pros**: ACID transactions, complex queries possible
- **Cons**: Schema management, locking considerations, overhead
### HTTP API
- **Rejected**: Too much overhead for local communication
- **Pros**: Standard protocol, language agnostic
- **Cons**: Network stack overhead, port management, authentication
## Future Considerations
### Scalability Limits
Current approach suitable for:
- Update frequencies: 1-10 Hz
- Data volumes: < 10MB total
- Process counts: 1 writer, few readers
### Migration Path
If performance becomes insufficient:
1. **Phase 1**: Add compression (gzip) to reduce I/O
2. **Phase 2**: Implement shared memory for high-frequency data
3. **Phase 3**: Consider message queue for complex routing
4. **Phase 4**: Migrate to streaming protocol for real-time requirements
## Monitoring
Track these metrics to validate the approach:
- File write latency and frequency
- JSON parse times in visualization
- Error rates for partial reads
- Memory usage growth over time
## Review Triggers
Reconsider this decision if:
- Update frequency requirements exceed 10 Hz
- File I/O becomes a performance bottleneck
- Multiple visualization clients need the same data
- Complex message routing becomes necessary
- Platform portability becomes a concern

# ADR-002: Separation of Visualization from Strategy
## Status
Accepted
## Context
The original system embedded visualization functionality within the `DefaultStrategy` class, creating tight coupling between trading analysis logic and chart rendering. This design had several issues:
1. **Mixed Responsibilities**: Strategy classes handled both trading logic and GUI operations
2. **Testing Complexity**: Strategy tests required mocking GUI components
3. **Deployment Flexibility**: Strategies couldn't run in headless environments
4. **Timing Control**: Visualization timing was tied to strategy execution rather than application flow
The user specifically requested to display visualizations after processing each database file, requiring better control over visualization timing.
## Decision
We will separate visualization from strategy components with the following architecture:
1. **Remove Visualization from Strategy**: Strategy classes focus solely on trading analysis
2. **Main Application Control**: `main.py` orchestrates visualization timing and updates
3. **Independent Configuration**: Strategy and Visualizer get database paths independently
4. **Clean Interfaces**: No direct dependencies between strategy and visualization components
## Consequences
### Positive
- **Single Responsibility**: Strategy focuses on trading logic, Visualizer on charts
- **Better Testability**: Strategy tests run without GUI dependencies
- **Flexible Deployment**: Strategies can run in headless/server environments
- **Timing Control**: Visualization updates precisely when needed (after each DB)
- **Maintainability**: Changes to visualization don't affect strategy logic
- **Performance**: No GUI overhead during strategy analysis
### Negative
- **Increased Complexity**: Main application handles more orchestration logic
- **Coordination Required**: Must ensure strategy and visualizer get same database path
- **Breaking Change**: Existing strategy initialization code needs updates
## Alternatives Considered
### Option 1: Keep Visualization in Strategy
**Rejected**: Violates single responsibility principle. Makes testing difficult and deployment inflexible.
### Option 2: Observer Pattern
**Rejected**: Adds unnecessary complexity for this use case. Direct control in main.py is simpler and more explicit.
### Option 3: Visualization Service
**Rejected**: Over-engineering for current requirements. May be considered for future multi-strategy scenarios.
## Implementation Details
### Before (Coupled Design)
```python
class DefaultStrategy:
def __init__(self, instrument: str, enable_visualization: bool = True):
self.visualizer = Visualizer(...) if enable_visualization else None
def on_booktick(self, book: Book):
# Trading analysis
# ...
# Visualization update
if self.visualizer:
self.visualizer.update_from_book(book)
```
### After (Separated Design)
```python
# Strategy focuses on analysis only
class DefaultStrategy:
    def __init__(self, instrument: str):
        ...  # No visualization dependencies

    def on_booktick(self, book: Book):
        ...  # Pure trading analysis, no visualization code
# Main application orchestrates both
def main():
strategy = DefaultStrategy(instrument)
visualizer = Visualizer(...)
for db_path in db_paths:
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)
# Process data
storage.build_booktick_from_db(db_path, db_date)
# Analysis
strategy.on_booktick(storage.book)
# Visualization (controlled timing)
visualizer.update_from_book(storage.book)
# Final display
visualizer.show()
```
### Interface Changes
#### Strategy Interface (Simplified)
```python
class DefaultStrategy:
    def __init__(self, instrument: str): ...             # Removed visualization param
    def set_db_path(self, db_path: Path) -> None: ...    # No visualizer.set_db_path()
    def on_booktick(self, book: Book) -> None: ...       # No visualization calls
```
#### Main Application (Enhanced)
```python
def main():
# Separate initialization
strategy = DefaultStrategy(instrument)
visualizer = Visualizer(window_seconds=60, max_bars=500)
# Independent configuration
for db_path in db_paths:
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)
# Controlled execution
strategy.on_booktick(storage.book) # Analysis
visualizer.update_from_book(storage.book) # Visualization
```
## Migration Strategy
### Code Changes Required
1. **Strategy Classes**: Remove visualization initialization and calls
2. **Main Application**: Add visualizer creation and orchestration
3. **Tests**: Update strategy tests to remove visualization mocking
4. **Configuration**: Remove visualization parameters from strategy constructors
### Backward Compatibility
- **API Breaking**: Strategy constructor signature changes
- **Functionality Preserved**: All visualization features remain available
- **Test Updates**: Strategy tests become simpler (no GUI mocking needed)
### Migration Steps
1. Update `DefaultStrategy` to remove visualization dependencies
2. Modify `main.py` to create and manage `Visualizer` instance
3. Update all strategy constructor calls to remove `enable_visualization`
4. Update tests to reflect new interfaces
5. Verify visualization timing meets requirements
## Benefits Achieved
### Clean Architecture
- **Strategy**: Pure trading analysis logic
- **Visualizer**: Pure chart rendering logic
- **Main**: Application flow and component coordination
### Improved Testing
```python
# Before: Complex mocking required
def test_strategy():
with patch('visualizer.Visualizer') as mock_viz:
strategy = DefaultStrategy("BTC", enable_visualization=True)
# Complex mock setup...
# After: Simple, direct testing
def test_strategy():
strategy = DefaultStrategy("BTC")
# Direct testing of analysis logic
```
### Flexible Deployment
```python
# Headless server deployment
strategy = DefaultStrategy("BTC")
# No GUI dependencies, can run anywhere
# Development with visualization
strategy = DefaultStrategy("BTC")
visualizer = Visualizer(...)
# Full GUI functionality when needed
```
### Precise Timing Control
```python
# Visualization updates exactly when requested
for db_file in database_files:
process_database(db_file) # Data processing
strategy.analyze(book) # Trading analysis
visualizer.update_from_book(book) # Chart update after each DB
```
## Monitoring and Validation
### Success Criteria
- **Test Simplification**: Strategy tests run without GUI mocking
- **Timing Accuracy**: Visualization updates after each database as requested
- **Performance**: No GUI overhead during pure analysis operations
- **Maintainability**: Visualization changes don't affect strategy code
### Validation Methods
- Run strategy tests in headless environment
- Verify visualization timing matches requirements
- Performance comparison of analysis-only vs. GUI operations
- Code complexity metrics for strategy vs. visualization modules
## Future Considerations
### Potential Enhancements
- **Multiple Visualizers**: Support different chart types or windows
- **Visualization Plugins**: Pluggable chart renderers for different outputs
- **Remote Visualization**: Web-based charts for server deployments
- **Batch Visualization**: Process multiple databases before chart updates
### Extensibility
- **Strategy Plugins**: Easy to add strategies without visualization concerns
- **Visualization Backends**: Swap chart libraries without affecting strategies
- **Analysis Pipeline**: Clear separation enables complex analysis workflows
---
This separation provides a clean, maintainable architecture that supports the requested visualization timing while improving code quality and testability.

# ADR-003: Dash Web Framework for Visualization
## Status
Accepted
## Context
The orderflow backtest system requires a user interface for visualizing OHLC candlestick charts, volume data, orderbook depth, and derived metrics. Key requirements include:
- Real-time chart updates with minimal latency
- Professional financial data visualization capabilities
- Support for multiple chart types (candlesticks, bars, line charts)
- Interactive features (zooming, panning, hover details)
- Dark theme suitable for trading applications
- Python-native solution to avoid JavaScript development
## Decision
We will use Dash (by Plotly) as the web framework for building the visualization frontend, with Plotly.js for chart rendering.
## Consequences
### Positive
- **Python-native**: No JavaScript development required
- **Plotly integration**: Best-in-class financial charting capabilities
- **Reactive architecture**: Automatic UI updates via callback system
- **Professional appearance**: High-quality charts suitable for trading applications
- **Interactive features**: Built-in zooming, panning, hover tooltips
- **Responsive design**: Bootstrap integration for modern layouts
- **Development speed**: Rapid prototyping and iteration
- **WebGL acceleration**: Smooth performance for large datasets
### Negative
- **Performance overhead**: Heavier than custom JavaScript solutions
- **Limited customization**: Constrained by Dash component ecosystem
- **Single-page limitation**: Not suitable for complex multi-page applications
- **Memory usage**: Can be heavy for resource-constrained environments
- **Learning curve**: Callback patterns require understanding of reactive programming
## Implementation Details
### Application Structure
```python
import dash
import dash_bootstrap_components as dbc
from dash import dcc

# Main application with Bootstrap theme
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY])
# Responsive layout with 9:3 ratio for charts:depth
app.layout = dbc.Container([
dbc.Row([
dbc.Col([ # OHLC + Volume + Metrics
dcc.Graph(id='ohlc-chart', style={'height': '100vh'})
], width=9),
dbc.Col([ # Orderbook Depth
dcc.Graph(id='depth-chart', style={'height': '100vh'})
], width=3)
]),
dcc.Interval(id='interval-update', interval=500, n_intervals=0)
])
```
### Chart Architecture
```python
import plotly.graph_objs as go
from plotly.subplots import make_subplots

# Multi-subplot chart with shared x-axis
fig = make_subplots(
rows=3, cols=1,
row_heights=[0.6, 0.2, 0.2], # OHLC, Volume, Metrics
vertical_spacing=0.02,
shared_xaxes=True,
subplot_titles=['Price', 'Volume', 'OBI Metrics']
)
# Candlestick chart with dark theme
fig.add_trace(go.Candlestick(
x=timestamps, open=opens, high=highs, low=lows, close=closes,
increasing_line_color='#00ff00', decreasing_line_color='#ff0000'
), row=1, col=1)
```
### Real-time Updates
```python
from dash import Input, Output

@app.callback(
[Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')],
[Input('interval-update', 'n_intervals')]
)
def update_charts(n_intervals):
# Read data from JSON files with error handling
# Build and return updated figures
return ohlc_fig, depth_fig
```
## Performance Characteristics
### Update Latency
- **Polling interval**: 500ms for near real-time updates
- **Chart render time**: 50-200ms depending on data size
- **Memory usage**: ~100MB for typical chart configurations
- **Browser requirements**: Modern browser with WebGL support
### Scalability Limits
- **Data points**: Up to 10,000 candlesticks without performance issues
- **Update frequency**: Optimal at 1-2 Hz, maximum ~10 Hz
- **Concurrent users**: Single user design (development server)
- **Memory growth**: Linear with data history size
## Alternatives Considered
### Streamlit
- **Rejected**: Less interactive, slower updates, limited charting
- **Pros**: Simpler programming model, good for prototypes
- **Cons**: Poor real-time performance, limited financial chart types
### Flask + Custom JavaScript
- **Rejected**: Requires JavaScript development, more complex
- **Pros**: Complete control, potentially better performance
- **Cons**: Significant development overhead, maintenance burden
### Jupyter Notebooks
- **Rejected**: Not suitable for production deployment
- **Pros**: Great for exploration and analysis
- **Cons**: No real-time updates, not web-deployable
### Bokeh
- **Rejected**: Less mature ecosystem, fewer financial chart types
- **Pros**: Good performance, Python-native
- **Cons**: Smaller community, limited examples for financial data
### Custom React Application
- **Rejected**: Requires separate frontend team, complex deployment
- **Pros**: Maximum flexibility, best performance potential
- **Cons**: High development cost, maintenance overhead
### Desktop GUI (Tkinter/PyQt)
- **Rejected**: Not web-accessible, limited styling options
- **Pros**: No browser dependency, good performance
- **Cons**: Deployment complexity, poor mobile support
## Configuration Options
### Theme and Styling
```python
# Dark theme configuration
dark_theme = {
'plot_bgcolor': '#000000',
'paper_bgcolor': '#000000',
'font_color': '#ffffff',
'grid_color': '#333333'
}
```
### Chart Types
- **Candlestick charts**: OHLC price data with volume
- **Bar charts**: Volume and metrics visualization
- **Line charts**: Cumulative depth and trend analysis
- **Scatter plots**: Trade-by-trade analysis (future)
### Interactive Features
- **Zoom and pan**: Time-based navigation
- **Hover tooltips**: Detailed data on mouse over
- **Crosshairs**: Precise value reading
- **Range selector**: Quick time period selection
## Future Enhancements
### Short-term (1-3 months)
- Add range selector for time navigation
- Implement chart annotation for significant events
- Add export functionality for charts and data
### Medium-term (3-6 months)
- Multi-instrument support with tabs
- Advanced indicators and overlays
- User preference persistence
### Long-term (6+ months)
- Real-time alerts and notifications
- Strategy backtesting visualization
- Portfolio-level analytics
## Monitoring and Metrics
### Performance Monitoring
- Chart render times and update frequencies
- Memory usage growth over time
- Browser compatibility and error rates
- User interaction patterns
### Quality Metrics
- Chart accuracy compared to source data
- Visual responsiveness during heavy updates
- Error recovery from data corruption
## Review Triggers
Reconsider this decision if:
- Update frequency requirements exceed 10 Hz consistently
- Memory usage becomes prohibitive (> 1GB)
- Custom visualization requirements cannot be met
- Multi-user deployment becomes necessary
- Mobile responsiveness becomes a priority
- Integration with external charting libraries is needed
## Migration Path
If replacement becomes necessary:
1. **Phase 1**: Abstract chart building logic from Dash specifics
2. **Phase 2**: Implement alternative frontend while maintaining data formats
3. **Phase 3**: A/B test performance and usability
4. **Phase 4**: Complete migration with feature parity

# Module: app
## Purpose
The `app` module provides a real-time Dash web application for visualizing OHLC candlestick charts, volume data, Order Book Imbalance (OBI) metrics, and orderbook depth. It implements a polling-based architecture that reads JSON data files and renders interactive charts with a dark theme.
## Public Interface
### Functions
- `build_empty_ohlc_fig() -> go.Figure`: Create empty OHLC chart with proper styling
- `build_empty_depth_fig() -> go.Figure`: Create empty depth chart with proper styling
- `build_ohlc_fig(data: List[list], metrics: List[list]) -> go.Figure`: Build complete OHLC+Volume+OBI chart
- `build_depth_fig(depth_data: dict) -> go.Figure`: Build orderbook depth visualization
### Global Variables
- `_LAST_DATA`: Cached OHLC data for error recovery
- `_LAST_DEPTH`: Cached depth data for error recovery
- `_LAST_METRICS`: Cached metrics data for error recovery
### Dash Application
- `app`: Main Dash application instance with Bootstrap theme
- Layout with responsive grid (9:3 ratio for OHLC:Depth charts)
- 500ms polling interval for real-time updates
## Usage Examples
### Running the Application
```bash
# Start the Dash server
uv run python app.py
# Access the web interface
# Open http://localhost:8050 in your browser
```
### Programmatic Usage
```python
from app import build_ohlc_fig, build_depth_fig
# Build charts with sample data
ohlc_data = [[1640995200000, 50000, 50100, 49900, 50050, 125.5]]
metrics_data = [[1640995200000, 0.15, 0.22, 0.08, 0.18]]
depth_data = {
"bids": [[49990, 1.5], [49985, 2.1]],
"asks": [[50010, 1.2], [50015, 1.8]]
}
ohlc_fig = build_ohlc_fig(ohlc_data, metrics_data)
depth_fig = build_depth_fig(depth_data)
```
## Dependencies
### Internal
- `viz_io`: Data file paths and JSON reading
- `viz_io.DATA_FILE`: OHLC data source
- `viz_io.DEPTH_FILE`: Depth data source
- `viz_io.METRICS_FILE`: Metrics data source
### External
- `dash`: Web application framework
- `dash.html`, `dash.dcc`: HTML and core components
- `dash_bootstrap_components`: Bootstrap styling
- `plotly.graph_objs`: Chart objects
- `plotly.subplots`: Multiple subplot support
- `pandas`: Data manipulation (minimal usage)
- `json`: JSON file parsing
- `logging`: Error and debug logging
- `pathlib`: File path handling
## Chart Architecture
### OHLC Chart (Left Panel, 9/12 width)
- **Main subplot**: Candlestick chart with OHLC data
- **Volume subplot**: Bar chart sharing x-axis with main chart
- **OBI subplot**: Order Book Imbalance candlestick chart in blue tones
- **Shared x-axis**: Synchronized zooming and panning across subplots
### Depth Chart (Right Panel, 3/12 width)
- **Cumulative depth**: Stepped line chart showing bid/ask liquidity
- **Color coding**: Green for bids, red for asks
- **Real-time updates**: Reflects current orderbook state
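The cumulative depth line can be derived directly from the `depth_data` format shown earlier. A hypothetical helper (not the module's actual code) might look like:

```python
from typing import Dict, List, Tuple

def cumulative_depth(
    depth_data: Dict[str, List[List[float]]],
) -> Tuple[List[Tuple[float, float]], List[Tuple[float, float]]]:
    """Turn top-N [price, size] levels into cumulative (price, total) steps."""
    # Bids accumulate walking down from the best bid; asks walking up
    bids = sorted(depth_data["bids"], key=lambda lvl: -lvl[0])
    asks = sorted(depth_data["asks"], key=lambda lvl: lvl[0])
    cum_bids: List[Tuple[float, float]] = []
    cum_asks: List[Tuple[float, float]] = []
    total = 0.0
    for price, size in bids:
        total += size
        cum_bids.append((price, total))
    total = 0.0
    for price, size in asks:
        total += size
        cum_asks.append((price, total))
    return cum_bids, cum_asks
```

Plotting each side as a stepped line (green for bids, red for asks) yields the chart described above.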
## Styling and Theme
### Dark Theme Configuration
- Background: Black (`#000000`)
- Text: White (`#ffffff`)
- Grid: Dark gray with transparency
- Candlesticks: Green (up) / Red (down)
- Volume: Gray bars
- OBI: Blue tones for candlesticks
- Depth: Green (bids) / Red (asks)
### Responsive Design
- Bootstrap grid system for layout
- Fluid container for full-width usage
- 100vh height for full viewport coverage
- Configurable chart display modes
## Data Polling and Error Handling
### Polling Strategy
- **Interval**: 500ms for near real-time updates
- **Graceful degradation**: Uses cached data on JSON read errors
- **Atomic reads**: Tolerates partial writes during file updates
- **Logging**: Warnings for data inconsistencies
### Error Recovery
```python
# Pseudocode for error handling pattern
try:
with open(data_file) as f:
new_data = json.load(f)
_LAST_DATA = new_data # Cache successful read
except (FileNotFoundError, json.JSONDecodeError):
logging.warning("Using cached data due to read error")
new_data = _LAST_DATA # Use cached data
```
## Performance Characteristics
- **Client-side rendering**: Plotly.js handles chart rendering
- **Efficient updates**: Only redraws when data changes
- **Memory bounded**: Limited by max bars in data files (1000)
- **Network efficient**: Local file polling (no external API calls)
## Testing
Run application tests:
```bash
uv run pytest test_app.py -v
```
Test coverage includes:
- Chart building functions
- Data loading and caching
- Error handling scenarios
- Layout rendering
- Callback functionality
## Configuration Options
### Server Configuration
- **Host**: `0.0.0.0` (accessible from network)
- **Port**: `8050` (default Dash port)
- **Debug mode**: Disabled in production
### Chart Configuration
- **Update interval**: 500ms (configurable via dcc.Interval)
- **Display mode bar**: Enabled for user interaction
- **Logo display**: Disabled for clean interface
## Known Issues
- High CPU usage during rapid data updates
- Memory usage grows with chart history
- No authentication or access control
- Limited mobile responsiveness for complex charts
## Development Notes
- Uses Flask development server (not suitable for production)
- Callback exceptions suppressed for partial data scenarios
- Bootstrap CSS loaded from CDN
- Chart configurations optimized for financial data visualization

# Module: db_interpreter
## Purpose
The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing.
## Public Interface
### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp
### Functions
- `DBInterpreter(db_path: Path)`: Constructor that initializes read-only SQLite connection with optimized PRAGMA settings
### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface that yields orderbook updates with associated trades in temporal windows
## Usage Examples
```python
from pathlib import Path
from db_interpreter import DBInterpreter
# Initialize interpreter
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)
# Stream orderbook and trade data
for ob_update, trades in interpreter.stream():
# Process orderbook update
print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")
# Process trades in this window
for trade in trades:
trade_id, price, size, side, timestamp_ms = trade[1:6]
print(f"Trade: {side} {size} @ {price}")
```
## Dependencies
### Internal
- None (standalone module)
### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging
## Performance Characteristics
- **Batch sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal memory usage
- **SQLite optimizations**: Read-only, immutable mode, large mmap and cache sizes
- **Memory efficient**: Streaming iterator pattern prevents loading entire dataset
- **Temporal windowing**: One-row lookahead for precise time boundary calculation
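The one-row-lookahead windowing can be sketched as follows (a simplified illustration, not the module's actual implementation; snapshot timestamps and trades are assumed sorted, and the last window is left open-ended):

```python
from typing import Iterator, List, Tuple

def window_trades(
    book_timestamps: List[int],
    trades: List[tuple],  # (price, size, timestamp_ms), sorted by timestamp
) -> Iterator[Tuple[int, float, List[tuple]]]:
    """Assign trades to [start, end) windows using one-row lookahead."""
    i = 0
    for idx, start in enumerate(book_timestamps):
        # Lookahead: the next snapshot's timestamp closes this window
        if idx + 1 < len(book_timestamps):
            end = float(book_timestamps[idx + 1])
        else:
            end = float('inf')  # final window is open-ended
        bucket = []
        while i < len(trades) and trades[i][2] < end:
            bucket.append(trades[i])
            i += 1
        yield start, end, bucket
```

Because both streams are consumed in timestamp order, each trade is visited exactly once, which is what keeps the generator memory-efficient.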
## Testing
Run module tests:
```bash
uv run pytest test_db_interpreter.py -v
```
Test coverage includes:
- Batch reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data
## Known Issues
- Requires specific database schema (book and trades tables)
- Python-literal string parsing assumes well-formed input
- Large databases may require memory monitoring during streaming
## Configuration
- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096)
- SQLite PRAGMA settings optimized for read-only sequential access

# External Dependencies
## Overview
This document describes all external dependencies used in the orderflow backtest system, their purposes, versions, and justifications for inclusion.
## Production Dependencies
### Core Framework Dependencies
#### Dash (^2.18.2)
- **Purpose**: Web application framework for interactive visualizations
- **Usage**: Real-time chart rendering and user interface
- **Justification**: Mature Python-based framework with excellent Plotly integration
- **Key Features**: Reactive components, built-in server, callback system
#### Dash Bootstrap Components (^1.6.0)
- **Purpose**: Bootstrap CSS framework integration for Dash
- **Usage**: Responsive layout grid and modern UI styling
- **Justification**: Provides professional appearance with minimal custom CSS
#### Plotly (^5.24.1)
- **Purpose**: Interactive charting and visualization library
- **Usage**: OHLC candlesticks, volume bars, depth charts, OBI metrics
- **Justification**: Industry standard for financial data visualization
- **Key Features**: WebGL acceleration, zooming/panning, dark themes
### Data Processing Dependencies
#### Pandas (^2.2.3)
- **Purpose**: Data manipulation and analysis library
- **Usage**: Minimal usage for data structure conversions in visualization
- **Justification**: Standard tool for financial data handling
- **Note**: Usage kept minimal to maintain performance
#### Typer (^0.13.1)
- **Purpose**: Modern CLI framework
- **Usage**: Command-line argument parsing and help generation
- **Justification**: Type-safe, auto-generated help, better UX than argparse
- **Key Features**: Type hints integration, automatic validation
### Data Storage Dependencies
#### SQLite3 (Built-in)
- **Purpose**: Database connectivity for historical data
- **Usage**: Read-only access to orderbook and trade data
- **Justification**: Built into Python, no external dependencies, excellent performance
- **Configuration**: Optimized with immutable mode and mmap
## Development and Testing Dependencies
#### Pytest (^8.3.4)
- **Purpose**: Testing framework
- **Usage**: Unit tests, integration tests, test discovery
- **Justification**: Standard Python testing tool with excellent plugin ecosystem
#### Coverage (^7.6.9)
- **Purpose**: Code coverage measurement
- **Usage**: Test coverage reporting and quality metrics
- **Justification**: Essential for maintaining code quality
## Build and Package Management
#### UV (Package Manager)
- **Purpose**: Fast Python package manager and task runner
- **Usage**: Dependency management, virtual environments, script execution
- **Justification**: Significantly faster than pip/poetry, better lock file format
- **Commands**: `uv sync`, `uv run`, `uv add`
## Python Standard Library Usage
### Core Libraries
- **sqlite3**: Database connectivity
- **json**: JSON serialization for IPC
- **pathlib**: Modern file path handling
- **subprocess**: Process management for visualization
- **logging**: Structured logging throughout application
- **datetime**: Date/time parsing and manipulation
- **dataclasses**: Structured data types
- **typing**: Type annotations and hints
- **tempfile**: Atomic file operations
- **ast**: Safe evaluation of Python literals
### Performance Libraries
- **itertools**: Efficient iteration patterns
- **functools**: Function decoration and caching
- **collections**: Specialized data structures
## Dependency Justifications
### Why Dash Over Alternatives?
- **vs. Streamlit**: Better real-time updates, more control over layout
- **vs. Flask + Custom JS**: Integrated Plotly support, faster development
- **vs. Jupyter**: Better for production deployment, process isolation
### Why SQLite Over Alternatives?
- **vs. PostgreSQL**: No server setup required, excellent read performance
- **vs. Parquet**: Better for time-series queries, built-in indexing
- **vs. CSV**: Proper data types, much faster queries, atomic transactions
### Why UV Over Poetry/Pip?
- **vs. Poetry**: Significantly faster dependency resolution and installation
- **vs. Pip**: Better dependency locking, integrated task runner
- **vs. Pipenv**: More active development, better performance
## Version Pinning Strategy
### Patch Version Pinning
- Core dependencies (Dash, Plotly) pinned to patch versions
- Prevents breaking changes while allowing security updates
### Range Pinning
- Development tools use caret (^) ranges for flexibility
- Testing tools can update more freely
### Lock File Management
- `uv.lock` ensures reproducible builds across environments
- Regular updates scheduled monthly for security patches
## Security Considerations
### Dependency Scanning
- Regular audit of dependencies for known vulnerabilities
- Automated updates for security patches
- Minimal dependency tree to reduce attack surface
### Data Isolation
- Read-only database access prevents data modification
- No external network connections required for core functionality
- All file operations contained within project directory
## Performance Impact
### Bundle Size
- Core runtime: ~50MB with all dependencies
- Dash frontend: Additional ~10MB for JavaScript assets
- SQLite: Zero overhead (built-in)
### Startup Time
- Cold start: ~2-3 seconds for full application
- UV virtual environment activation: ~100ms
- Database connection: ~50ms per file
### Memory Usage
- Base application: ~100MB
- Per 1000 OHLC bars: ~5MB additional
- Plotly charts: ~20MB for complex visualizations
## Maintenance Schedule
### Monthly
- Security update review and application
- Dependency version bump evaluation
### Quarterly
- Major version update consideration
- Performance impact assessment
- Alternative technology evaluation
### Annually
- Complete dependency audit
- Technology stack review
- Migration planning for deprecated packages

# Module: level_parser
## Purpose
The `level_parser` module provides utilities for parsing and normalizing orderbook level data from various string formats. It handles JSON and Python literal representations, converting them into standardized numeric tuples for processing.
## Public Interface
### Functions
- `normalize_levels(levels: Any) -> List[List[float]]`: Parse levels into [[price, size], ...] format, filtering out zero/negative sizes
- `parse_levels_including_zeros(levels: Any) -> List[Tuple[float, float]]`: Parse levels preserving zero sizes for deletion operations
### Private Functions
- `_parse_string_to_list(levels: Any) -> List[Any]`: Core parsing logic trying JSON first, then literal_eval
- `_extract_price_size(item: Any) -> Tuple[Any, Any]`: Extract price/size from dict or list/tuple formats
## Usage Examples
```python
from level_parser import normalize_levels, parse_levels_including_zeros
# Parse standard levels (filters zeros)
levels = normalize_levels('[[50000.0, 1.5], [49999.0, 2.0]]')
# Returns: [[50000.0, 1.5], [49999.0, 2.0]]
# Parse with zero sizes preserved (for deletions)
updates = parse_levels_including_zeros('[[50000.0, 0.0], [49999.0, 1.5]]')
# Returns: [(50000.0, 0.0), (49999.0, 1.5)]
# Supports dict format
dict_levels = normalize_levels('[{"price": 50000.0, "size": 1.5}]')
# Returns: [[50000.0, 1.5]]
# Short key format
short_levels = normalize_levels('[{"p": 50000.0, "s": 1.5}]')
# Returns: [[50000.0, 1.5]]
```
## Dependencies
### External
- `json`: Primary parsing method for level data
- `ast.literal_eval`: Fallback parsing for Python literal formats
- `logging`: Debug logging for parsing issues
- `typing`: Type annotations
## Input Formats Supported
### JSON Array Format
```json
[[50000.0, 1.5], [49999.0, 2.0]]
```
### Dict Format (Full Keys)
```json
[{"price": 50000.0, "size": 1.5}, {"price": 49999.0, "size": 2.0}]
```
### Dict Format (Short Keys)
```json
[{"p": 50000.0, "s": 1.5}, {"p": 49999.0, "s": 2.0}]
```
### Python Literal Format
```python
"[(50000.0, 1.5), (49999.0, 2.0)]"
```
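The parsing path for these formats can be sketched as follows. This is a simplified re-implementation for illustration, not the module's verbatim code:

```python
import json
from ast import literal_eval
from typing import Any


def normalize_levels(levels: Any) -> list[list[float]]:
    # Fast path: JSON; fallback: Python literal syntax (tuples, etc.)
    if isinstance(levels, str):
        try:
            levels = json.loads(levels)
        except (json.JSONDecodeError, ValueError):
            levels = literal_eval(levels)
    out: list[list[float]] = []
    for item in levels or []:
        if isinstance(item, dict):
            # Accept both full ("price"/"size") and short ("p"/"s") keys
            price = item.get("price", item.get("p"))
            size = item.get("size", item.get("s"))
        else:
            price, size = item[0], item[1]
        if price is None or size is None:
            continue
        price, size = float(price), float(size)
        if size > 0:  # zero/negative sizes are filtered here
            out.append([price, size])
    return out
```

`parse_levels_including_zeros` follows the same shape but keeps `size == 0` entries so deletions survive parsing.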
## Error Handling
- **Graceful Degradation**: Returns empty list on parse failures
- **Data Validation**: Filters out invalid price/size pairs
- **Type Safety**: Converts all values to float before processing
- **Debug Logging**: Logs warnings for malformed input without crashing
## Performance Characteristics
- **Fast Path**: JSON parsing prioritized for performance
- **Fallback Support**: ast.literal_eval as backup for edge cases
- **Memory Efficient**: Processes items iteratively, not loading entire dataset
- **Validation**: Minimal overhead with early filtering of invalid data
## Testing
```bash
uv run pytest test_level_parser.py -v
```
Test coverage includes:
- JSON format parsing accuracy
- Dict format (both key styles) parsing
- Python literal fallback parsing
- Zero size preservation vs filtering
- Error handling for malformed input
- Type conversion edge cases
## Known Limitations
- Assumes well-formed numeric data (price/size as numbers)
- Does not validate economic constraints (e.g., positive prices)
- Limited to list/dict input formats
- No support for streaming/incremental parsing

# Module: main
## Purpose
The `main` module provides the command-line interface (CLI) orchestration for the orderflow backtest system. It handles database discovery and process management, and coordinates the streaming pipeline with the visualization frontend, using Typer for argument parsing.
## Public Interface
### Functions
- `main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None`: Primary CLI entrypoint
- `discover_databases(instrument: str, start_date: str, end_date: str) -> list[Path]`: Find matching database files
- `launch_visualizer() -> subprocess.Popen | None`: Start Dash application in separate process
### CLI Arguments
- `instrument`: Trading pair identifier (e.g., "BTC-USDT")
- `start_date`: Start date in YYYY-MM-DD format (UTC)
- `end_date`: End date in YYYY-MM-DD format (UTC)
- `--window-seconds`: OHLC aggregation window size (default: 60)
## Usage Examples
### Command Line Usage
```bash
# Basic usage with default 60-second windows
uv run python main.py BTC-USDT 2025-01-01 2025-01-31
# Custom window size
uv run python main.py ETH-USDT 2025-02-01 2025-02-28 --window-seconds 30
# Single day processing
uv run python main.py SOL-USDT 2025-03-15 2025-03-15
```
### Programmatic Usage
```python
from main import main, discover_databases
# Run processing pipeline
main("BTC-USDT", "2025-01-01", "2025-01-31", window_seconds=120)
# Discover available databases
db_files = discover_databases("ETH-USDT", "2025-02-01", "2025-02-28")
print(f"Found {len(db_files)} database files")
```
## Dependencies
### Internal
- `db_interpreter.DBInterpreter`: Database streaming
- `ohlc_processor.OHLCProcessor`: Trade aggregation and orderbook processing
- `viz_io`: Data clearing functions
### External
- `typer`: CLI framework and argument parsing
- `subprocess`: Process management for visualization
- `pathlib`: File and directory operations
- `datetime`: Date parsing and validation
- `logging`: Operational logging
- `sys`: Exit code management
## Database Discovery Logic
### File Pattern Matching
```python
# Expected directory structure
../data/OKX/{instrument}/{date}/
# Example paths
../data/OKX/BTC-USDT/2025-01-01/trades.db
../data/OKX/ETH-USDT/2025-02-15/trades.db
```
### Discovery Algorithm
1. Parse start and end dates to datetime objects
2. Iterate through date range (inclusive)
3. Construct expected path for each date
4. Verify file existence and readability
5. Return sorted list of valid database paths
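The steps above can be sketched as follows (the `data_root` parameter is added here for testability; the real function resolves `../data/OKX` itself):

```python
from datetime import date, timedelta
from pathlib import Path


def discover_databases(instrument: str, start_date: str, end_date: str,
                       data_root: Path = Path("../data/OKX")) -> list[Path]:
    # Steps 1-2: parse dates and walk the inclusive range
    day, end = date.fromisoformat(start_date), date.fromisoformat(end_date)
    found: list[Path] = []
    while day <= end:
        # Step 3: construct the expected per-date path
        candidate = data_root / instrument / day.isoformat() / "trades.db"
        # Step 4: keep only paths that exist and are regular files
        if candidate.is_file():
            found.append(candidate)
        day += timedelta(days=1)
    # Step 5: return a sorted list of valid database paths
    return sorted(found)
```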
## Process Orchestration
### Visualization Process Management
```python
# Launch Dash app in separate process
viz_process = subprocess.Popen([
"uv", "run", "python", "app.py"
], cwd=project_root)
# Process management
try:
# Main processing loop
process_databases(db_files)
finally:
# Cleanup visualization process
if viz_process:
viz_process.terminate()
viz_process.wait(timeout=5)
```
### Data Processing Pipeline
1. **Initialize**: Clear existing data files
2. **Launch**: Start visualization process
3. **Stream**: Process each database sequentially
4. **Aggregate**: Generate OHLC bars and depth snapshots
5. **Cleanup**: Terminate visualization and finalize
## Error Handling
### Database Access Errors
- **File not found**: Log warning and skip missing databases
- **Permission denied**: Log error and exit with status code 1
- **Corruption**: Log error for specific database and continue with next
### Process Management Errors
- **Visualization startup failure**: Log error but continue processing
- **Process termination**: Graceful shutdown with timeout
- **Resource cleanup**: Ensure child processes are terminated
### Date Validation
- **Invalid format**: Clear error message with expected format
- **Invalid range**: End date must be >= start date
- **Future dates**: Warning for dates beyond data availability
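The validation rules can be condensed into a small helper. This is a hypothetical sketch mirroring the checks above; the real CLI logs the message and exits with status code 1 instead of raising:

```python
from datetime import datetime


def validate_date_range(start_date: str, end_date: str) -> tuple[datetime, datetime]:
    # Parse both dates, rejecting anything that is not YYYY-MM-DD
    try:
        start = datetime.strptime(start_date, "%Y-%m-%d")
        end = datetime.strptime(end_date, "%Y-%m-%d")
    except ValueError:
        raise ValueError("invalid date format, expected YYYY-MM-DD (UTC)") from None
    # End date must be >= start date
    if end < start:
        raise ValueError("end date must be >= start date")
    return start, end
```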
## Performance Characteristics
- **Sequential processing**: Databases processed one at a time
- **Memory efficient**: Streaming approach prevents loading entire datasets
- **Process isolation**: Visualization runs independently
- **Resource cleanup**: Automatic process termination on exit
## Testing
Run module tests:
```bash
uv run pytest test_main.py -v
```
Test coverage includes:
- Database discovery logic
- Date parsing and validation
- Process management
- Error handling scenarios
- CLI argument validation
## Configuration
### Default Settings
- **Data directory**: `../data/OKX` (relative to project root)
- **Visualization command**: `uv run python app.py`
- **Window size**: 60 seconds
- **Process timeout**: 5 seconds for termination
### Environment Variables
- **DATA_PATH**: Override default data directory
- **VISUALIZATION_PORT**: Override Dash port (requires app.py modification)
## Known Issues
- Assumes specific directory structure under `../data/OKX`
- No validation of database schema compatibility
- Limited error recovery for process management
- No progress indication for large datasets
## Development Notes
- Uses Typer for modern CLI interface
- Subprocess management compatible with Unix/Windows
- Logging configured for both development and production use
- Exit codes follow Unix conventions (0=success, 1=error)

# Module: Metrics Calculation System
## Purpose
The metrics calculation system provides high-performance computation of Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) indicators for cryptocurrency trading analysis. It processes orderbook snapshots and trade data to generate financial metrics with per-snapshot granularity.
## Public Interface
### Classes
#### `Metric` (dataclass)
Represents calculated metrics for a single orderbook snapshot.
```python
@dataclass(slots=True)
class Metric:
snapshot_id: int # Reference to source snapshot
timestamp: int # Unix timestamp
obi: float # Order Book Imbalance [-1, 1]
cvd: float # Cumulative Volume Delta
best_bid: float | None # Best bid price
best_ask: float | None # Best ask price
```
#### `MetricCalculator`
A stateless class that exposes the financial-metric calculations as static methods.
```python
class MetricCalculator:
@staticmethod
def calculate_obi(snapshot: BookSnapshot) -> float
@staticmethod
def calculate_volume_delta(trades: List[Trade]) -> float
@staticmethod
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float
@staticmethod
def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]
```
### Functions
#### Order Book Imbalance (OBI) Calculation
```python
def calculate_obi(snapshot: BookSnapshot) -> float:
"""
Calculate Order Book Imbalance using the standard formula.
Formula: OBI = (Vb - Va) / (Vb + Va)
Where:
Vb = Total volume on bid side
Va = Total volume on ask side
Args:
snapshot: BookSnapshot containing bids and asks data
Returns:
float: OBI value between -1 and 1, or 0.0 if no volume
Example:
>>> snapshot = BookSnapshot(bids={50000.0: OrderbookLevel(...)}, ...)
>>> obi = MetricCalculator.calculate_obi(snapshot)
>>> print(f"OBI: {obi:.3f}")
OBI: 0.333
"""
```
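A minimal body matching that docstring might look like the following. This is a sketch of the removed implementation, not its verbatim code:

```python
def calculate_obi(snapshot) -> float:
    # OBI = (Vb - Va) / (Vb + Va); defined as 0.0 when the book carries no volume
    vb = sum(level.size for level in snapshot.bids.values())
    va = sum(level.size for level in snapshot.asks.values())
    total = vb + va
    return (vb - va) / total if total > 0 else 0.0
```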
#### Volume Delta Calculation
```python
def calculate_volume_delta(trades: List[Trade]) -> float:
"""
Calculate Volume Delta for a list of trades.
Volume Delta = Buy Volume - Sell Volume
- Buy trades (side = "buy"): positive contribution
- Sell trades (side = "sell"): negative contribution
Args:
trades: List of Trade objects for specific timestamp
Returns:
float: Net volume delta (positive = buy pressure, negative = sell pressure)
Example:
>>> trades = [
... Trade(side="buy", size=10.0, ...),
... Trade(side="sell", size=3.0, ...)
... ]
>>> vd = MetricCalculator.calculate_volume_delta(trades)
>>> print(f"Volume Delta: {vd}")
Volume Delta: 7.0
"""
```
#### Cumulative Volume Delta (CVD) Calculation
```python
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
"""
Calculate Cumulative Volume Delta with incremental support.
Formula: CVD_t = CVD_{t-1} + Volume_Delta_t
Args:
previous_cvd: Previous CVD value (use 0.0 for reset)
volume_delta: Current volume delta to add
Returns:
float: New cumulative volume delta value
Example:
>>> cvd = 0.0 # Starting value
>>> cvd = MetricCalculator.calculate_cvd(cvd, 10.0) # First trade
>>> cvd = MetricCalculator.calculate_cvd(cvd, -5.0) # Second trade
>>> print(f"CVD: {cvd}")
CVD: 5.0
"""
```
## Usage Examples
### Basic OBI Calculation
```python
from models import MetricCalculator, BookSnapshot, OrderbookLevel
# Create sample orderbook snapshot
snapshot = BookSnapshot(
id=1,
timestamp=1640995200,
bids={
50000.0: OrderbookLevel(price=50000.0, size=10.0, liquidation_count=0, order_count=1),
49999.0: OrderbookLevel(price=49999.0, size=5.0, liquidation_count=0, order_count=1),
},
asks={
50001.0: OrderbookLevel(price=50001.0, size=3.0, liquidation_count=0, order_count=1),
50002.0: OrderbookLevel(price=50002.0, size=2.0, liquidation_count=0, order_count=1),
}
)
# Calculate OBI
obi = MetricCalculator.calculate_obi(snapshot)
print(f"OBI: {obi:.3f}") # Output: OBI: 0.500
# Explanation: (15 - 5) / (15 + 5) = 10/20 = 0.5
```
### CVD Calculation with Reset
```python
from models import MetricCalculator, Trade
# Simulate trading session
cvd = 0.0 # Reset CVD at session start
# Process trades for first timestamp
trades_t1 = [
Trade(id=1, trade_id=1.0, price=50000.0, size=8.0, side="buy", timestamp=1000),
Trade(id=2, trade_id=2.0, price=50001.0, size=3.0, side="sell", timestamp=1000),
]
vd_t1 = MetricCalculator.calculate_volume_delta(trades_t1) # 8.0 - 3.0 = 5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t1) # 0.0 + 5.0 = 5.0
# Process trades for second timestamp
trades_t2 = [
Trade(id=3, trade_id=3.0, price=49999.0, size=2.0, side="buy", timestamp=1001),
Trade(id=4, trade_id=4.0, price=50000.0, size=7.0, side="sell", timestamp=1001),
]
vd_t2 = MetricCalculator.calculate_volume_delta(trades_t2) # 2.0 - 7.0 = -5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t2) # 5.0 + (-5.0) = 0.0
print(f"Final CVD: {cvd}") # Output: Final CVD: 0.0
```
### Complete Metrics Processing
```python
from models import MetricCalculator, Metric
def process_snapshot_metrics(snapshot, trades, previous_cvd=0.0):
"""Process complete metrics for a single snapshot."""
# Calculate OBI
obi = MetricCalculator.calculate_obi(snapshot)
# Calculate volume delta and CVD
volume_delta = MetricCalculator.calculate_volume_delta(trades)
cvd = MetricCalculator.calculate_cvd(previous_cvd, volume_delta)
# Extract best bid/ask
best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)
# Create metric record
metric = Metric(
snapshot_id=snapshot.id,
timestamp=snapshot.timestamp,
obi=obi,
cvd=cvd,
best_bid=best_bid,
best_ask=best_ask
)
return metric, cvd
# Usage in processing loop
current_cvd = 0.0
for snapshot, trades in snapshot_trade_pairs:
metric, current_cvd = process_snapshot_metrics(snapshot, trades, current_cvd)
# Store metric to database...
```
## Dependencies
### Internal
- `models.BookSnapshot`: Orderbook state data
- `models.Trade`: Individual trade execution data
- `models.OrderbookLevel`: Price level information
### External
- **Python Standard Library**: `typing` for type hints
- **No external packages required**
## Performance Characteristics
### Computational Complexity
- **OBI Calculation**: O(n) where n = number of price levels
- **Volume Delta**: O(m) where m = number of trades
- **CVD Calculation**: O(1) - simple addition
- **Best Bid/Ask**: O(n) for min/max operations
### Memory Usage
- **Static Methods**: No instance state, minimal memory overhead
- **Calculations**: Process data in-place without copying
- **Results**: Lightweight `Metric` objects with slots optimization
### Typical Performance
Approximate benchmark figures:
```text
Snapshot with 50 price levels:  ~0.1 ms per OBI calculation
Timestamp with 20 trades:       ~0.05 ms per volume delta
CVD update:                     ~0.001 ms per calculation
Complete metric processing:     ~0.2 ms per snapshot
```
## Error Handling
### Edge Cases Handled
```python
# Empty orderbook
empty_snapshot = BookSnapshot(id=1, timestamp=0, bids={}, asks={}, trades=[])
obi = MetricCalculator.calculate_obi(empty_snapshot)  # Returns 0.0
# No trades
vd = MetricCalculator.calculate_volume_delta([])  # Returns 0.0
# Zero-volume scenario
zero_vol_snapshot = BookSnapshot(
    id=2, timestamp=0, trades=[],
    bids={50000.0: OrderbookLevel(price=50000.0, size=0.0, liquidation_count=0, order_count=0)},
    asks={50001.0: OrderbookLevel(price=50001.0, size=0.0, liquidation_count=0, order_count=0)},
)
obi = MetricCalculator.calculate_obi(zero_vol_snapshot)  # Returns 0.0
```
### Validation
- **OBI Range**: Results automatically bounded to [-1, 1]
- **Division by Zero**: Handled gracefully with 0.0 return
- **Invalid Data**: Empty collections handled without errors
## Testing
### Test Coverage
- **Unit Tests**: `tests/test_metric_calculator.py`
- **Integration Tests**: Included in storage and strategy tests
- **Edge Cases**: Empty data, zero volume, boundary conditions
### Running Tests
```bash
# Run metric calculator tests specifically
uv run pytest tests/test_metric_calculator.py -v
# Run all tests with metrics
uv run pytest -k "metric" -v
# Performance tests
uv run pytest tests/test_metric_calculator.py::test_calculate_obi_performance
```
## Known Issues
### Current Limitations
- **Precision**: Floating-point arithmetic limitations for very small numbers
- **Scale**: No optimization for extremely large orderbooks (>10k levels)
- **Currency**: No multi-currency support (assumes single denomination)
### Planned Enhancements
- **Decimal Precision**: Consider `decimal.Decimal` for high-precision calculations
- **Vectorization**: NumPy integration for batch calculations
- **Additional Metrics**: Volume Profile, Liquidity metrics, Delta Flow
---
The metrics calculation system provides a robust foundation for financial analysis with clean interfaces, comprehensive error handling, and optimal performance for high-frequency trading data.

# Module: metrics_calculator
## Purpose
The `metrics_calculator` module handles calculation and management of trading metrics including Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD). It provides windowed aggregation with throttled updates for real-time visualization.
## Public Interface
### Classes
- `MetricsCalculator(window_seconds: int = 60, emit_every_n_updates: int = 25)`: Main metrics calculation engine
### Methods
- `update_cvd_from_trade(side: str, size: float) -> None`: Update CVD from individual trade data
- `update_obi_metrics(timestamp: str, total_bids: float, total_asks: float) -> None`: Update OBI metrics from orderbook volumes
- `finalize_metrics() -> None`: Emit final metrics bar at processing end
### Properties
- `cvd_cumulative: float`: Current cumulative volume delta value
### Private Methods
- `_emit_metrics_bar() -> None`: Emit current metrics to visualization layer
## Usage Examples
```python
from metrics_calculator import MetricsCalculator
# Initialize calculator
calc = MetricsCalculator(window_seconds=60, emit_every_n_updates=25)
# Update CVD from trades
calc.update_cvd_from_trade("buy", 1.5) # +1.5 CVD
calc.update_cvd_from_trade("sell", 1.0) # -1.0 CVD, net +0.5
# Update OBI from orderbook
total_bids, total_asks = 150.0, 120.0
calc.update_obi_metrics("1640995200000", total_bids, total_asks)
# Access current CVD
current_cvd = calc.cvd_cumulative # 0.5
# Finalize at end of processing
calc.finalize_metrics()
```
## Metrics Definitions
### Cumulative Volume Delta (CVD)
- **Formula**: CVD = Σ(buy_volume - sell_volume)
- **Interpretation**: Positive = more buying pressure, Negative = more selling pressure
- **Accumulation**: Running total across all processed trades
- **Update Frequency**: Every trade
### Order Book Imbalance (OBI)
- **Formula**: OBI = total_bid_volume - total_ask_volume
- **Interpretation**: Positive = more bid liquidity, Negative = more ask liquidity
- **Aggregation**: OHLC-style bars per time window (open, high, low, close)
- **Update Frequency**: Throttled per orderbook update
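The two definitions can be condensed into a few lines. This is illustrative, not the module's code; the real class also throttles emission and tracks OBI per window:

```python
def signed_volume(side: str, size: float) -> float:
    # Buys add to CVD, sells subtract; unknown sides contribute nothing
    if side == "buy":
        return size
    if side == "sell":
        return -size
    return 0.0


# CVD: running total of signed trade volume
cvd = 0.0
for side, size in [("buy", 1.5), ("sell", 1.0), ("buy", 0.25)]:
    cvd += signed_volume(side, size)
# cvd is now 0.75 (net buying pressure)

# OBI here is the unnormalized difference of resting volumes
obi = 150.0 - 120.0  # total_bids - total_asks = 30.0
```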
## Dependencies
### Internal
- `viz_io.upsert_metric_bar`: Output interface for visualization
### External
- `logging`: Warning messages for unknown trade sides
- `typing`: Type annotations
## Windowed Aggregation
### OBI Windows
- **Window Size**: Configurable via `window_seconds` (default: 60)
- **Window Alignment**: Aligned to epoch time boundaries
- **OHLC Tracking**: Maintains open, high, low, close values per window
- **Rollover**: Automatic window transitions with final bar emission
### Throttling Mechanism
- **Purpose**: Reduce I/O overhead during high-frequency updates
- **Trigger**: Every N updates (configurable via `emit_every_n_updates`)
- **Behavior**: Emits intermediate updates for real-time visualization
- **Final Emission**: Guaranteed on window rollover and finalization
## State Management
### CVD State
- `cvd_cumulative: float`: Running total across all trades
- **Persistence**: Maintained throughout processor lifetime
- **Updates**: Incremental addition/subtraction per trade
### OBI State
- `metrics_window_start: int`: Current window start timestamp
- `metrics_bar: dict`: Current OBI OHLC values
- `_metrics_since_last_emit: int`: Throttling counter
## Output Format
### Metrics Bar Structure
```python
{
'obi_open': float, # First OBI value in window
'obi_high': float, # Maximum OBI in window
'obi_low': float, # Minimum OBI in window
'obi_close': float, # Latest OBI value
}
```
### Visualization Integration
- Emitted via `viz_io.upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value)`
- Compatible with existing OHLC visualization infrastructure
- Real-time updates during active processing
## Performance Characteristics
- **Low Memory**: Maintains only current window state
- **Throttled I/O**: Configurable update frequency prevents excessive writes
- **Efficient Updates**: O(1) operations for trade and OBI updates
- **Window Management**: Automatic transitions without manual intervention
## Configuration
### Constructor Parameters
- `window_seconds: int`: Time window for OBI aggregation (default: 60)
- `emit_every_n_updates: int`: Throttling factor for intermediate updates (default: 25)
### Tuning Guidelines
- **Higher throttling**: Reduces I/O load, delays real-time updates
- **Lower throttling**: More responsive visualization, higher I/O overhead
- **Window size**: Affects granularity of OBI trends (shorter = more detail)
## Testing
```bash
uv run pytest test_metrics_calculator.py -v
```
Test coverage includes:
- CVD accumulation accuracy across multiple trades
- OBI window rollover and OHLC tracking
- Throttling behavior verification
- Edge cases (unknown trade sides, empty windows)
- Integration with visualization output
## Known Limitations
- CVD calculation assumes binary buy/sell classification
- No support for partial fills or complex order types
- OBI calculation treats all liquidity equally (no price weighting)
- Window boundaries aligned to absolute timestamps (no sliding windows)

# Module: ohlc_processor
## Purpose
The `ohlc_processor` module serves as the main coordinator for trade data processing, orchestrating OHLC aggregation, orderbook management, and metrics calculation. It has been refactored into a modular architecture that composes specialized helper modules.
## Public Interface
### Classes
- `OHLCProcessor(window_seconds: int = 60, depth_levels_per_side: int = 50)`: Main orchestrator class that coordinates trade processing using composition
### Methods
- `process_trades(trades: list[tuple]) -> None`: Aggregate trades into OHLC bars and update CVD metrics
- `update_orderbook(ob_update: OrderbookUpdate) -> None`: Apply orderbook updates and calculate OBI metrics
- `finalize() -> None`: Emit final OHLC bar and metrics data
- `cvd_cumulative` (property): Access to cumulative volume delta value
### Composed Modules
- `OrderbookManager`: Handles in-memory orderbook state and depth snapshots
- `MetricsCalculator`: Manages OBI and CVD metric calculations
- `level_parser` functions: Parse and normalize orderbook level data
## Usage Examples
```python
from ohlc_processor import OHLCProcessor
from db_interpreter import DBInterpreter
# Initialize processor with 1-minute windows and 50 depth levels
processor = OHLCProcessor(window_seconds=60, depth_levels_per_side=50)
# Process streaming data
for ob_update, trades in DBInterpreter(db_path).stream():
# Aggregate trades into OHLC bars
processor.process_trades(trades)
# Update orderbook and emit depth snapshots
processor.update_orderbook(ob_update)
# Finalize processing
processor.finalize()
```
### Advanced Configuration
```python
# Custom window size and depth levels
processor = OHLCProcessor(
window_seconds=30, # 30-second bars
depth_levels_per_side=25 # Top 25 levels per side
)
```
## Dependencies
### Internal Modules
- `orderbook_manager.OrderbookManager`: In-memory orderbook state management
- `metrics_calculator.MetricsCalculator`: OBI and CVD metrics calculation
- `level_parser`: Orderbook level parsing utilities
- `viz_io`: JSON output for visualization
- `db_interpreter.OrderbookUpdate`: Input data structures
### External
- `typing`: Type annotations
- `logging`: Debug and operational logging
## Modular Architecture
The processor now follows a clean composition pattern:
1. **Main Coordinator** (`OHLCProcessor`):
- Orchestrates trade and orderbook processing
- Maintains OHLC bar state and window management
- Delegates specialized tasks to composed modules
2. **Orderbook Management** (`OrderbookManager`):
- Maintains in-memory price→size mappings
- Applies partial updates and handles deletions
- Provides sorted top-N level extraction
3. **Metrics Calculation** (`MetricsCalculator`):
- Tracks CVD from trade flow (buy/sell volume delta)
- Calculates OBI from orderbook volume imbalance
- Manages windowed metrics aggregation with throttling
4. **Level Parsing** (`level_parser` module):
- Normalizes JSON and Python literal level representations
- Handles zero-size levels for orderbook deletions
- Provides robust error handling for malformed data
## Performance Characteristics
- **Throttled Updates**: Prevents excessive I/O during high-frequency periods
- **Memory Efficient**: Maintains only current window and top-N depth levels
- **Incremental Processing**: Applies only changed orderbook levels
- **Atomic Operations**: Thread-safe updates to shared data structures
## Testing
Run module tests:
```bash
uv run pytest test_ohlc_processor.py -v
```
Test coverage includes:
- OHLC calculation accuracy across window boundaries
- Volume accumulation correctness
- High/low price tracking
- Orderbook update application
- Depth snapshot generation
- OBI metric calculation
## Known Issues
- Orderbook level parsing assumes well-formed JSON or Python literals
- Memory usage scales with number of active price levels
- Clock skew between trades and orderbook updates not handled
## Configuration Options
- `window_seconds`: Time window size for OHLC aggregation (default: 60)
- `depth_levels_per_side`: Number of top price levels to maintain (default: 50)
- `UPSERT_THROTTLE_MS`: Minimum interval between upsert operations (internal)
- `DEPTH_EMIT_THROTTLE_MS`: Minimum interval between depth emissions (internal)

# Module: orderbook_manager
## Purpose
The `orderbook_manager` module provides in-memory orderbook state management with partial update capabilities. It maintains separate bid and ask sides and supports efficient top-level extraction for visualization.
## Public Interface
### Classes
- `OrderbookManager(depth_levels_per_side: int = 50)`: Main orderbook state manager
### Methods
- `apply_updates(bids_updates: List[Tuple[float, float]], asks_updates: List[Tuple[float, float]]) -> None`: Apply partial updates to both sides
- `get_total_volume() -> Tuple[float, float]`: Get total bid and ask volumes
- `get_top_levels() -> Tuple[List[List[float]], List[List[float]]]`: Get sorted top levels for both sides
### Private Methods
- `_apply_partial_updates(side_map: Dict[float, float], updates: List[Tuple[float, float]]) -> None`: Apply updates to one side
- `_build_top_levels(side_map: Dict[float, float], limit: int, reverse: bool) -> List[List[float]]`: Extract sorted top levels
## Usage Examples
```python
from orderbook_manager import OrderbookManager
# Initialize manager
manager = OrderbookManager(depth_levels_per_side=25)
# Apply orderbook updates
bids = [(50000.0, 1.5), (49999.0, 2.0)]
asks = [(50001.0, 1.2), (50002.0, 0.8)]
manager.apply_updates(bids, asks)
# Get volume totals for OBI calculation
total_bids, total_asks = manager.get_total_volume()
obi = total_bids - total_asks
# Get top levels for depth visualization
bids_sorted, asks_sorted = manager.get_top_levels()
# Handle deletions (size = 0)
deletions = [(50000.0, 0.0)] # Remove price level
manager.apply_updates(deletions, [])
```
## Dependencies
### External
- `typing`: Type annotations for Dict, List, Tuple
## State Management
### Internal State
- `_book_bids: Dict[float, float]`: Price → size mapping for bid side
- `_book_asks: Dict[float, float]`: Price → size mapping for ask side
- `depth_levels_per_side: int`: Configuration for top-N extraction
### Update Semantics
- **Size = 0**: Remove price level (deletion)
- **Size > 0**: Upsert price level with new size
- **Size < 0**: Ignored (invalid update)
### Sorting Behavior
- **Bids**: Descending by price (highest price first)
- **Asks**: Ascending by price (lowest price first)
- **Top-N**: Limited by `depth_levels_per_side` parameter
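These semantics can be sketched as follows (an illustrative re-implementation of the private helpers, not the module's verbatim code):

```python
def apply_partial_updates(side: dict[float, float],
                          updates: list[tuple[float, float]]) -> None:
    for price, size in updates:
        if size == 0:
            side.pop(price, None)   # size = 0: delete the price level
        elif size > 0:
            side[price] = size      # size > 0: upsert with the new size
        # size < 0: ignored as an invalid update


def top_levels(side: dict[float, float], limit: int,
               reverse: bool) -> list[list[float]]:
    # Bids use reverse=True (highest price first); asks use reverse=False
    return [[p, s] for p, s in sorted(side.items(), reverse=reverse)[:limit]]
```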
## Performance Characteristics
- **Memory Efficient**: Only stores non-zero price levels
- **Fast Updates**: O(1) upsert/delete operations using dict
- **Efficient Sorting**: Only sorts when extracting top levels
- **Bounded Output**: Limits result size for visualization performance
## Use Cases
### OBI Calculation
```python
total_bids, total_asks = manager.get_total_volume()
order_book_imbalance = total_bids - total_asks
```
### Depth Visualization
```python
bids, asks = manager.get_top_levels()
depth_payload = {"bids": bids, "asks": asks}
```
### Incremental Updates
```python
# Typical orderbook update cycle
updates = parse_orderbook_changes(raw_data)
manager.apply_updates(updates['bids'], updates['asks'])
```
## Testing
```bash
uv run pytest test_orderbook_manager.py -v
```
Test coverage includes:
- Partial update application correctness
- Deletion handling (size = 0)
- Volume calculation accuracy
- Top-level sorting and limiting
- Edge cases (empty books, single levels)
- Performance with large orderbooks
## Configuration
- `depth_levels_per_side`: Controls output size for visualization (default: 50)
- Affects memory usage and sorting performance
- Higher values provide more market depth detail
- Lower values improve processing speed
## Known Limitations
- No built-in validation of price/size values
- Memory usage scales with number of unique price levels
- No historical state tracking (current snapshot only)
- No support for spread calculation or market data statistics

# Module: viz_io
## Purpose
The `viz_io` module provides atomic inter-process communication (IPC) between the data processing pipeline and the visualization frontend. It manages JSON file-based data exchange with atomic writes to prevent race conditions and data corruption.
## Public Interface
### Functions
- `add_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Append new OHLC bar to rolling dataset
- `upsert_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Update existing bar or append new one
- `clear_data()`: Reset OHLC dataset to empty state
- `add_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Append OBI metric bar
- `upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Update existing OBI bar or append new one
- `clear_metrics()`: Reset metrics dataset to empty state
- `set_depth_data(bids, asks)`: Update current orderbook depth snapshot
### Constants
- `DATA_FILE`: Path to OHLC data JSON file
- `DEPTH_FILE`: Path to depth data JSON file
- `METRICS_FILE`: Path to metrics data JSON file
- `MAX_BARS`: Maximum number of bars to retain (1000)
## Usage Examples
### Basic OHLC Operations
```python
import viz_io
# Add a new OHLC bar
viz_io.add_ohlc_bar(
timestamp=1640995200000, # Unix timestamp in milliseconds
open_price=50000.0,
high_price=50100.0,
low_price=49900.0,
close_price=50050.0,
volume=125.5
)
# Update the current bar (if timestamp matches) or add new one
viz_io.upsert_ohlc_bar(
timestamp=1640995200000,
open_price=50000.0,
high_price=50150.0, # Updated high
low_price=49850.0, # Updated low
close_price=50075.0, # Updated close
volume=130.2 # Updated volume
)
```
### Orderbook Depth Management
```python
# Set current depth snapshot
bids = [[49990.0, 1.5], [49985.0, 2.1], [49980.0, 0.8]]
asks = [[50010.0, 1.2], [50015.0, 1.8], [50020.0, 2.5]]
viz_io.set_depth_data(bids, asks)
```
### Metrics Operations
```python
# Add Order Book Imbalance metrics
viz_io.add_metric_bar(
timestamp=1640995200000,
obi_open=0.15,
obi_high=0.22,
obi_low=0.08,
obi_close=0.18
)
```
## Dependencies
### Internal
- None (standalone utility module)
### External (Python standard library)
- `json`: JSON serialization/deserialization
- `pathlib`: File path handling
- `typing`: Type annotations
- `tempfile`: Atomic write operations
## Data Formats
### OHLC Data (`ohlc_data.json`)
```json
[
[1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
[1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]
```
Format: `[timestamp, open, high, low, close, volume]`
### Depth Data (`depth_data.json`)
```json
{
"bids": [[49990.0, 1.5], [49985.0, 2.1]],
"asks": [[50010.0, 1.2], [50015.0, 1.8]]
}
```
Format: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`
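A typical consumer-side transform of this format turns the raw levels into cumulative depth for plotting. The sketch below assumes the `[[price, size], ...]` layout above; `cumulative_depth` is an illustrative helper, not part of `viz_io`:

```python
def cumulative_depth(levels, reverse=False):
    """Turn [[price, size], ...] into [[price, cumulative_size], ...].

    Bids accumulate from the best (highest) price downward, asks from
    the best (lowest) price upward.
    """
    total = 0.0
    out = []
    for price, size in sorted(levels, key=lambda lv: lv[0], reverse=reverse):
        total += size
        out.append([price, total])
    return out

bids = cumulative_depth([[49990.0, 1.5], [49985.0, 2.1]], reverse=True)
asks = cumulative_depth([[50010.0, 1.2], [50015.0, 1.8]])
```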
### Metrics Data (`metrics_data.json`)
```json
[
[1640995200000, 0.15, 0.22, 0.08, 0.18],
[1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
Format: `[timestamp, obi_open, obi_high, obi_low, obi_close]`
## Atomic Write Operations
All write operations use atomic file replacement to prevent partial reads:
1. Write the data to a temporary file
2. Flush and fsync the file to disk
3. Atomically rename the temporary file over the target
This ensures the visualization frontend always reads complete, valid JSON data.
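The steps above can be sketched as follows. `atomic_write_json` is an illustrative name, not the module's actual function, and creating the temp file next to the target is an assumption needed for `os.replace` to stay atomic:

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(target: Path, payload) -> None:
    """Write `payload` as JSON via a temp file, then atomically rename."""
    # The temp file lives in the target's directory so that os.replace
    # never crosses a filesystem boundary (rename is only atomic then).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp_name, target)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file on failure
        raise
```

A reader that opens the target mid-write sees either the complete old file or the complete new file, never a mix.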
## Performance Characteristics
- **Bounded Memory**: OHLC and metrics datasets limited to 1000 bars max
- **Atomic Operations**: No partial reads possible during writes
- **Rolling Window**: Automatic trimming of old data maintains constant memory usage
- **Linear Upserts**: Timestamp-based upserts use a linear scan of the bar list (O(n), acceptable at the 1000-bar cap)
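The upsert-plus-trim behavior described above can be sketched like this; `upsert_bar` is a hypothetical helper, while the real module operates on its internal dataset:

```python
MAX_BARS = 1000  # mirrors the module constant

def upsert_bar(bars, new_bar, max_bars=MAX_BARS):
    """Update the bar whose timestamp (index 0) matches, else append;
    then trim to the newest `max_bars` entries."""
    for i, bar in enumerate(bars):
        if bar[0] == new_bar[0]:
            bars[i] = new_bar      # timestamp match: update in place
            break
    else:
        bars.append(new_bar)       # no match: new bar
    return bars[-max_bars:]        # rolling window keeps memory bounded
```

Because new bars almost always arrive in time order, the scan could start from the end of the list, but a forward scan is simple and cheap at this size.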
## Testing
Run module tests:
```bash
uv run pytest test_viz_io.py -v
```
Test coverage includes:
- Atomic write operations
- Data format validation
- Rolling window behavior
- Upsert logic correctness
- File corruption prevention
- Concurrent read/write scenarios
## Known Issues
- File I/O may block briefly during atomic writes
- JSON parsing errors not propagated to callers
- Limited to 1000 bars maximum (configurable via MAX_BARS)
- No compression for large datasets
## Thread Safety
All operations are safe for single-writer, multiple-reader use, including across processes:
- Writer: Data processing pipeline (single thread)
- Readers: Visualization frontend (polling)
- Atomic file operations prevent corruption during concurrent access
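On the reader side, a defensive polling step might look like the sketch below. The file names match the constants above, but `read_snapshot` is illustrative rather than the frontend's actual code; it falls back to a default when a file is missing or briefly unreadable:

```python
import json
from pathlib import Path

def read_snapshot(path: Path, default):
    """Read one of the JSON data files, falling back to `default` when the
    file does not exist yet or a transient decode error occurs."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Keep the last known state instead of crashing the frontend.
        return default

# Poll each file with a sensible empty default.
ohlc = read_snapshot(Path("ohlc_data.json"), default=[])
depth = read_snapshot(Path("depth_data.json"), default={"bids": [], "asks": []})
```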