WIP UI rework with qt6

2025-09-10 15:39:16 +08:00
parent 36385af6f3
commit ebf232317c
63 changed files with 4005 additions and 5221 deletions

docs/modules/app.md

# Module: app
## Purpose
The `app` module provides a real-time Dash web application for visualizing OHLC candlestick charts, volume data, Order Book Imbalance (OBI) metrics, and orderbook depth. It implements a polling-based architecture that reads JSON data files and renders interactive charts with a dark theme.
## Public Interface
### Functions
- `build_empty_ohlc_fig() -> go.Figure`: Create empty OHLC chart with proper styling
- `build_empty_depth_fig() -> go.Figure`: Create empty depth chart with proper styling
- `build_ohlc_fig(data: List[list], metrics: List[list]) -> go.Figure`: Build complete OHLC+Volume+OBI chart
- `build_depth_fig(depth_data: dict) -> go.Figure`: Build orderbook depth visualization
### Global Variables
- `_LAST_DATA`: Cached OHLC data for error recovery
- `_LAST_DEPTH`: Cached depth data for error recovery
- `_LAST_METRICS`: Cached metrics data for error recovery
### Dash Application
- `app`: Main Dash application instance with Bootstrap theme
- Layout with responsive grid (9:3 ratio for OHLC:Depth charts)
- 500ms polling interval for real-time updates
## Usage Examples
### Running the Application
```bash
# Start the Dash server
uv run python app.py
# Access the web interface
# Open http://localhost:8050 in your browser
```
### Programmatic Usage
```python
from app import build_ohlc_fig, build_depth_fig
# Build charts with sample data
ohlc_data = [[1640995200000, 50000, 50100, 49900, 50050, 125.5]]
metrics_data = [[1640995200000, 0.15, 0.22, 0.08, 0.18]]
depth_data = {
"bids": [[49990, 1.5], [49985, 2.1]],
"asks": [[50010, 1.2], [50015, 1.8]]
}
ohlc_fig = build_ohlc_fig(ohlc_data, metrics_data)
depth_fig = build_depth_fig(depth_data)
```
## Dependencies
### Internal
- `viz_io`: Data file paths and JSON reading
- `viz_io.DATA_FILE`: OHLC data source
- `viz_io.DEPTH_FILE`: Depth data source
- `viz_io.METRICS_FILE`: Metrics data source
### External
- `dash`: Web application framework
- `dash.html`, `dash.dcc`: HTML and core components
- `dash_bootstrap_components`: Bootstrap styling
- `plotly.graph_objs`: Chart objects
- `plotly.subplots`: Multiple subplot support
- `pandas`: Data manipulation (minimal usage)
- `json`: JSON file parsing
- `logging`: Error and debug logging
- `pathlib`: File path handling
## Chart Architecture
### OHLC Chart (Left Panel, 9/12 width)
- **Main subplot**: Candlestick chart with OHLC data
- **Volume subplot**: Bar chart sharing x-axis with main chart
- **OBI subplot**: Order Book Imbalance candlestick chart in blue tones
- **Shared x-axis**: Synchronized zooming and panning across subplots
### Depth Chart (Right Panel, 3/12 width)
- **Cumulative depth**: Stepped line chart showing bid/ask liquidity
- **Color coding**: Green for bids, red for asks
- **Real-time updates**: Reflects current orderbook state
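The stepped depth line is derived from cumulative sums of the raw levels. A minimal sketch of that derivation (a hypothetical helper, not the module's actual function) using only the standard library:

```python
from itertools import accumulate

def cumulative_depth(levels, side):
    """Return (price, cumulative_size) pairs for a stepped depth line.

    Bids accumulate from the best (highest) price downward,
    asks from the best (lowest) price upward.
    """
    ordered = sorted(levels, key=lambda lv: lv[0], reverse=(side == "bids"))
    prices = [price for price, _ in ordered]
    cum_sizes = list(accumulate(size for _, size in ordered))
    return list(zip(prices, cum_sizes))

bids = [[49990, 1.5], [49985, 2.1]]
asks = [[50010, 1.2], [50015, 1.8]]
print(cumulative_depth(bids, "bids"))  # cumulative bid liquidity away from the touch
print(cumulative_depth(asks, "asks"))  # cumulative ask liquidity away from the touch
```

The resulting pairs map directly onto a Plotly stepped line trace per side.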
## Styling and Theme
### Dark Theme Configuration
- Background: Black (`#000000`)
- Text: White (`#ffffff`)
- Grid: Dark gray with transparency
- Candlesticks: Green (up) / Red (down)
- Volume: Gray bars
- OBI: Blue tones for candlesticks
- Depth: Green (bids) / Red (asks)
### Responsive Design
- Bootstrap grid system for layout
- Fluid container for full-width usage
- 100vh height for full viewport coverage
- Configurable chart display modes
## Data Polling and Error Handling
### Polling Strategy
- **Interval**: 500ms for near real-time updates
- **Graceful degradation**: Uses cached data on JSON read errors
- **Atomic reads**: Tolerates partial writes during file updates
- **Logging**: Warnings for data inconsistencies
### Error Recovery
```python
# Simplified error-handling pattern used by the polling callbacks
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # cache the successful read
except (FileNotFoundError, json.JSONDecodeError):
    logging.warning("Using cached data due to read error")
    new_data = _LAST_DATA  # fall back to the cached copy
```
## Performance Characteristics
- **Client-side rendering**: Plotly.js handles chart rendering
- **Efficient updates**: Only redraws when data changes
- **Memory bounded**: Limited by max bars in data files (1000)
- **Network efficient**: Local file polling (no external API calls)
## Testing
Run application tests:
```bash
uv run pytest test_app.py -v
```
Test coverage includes:
- Chart building functions
- Data loading and caching
- Error handling scenarios
- Layout rendering
- Callback functionality
## Configuration Options
### Server Configuration
- **Host**: `0.0.0.0` (accessible from network)
- **Port**: `8050` (default Dash port)
- **Debug mode**: Disabled in production
### Chart Configuration
- **Update interval**: 500ms (configurable via dcc.Interval)
- **Display mode bar**: Enabled for user interaction
- **Logo display**: Disabled for clean interface
## Known Issues
- High CPU usage during rapid data updates
- Memory usage grows with chart history
- No authentication or access control
- Limited mobile responsiveness for complex charts
## Development Notes
- Uses Flask development server (not suitable for production)
- Callback exceptions suppressed for partial data scenarios
- Bootstrap CSS loaded from CDN
- Chart configurations optimized for financial data visualization

# Module: db_interpreter
## Purpose
The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing.
## Public Interface
### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp
### Functions
- `DBInterpreter(db_path: Path)`: Constructor that initializes read-only SQLite connection with optimized PRAGMA settings
### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface that yields orderbook updates with associated trades in temporal windows
## Usage Examples
```python
from pathlib import Path
from db_interpreter import DBInterpreter
# Initialize interpreter
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)
# Stream orderbook and trade data
for ob_update, trades in interpreter.stream():
    # Process the orderbook update
    print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
    print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")
    # Process trades in this window
    for trade in trades:
        trade_id, price, size, side, timestamp_ms = trade[1:6]
        print(f"Trade: {side} {size} @ {price}")
```
## Dependencies
### Internal
- None (standalone module)
### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging
## Performance Characteristics
- **Batch sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal memory usage
- **SQLite optimizations**: Read-only, immutable mode, large mmap and cache sizes
- **Memory efficient**: Streaming iterator pattern prevents loading entire dataset
- **Temporal windowing**: One-row lookahead for precise time boundary calculation
## Testing
Run module tests:
```bash
uv run pytest test_db_interpreter.py -v
```
Test coverage includes:
- Batch reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data
## Known Issues
- Requires specific database schema (book and trades tables)
- Python-literal string parsing assumes well-formed input
- Large databases may require memory monitoring during streaming
## Configuration
- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096)
- SQLite PRAGMA settings optimized for read-only sequential access
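A read-only, immutable SQLite connection with scan-friendly PRAGMAs might be opened as below; the exact PRAGMA values here are illustrative assumptions, not the module's configuration.

```python
import os
import sqlite3
import tempfile

def open_readonly(db_path):
    """Open a SQLite database read-only, tuned for sequential scans."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro&immutable=1", uri=True)
    conn.execute("PRAGMA mmap_size = 268435456")  # 256 MiB memory map (example value)
    conn.execute("PRAGMA cache_size = -65536")    # 64 MiB page cache (example value)
    return conn

# Create a tiny throwaway database, then reopen it read-only
path = os.path.join(tempfile.mkdtemp(), "demo.db")
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, ts INTEGER)")
rw.execute("INSERT INTO book (ts) VALUES (1000)")
rw.commit()
rw.close()

ro = open_readonly(path)
print(ro.execute("SELECT ts FROM book").fetchmany(2048))  # batched fetch, like BOOK_BATCH
```

`fetchmany(BOOK_BATCH)` in a loop gives the batched, memory-bounded reads described above.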

# External Dependencies
## Overview
This document describes all external dependencies used in the orderflow backtest system, their purposes, versions, and justifications for inclusion.
## Production Dependencies
### Core Framework Dependencies
#### Dash (^2.18.2)
- **Purpose**: Web application framework for interactive visualizations
- **Usage**: Real-time chart rendering and user interface
- **Justification**: Mature Python-based framework with excellent Plotly integration
- **Key Features**: Reactive components, built-in server, callback system
#### Dash Bootstrap Components (^1.6.0)
- **Purpose**: Bootstrap CSS framework integration for Dash
- **Usage**: Responsive layout grid and modern UI styling
- **Justification**: Provides professional appearance with minimal custom CSS
#### Plotly (^5.24.1)
- **Purpose**: Interactive charting and visualization library
- **Usage**: OHLC candlesticks, volume bars, depth charts, OBI metrics
- **Justification**: Industry standard for financial data visualization
- **Key Features**: WebGL acceleration, zooming/panning, dark themes
### Data Processing Dependencies
#### Pandas (^2.2.3)
- **Purpose**: Data manipulation and analysis library
- **Usage**: Minimal usage for data structure conversions in visualization
- **Justification**: Standard tool for financial data handling
- **Note**: Usage kept minimal to maintain performance
#### Typer (^0.13.1)
- **Purpose**: Modern CLI framework
- **Usage**: Command-line argument parsing and help generation
- **Justification**: Type-safe, auto-generated help, better UX than argparse
- **Key Features**: Type hints integration, automatic validation
### Data Storage Dependencies
#### SQLite3 (Built-in)
- **Purpose**: Database connectivity for historical data
- **Usage**: Read-only access to orderbook and trade data
- **Justification**: Built into Python, no external dependencies, excellent performance
- **Configuration**: Optimized with immutable mode and mmap
## Development and Testing Dependencies
#### Pytest (^8.3.4)
- **Purpose**: Testing framework
- **Usage**: Unit tests, integration tests, test discovery
- **Justification**: Standard Python testing tool with excellent plugin ecosystem
#### Coverage (^7.6.9)
- **Purpose**: Code coverage measurement
- **Usage**: Test coverage reporting and quality metrics
- **Justification**: Essential for maintaining code quality
## Build and Package Management
#### UV (Package Manager)
- **Purpose**: Fast Python package manager and task runner
- **Usage**: Dependency management, virtual environments, script execution
- **Justification**: Significantly faster than pip/poetry, better lock file format
- **Commands**: `uv sync`, `uv run`, `uv add`
## Python Standard Library Usage
### Core Libraries
- **sqlite3**: Database connectivity
- **json**: JSON serialization for IPC
- **pathlib**: Modern file path handling
- **subprocess**: Process management for visualization
- **logging**: Structured logging throughout application
- **datetime**: Date/time parsing and manipulation
- **dataclasses**: Structured data types
- **typing**: Type annotations and hints
- **tempfile**: Atomic file operations
- **ast**: Safe evaluation of Python literals
### Performance Libraries
- **itertools**: Efficient iteration patterns
- **functools**: Function decoration and caching
- **collections**: Specialized data structures
## Dependency Justifications
### Why Dash Over Alternatives?
- **vs. Streamlit**: Better real-time updates, more control over layout
- **vs. Flask + Custom JS**: Integrated Plotly support, faster development
- **vs. Jupyter**: Better for production deployment, process isolation
### Why SQLite Over Alternatives?
- **vs. PostgreSQL**: No server setup required, excellent read performance
- **vs. Parquet**: Better for time-series queries, built-in indexing
- **vs. CSV**: Proper data types, much faster queries, atomic transactions
### Why UV Over Poetry/Pip?
- **vs. Poetry**: Significantly faster dependency resolution and installation
- **vs. Pip**: Better dependency locking, integrated task runner
- **vs. Pipenv**: More active development, better performance
## Version Pinning Strategy
### Patch Version Pinning
- Core dependencies (Dash, Plotly) pinned to patch versions
- Prevents breaking changes while allowing security updates
### Range Pinning
- Development tools use caret (^) ranges for flexibility
- Testing tools can update more freely
### Lock File Management
- `uv.lock` ensures reproducible builds across environments
- Regular updates scheduled monthly for security patches
## Security Considerations
### Dependency Scanning
- Regular audit of dependencies for known vulnerabilities
- Automated updates for security patches
- Minimal dependency tree to reduce attack surface
### Data Isolation
- Read-only database access prevents data modification
- No external network connections required for core functionality
- All file operations contained within project directory
## Performance Impact
### Bundle Size
- Core runtime: ~50MB with all dependencies
- Dash frontend: Additional ~10MB for JavaScript assets
- SQLite: Zero overhead (built-in)
### Startup Time
- Cold start: ~2-3 seconds for full application
- UV virtual environment activation: ~100ms
- Database connection: ~50ms per file
### Memory Usage
- Base application: ~100MB
- Per 1000 OHLC bars: ~5MB additional
- Plotly charts: ~20MB for complex visualizations
## Maintenance Schedule
### Monthly
- Security update review and application
- Dependency version bump evaluation
### Quarterly
- Major version update consideration
- Performance impact assessment
- Alternative technology evaluation
### Annually
- Complete dependency audit
- Technology stack review
- Migration planning for deprecated packages

# Module: level_parser
## Purpose
The `level_parser` module provides utilities for parsing and normalizing orderbook level data from various string formats. It handles JSON and Python literal representations, converting them into standardized numeric tuples for processing.
## Public Interface
### Functions
- `normalize_levels(levels: Any) -> List[List[float]]`: Parse levels into [[price, size], ...] format, filtering out zero/negative sizes
- `parse_levels_including_zeros(levels: Any) -> List[Tuple[float, float]]`: Parse levels preserving zero sizes for deletion operations
### Private Functions
- `_parse_string_to_list(levels: Any) -> List[Any]`: Core parsing logic trying JSON first, then literal_eval
- `_extract_price_size(item: Any) -> Tuple[Any, Any]`: Extract price/size from dict or list/tuple formats
## Usage Examples
```python
from level_parser import normalize_levels, parse_levels_including_zeros
# Parse standard levels (filters zeros)
levels = normalize_levels('[[50000.0, 1.5], [49999.0, 2.0]]')
# Returns: [[50000.0, 1.5], [49999.0, 2.0]]
# Parse with zero sizes preserved (for deletions)
updates = parse_levels_including_zeros('[[50000.0, 0.0], [49999.0, 1.5]]')
# Returns: [(50000.0, 0.0), (49999.0, 1.5)]
# Supports dict format
dict_levels = normalize_levels('[{"price": 50000.0, "size": 1.5}]')
# Returns: [[50000.0, 1.5]]
# Short key format
short_levels = normalize_levels('[{"p": 50000.0, "s": 1.5}]')
# Returns: [[50000.0, 1.5]]
```
## Dependencies
### External
- `json`: Primary parsing method for level data
- `ast.literal_eval`: Fallback parsing for Python literal formats
- `logging`: Debug logging for parsing issues
- `typing`: Type annotations
## Input Formats Supported
### JSON Array Format
```json
[[50000.0, 1.5], [49999.0, 2.0]]
```
### Dict Format (Full Keys)
```json
[{"price": 50000.0, "size": 1.5}, {"price": 49999.0, "size": 2.0}]
```
### Dict Format (Short Keys)
```json
[{"p": 50000.0, "s": 1.5}, {"p": 49999.0, "s": 2.0}]
```
### Python Literal Format
```python
"[(50000.0, 1.5), (49999.0, 2.0)]"
```
## Error Handling
- **Graceful Degradation**: Returns empty list on parse failures
- **Data Validation**: Filters out invalid price/size pairs
- **Type Safety**: Converts all values to float before processing
- **Debug Logging**: Logs warnings for malformed input without crashing
## Performance Characteristics
- **Fast Path**: JSON parsing prioritized for performance
- **Fallback Support**: ast.literal_eval as backup for edge cases
- **Memory Efficient**: Processes items iteratively, not loading entire dataset
- **Validation**: Minimal overhead with early filtering of invalid data
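The JSON-first, literal-eval-fallback flow described above could look like this condensed sketch; it merges both public functions into one helper for illustration and may differ from the module's actual code.

```python
import ast
import json
import logging

def parse_levels(levels, keep_zeros=False):
    """Parse levels from JSON or Python-literal strings into
    [price, size] float pairs, optionally keeping zero sizes."""
    if isinstance(levels, str):
        try:
            levels = json.loads(levels)            # fast path: JSON
        except json.JSONDecodeError:
            try:
                levels = ast.literal_eval(levels)  # fallback: Python literals
            except (ValueError, SyntaxError):
                logging.warning("Unparseable levels: %r", levels)
                return []
    out = []
    for item in levels or []:
        if isinstance(item, dict):
            price = item.get("price", item.get("p"))
            size = item.get("size", item.get("s"))
        else:
            price, size = item[0], item[1]
        price, size = float(price), float(size)
        if keep_zeros or size > 0:
            out.append([price, size])
    return out

print(parse_levels("[(50000.0, 1.5), (49999.0, 0.0)]"))   # literal fallback, zero filtered
print(parse_levels('[{"p": 50000.0, "s": 1.5}]'))         # short-key dict format
```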
## Testing
```bash
uv run pytest test_level_parser.py -v
```
Test coverage includes:
- JSON format parsing accuracy
- Dict format (both key styles) parsing
- Python literal fallback parsing
- Zero size preservation vs filtering
- Error handling for malformed input
- Type conversion edge cases
## Known Limitations
- Assumes well-formed numeric data (price/size as numbers)
- Does not validate economic constraints (e.g., positive prices)
- Limited to list/dict input formats
- No support for streaming/incremental parsing

docs/modules/main.md
# Module: main
## Purpose
The `main` module provides the command-line interface (CLI) orchestration for the orderflow backtest system. It handles database discovery, process management, and coordinates the streaming pipeline with the visualization frontend using Typer for argument parsing.
## Public Interface
### Functions
- `main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None`: Primary CLI entrypoint
- `discover_databases(instrument: str, start_date: str, end_date: str) -> list[Path]`: Find matching database files
- `launch_visualizer() -> subprocess.Popen | None`: Start Dash application in separate process
### CLI Arguments
- `instrument`: Trading pair identifier (e.g., "BTC-USDT")
- `start_date`: Start date in YYYY-MM-DD format (UTC)
- `end_date`: End date in YYYY-MM-DD format (UTC)
- `--window-seconds`: OHLC aggregation window size (default: 60)
## Usage Examples
### Command Line Usage
```bash
# Basic usage with default 60-second windows
uv run python main.py BTC-USDT 2025-01-01 2025-01-31
# Custom window size
uv run python main.py ETH-USDT 2025-02-01 2025-02-28 --window-seconds 30
# Single day processing
uv run python main.py SOL-USDT 2025-03-15 2025-03-15
```
### Programmatic Usage
```python
from main import main, discover_databases
# Run processing pipeline
main("BTC-USDT", "2025-01-01", "2025-01-31", window_seconds=120)
# Discover available databases
db_files = discover_databases("ETH-USDT", "2025-02-01", "2025-02-28")
print(f"Found {len(db_files)} database files")
```
## Dependencies
### Internal
- `db_interpreter.DBInterpreter`: Database streaming
- `ohlc_processor.OHLCProcessor`: Trade aggregation and orderbook processing
- `viz_io`: Data clearing functions
### External
- `typer`: CLI framework and argument parsing
- `subprocess`: Process management for visualization
- `pathlib`: File and directory operations
- `datetime`: Date parsing and validation
- `logging`: Operational logging
- `sys`: Exit code management
## Database Discovery Logic
### File Pattern Matching
```text
# Expected directory structure
../data/OKX/{instrument}/{date}/

# Example paths
../data/OKX/BTC-USDT/2025-01-01/trades.db
../data/OKX/ETH-USDT/2025-02-15/trades.db
```
### Discovery Algorithm
1. Parse start and end dates to datetime objects
2. Iterate through date range (inclusive)
3. Construct expected path for each date
4. Verify file existence and readability
5. Return sorted list of valid database paths
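The steps above can be sketched as follows. This is an illustrative simplification (the root path and `trades.db` filename follow the layout shown earlier, and existence checks are left to the caller), not `main.py`'s actual implementation.

```python
from datetime import date, timedelta
from pathlib import Path

def discover(instrument, start, end, root=Path("../data/OKX"),
             filename="trades.db"):
    """Yield candidate database paths for each date in [start, end].

    Callers should filter the result with Path.exists() against
    the real directory tree and sort before processing.
    """
    day = date.fromisoformat(start)    # step 1: parse the dates
    last = date.fromisoformat(end)
    while day <= last:                 # step 2: iterate the range, inclusive
        # step 3: construct the expected path for this date
        yield root / instrument / day.isoformat() / filename
        day += timedelta(days=1)

paths = list(discover("BTC-USDT", "2025-01-01", "2025-01-03"))
print([str(p) for p in paths])
```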
## Process Orchestration
### Visualization Process Management
```python
# Launch the Dash app in a separate process
viz_process = subprocess.Popen(
    ["uv", "run", "python", "app.py"],
    cwd=project_root,
)

try:
    # Main processing loop
    process_databases(db_files)
finally:
    # Clean up the visualization process
    if viz_process:
        viz_process.terminate()
        try:
            viz_process.wait(timeout=5)
        except subprocess.TimeoutExpired:
            viz_process.kill()
```
### Data Processing Pipeline
1. **Initialize**: Clear existing data files
2. **Launch**: Start visualization process
3. **Stream**: Process each database sequentially
4. **Aggregate**: Generate OHLC bars and depth snapshots
5. **Cleanup**: Terminate visualization and finalize
## Error Handling
### Database Access Errors
- **File not found**: Log warning and skip missing databases
- **Permission denied**: Log error and exit with status code 1
- **Corruption**: Log error for specific database and continue with next
### Process Management Errors
- **Visualization startup failure**: Log error but continue processing
- **Process termination**: Graceful shutdown with timeout
- **Resource cleanup**: Ensure child processes are terminated
### Date Validation
- **Invalid format**: Clear error message with expected format
- **Invalid range**: End date must be >= start date
- **Future dates**: Warning for dates beyond data availability
## Performance Characteristics
- **Sequential processing**: Databases processed one at a time
- **Memory efficient**: Streaming approach prevents loading entire datasets
- **Process isolation**: Visualization runs independently
- **Resource cleanup**: Automatic process termination on exit
## Testing
Run module tests:
```bash
uv run pytest test_main.py -v
```
Test coverage includes:
- Database discovery logic
- Date parsing and validation
- Process management
- Error handling scenarios
- CLI argument validation
## Configuration
### Default Settings
- **Data directory**: `../data/OKX` (relative to project root)
- **Visualization command**: `uv run python app.py`
- **Window size**: 60 seconds
- **Process timeout**: 5 seconds for termination
### Environment Variables
- **DATA_PATH**: Override default data directory
- **VISUALIZATION_PORT**: Override Dash port (requires app.py modification)
## Known Issues
- Assumes specific directory structure under `../data/OKX`
- No validation of database schema compatibility
- Limited error recovery for process management
- No progress indication for large datasets
## Development Notes
- Uses Typer for modern CLI interface
- Subprocess management compatible with Unix/Windows
- Logging configured for both development and production use
- Exit codes follow Unix conventions (0=success, 1=error)

# Module: Metrics Calculation System
## Purpose
The metrics calculation system provides high-performance computation of Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) indicators for cryptocurrency trading analysis. It processes orderbook snapshots and trade data to generate financial metrics with per-snapshot granularity.
## Public Interface
### Classes
#### `Metric` (dataclass)
Represents calculated metrics for a single orderbook snapshot.
```python
@dataclass(slots=True)
class Metric:
    snapshot_id: int        # Reference to source snapshot
    timestamp: int          # Unix timestamp
    obi: float              # Order Book Imbalance [-1, 1]
    cvd: float              # Cumulative Volume Delta
    best_bid: float | None  # Best bid price
    best_ask: float | None  # Best ask price
```
#### `MetricCalculator`
Namespace class exposing only static calculation methods for financial metrics.
```python
class MetricCalculator:
    @staticmethod
    def calculate_obi(snapshot: BookSnapshot) -> float: ...

    @staticmethod
    def calculate_volume_delta(trades: List[Trade]) -> float: ...

    @staticmethod
    def calculate_cvd(previous_cvd: float, volume_delta: float) -> float: ...

    @staticmethod
    def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]: ...
```
### Functions
#### Order Book Imbalance (OBI) Calculation
```python
def calculate_obi(snapshot: BookSnapshot) -> float:
    """Calculate Order Book Imbalance using the standard formula.

    Formula: OBI = (Vb - Va) / (Vb + Va)
    where Vb is total bid-side volume and Va is total ask-side volume.

    Args:
        snapshot: BookSnapshot containing bids and asks data.

    Returns:
        float: OBI value between -1 and 1, or 0.0 if there is no volume.

    Example:
        >>> snapshot = BookSnapshot(bids={50000.0: OrderbookLevel(...)}, ...)
        >>> obi = MetricCalculator.calculate_obi(snapshot)
        >>> print(f"OBI: {obi:.3f}")
        OBI: 0.333
    """
```
#### Volume Delta Calculation
```python
def calculate_volume_delta(trades: List[Trade]) -> float:
    """Calculate Volume Delta for a list of trades.

    Volume Delta = Buy Volume - Sell Volume
    - Buy trades (side = "buy") contribute positively.
    - Sell trades (side = "sell") contribute negatively.

    Args:
        trades: List of Trade objects for a specific timestamp.

    Returns:
        float: Net volume delta (positive = buy pressure,
        negative = sell pressure).

    Example:
        >>> trades = [
        ...     Trade(side="buy", size=10.0, ...),
        ...     Trade(side="sell", size=3.0, ...),
        ... ]
        >>> vd = MetricCalculator.calculate_volume_delta(trades)
        >>> print(f"Volume Delta: {vd}")
        Volume Delta: 7.0
    """
```
#### Cumulative Volume Delta (CVD) Calculation
```python
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
    """Calculate Cumulative Volume Delta with incremental support.

    Formula: CVD_t = CVD_{t-1} + Volume_Delta_t

    Args:
        previous_cvd: Previous CVD value (use 0.0 to reset).
        volume_delta: Current volume delta to add.

    Returns:
        float: New cumulative volume delta value.

    Example:
        >>> cvd = 0.0                                        # starting value
        >>> cvd = MetricCalculator.calculate_cvd(cvd, 10.0)  # first trade
        >>> cvd = MetricCalculator.calculate_cvd(cvd, -5.0)  # second trade
        >>> print(f"CVD: {cvd}")
        CVD: 5.0
    """
```
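The docstrings above describe pure functions. A minimal standalone sketch, using plain dicts and trade dicts in place of the internal `BookSnapshot`/`Trade` types (an assumption made here for self-containment), could be:

```python
def calculate_obi(bids, asks):
    """OBI = (Vb - Va) / (Vb + Va); bids/asks map price -> size."""
    vb, va = sum(bids.values()), sum(asks.values())
    total = vb + va
    return (vb - va) / total if total else 0.0  # 0.0 guards the empty book

def calculate_volume_delta(trades):
    """Buy sizes count positive, sell sizes negative."""
    return sum(t["size"] if t["side"] == "buy" else -t["size"] for t in trades)

def calculate_cvd(previous_cvd, volume_delta):
    """CVD_t = CVD_{t-1} + Volume_Delta_t."""
    return previous_cvd + volume_delta

bids = {50000.0: 10.0, 49999.0: 5.0}
asks = {50001.0: 3.0, 50002.0: 2.0}
print(calculate_obi(bids, asks))  # (15 - 5) / (15 + 5) = 0.5
```

The real methods operate on `OrderbookLevel` values rather than bare sizes, but the arithmetic is the same.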
## Usage Examples
### Basic OBI Calculation
```python
from models import MetricCalculator, BookSnapshot, OrderbookLevel
# Create sample orderbook snapshot
snapshot = BookSnapshot(
id=1,
timestamp=1640995200,
bids={
50000.0: OrderbookLevel(price=50000.0, size=10.0, liquidation_count=0, order_count=1),
49999.0: OrderbookLevel(price=49999.0, size=5.0, liquidation_count=0, order_count=1),
},
asks={
50001.0: OrderbookLevel(price=50001.0, size=3.0, liquidation_count=0, order_count=1),
50002.0: OrderbookLevel(price=50002.0, size=2.0, liquidation_count=0, order_count=1),
}
)
# Calculate OBI
obi = MetricCalculator.calculate_obi(snapshot)
print(f"OBI: {obi:.3f}") # Output: OBI: 0.500
# Explanation: (15 - 5) / (15 + 5) = 10/20 = 0.5
```
### CVD Calculation with Reset
```python
from models import MetricCalculator, Trade
# Simulate trading session
cvd = 0.0 # Reset CVD at session start
# Process trades for first timestamp
trades_t1 = [
Trade(id=1, trade_id=1.0, price=50000.0, size=8.0, side="buy", timestamp=1000),
Trade(id=2, trade_id=2.0, price=50001.0, size=3.0, side="sell", timestamp=1000),
]
vd_t1 = MetricCalculator.calculate_volume_delta(trades_t1) # 8.0 - 3.0 = 5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t1) # 0.0 + 5.0 = 5.0
# Process trades for second timestamp
trades_t2 = [
Trade(id=3, trade_id=3.0, price=49999.0, size=2.0, side="buy", timestamp=1001),
Trade(id=4, trade_id=4.0, price=50000.0, size=7.0, side="sell", timestamp=1001),
]
vd_t2 = MetricCalculator.calculate_volume_delta(trades_t2) # 2.0 - 7.0 = -5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t2) # 5.0 + (-5.0) = 0.0
print(f"Final CVD: {cvd}") # Output: Final CVD: 0.0
```
### Complete Metrics Processing
```python
from models import MetricCalculator, Metric
def process_snapshot_metrics(snapshot, trades, previous_cvd=0.0):
    """Process complete metrics for a single snapshot."""
    # Calculate OBI
    obi = MetricCalculator.calculate_obi(snapshot)
    # Calculate volume delta and CVD
    volume_delta = MetricCalculator.calculate_volume_delta(trades)
    cvd = MetricCalculator.calculate_cvd(previous_cvd, volume_delta)
    # Extract best bid/ask
    best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)
    # Create the metric record
    metric = Metric(
        snapshot_id=snapshot.id,
        timestamp=snapshot.timestamp,
        obi=obi,
        cvd=cvd,
        best_bid=best_bid,
        best_ask=best_ask,
    )
    return metric, cvd

# Usage in a processing loop
current_cvd = 0.0
for snapshot, trades in snapshot_trade_pairs:
    metric, current_cvd = process_snapshot_metrics(snapshot, trades, current_cvd)
    # Store metric to database...
```
## Dependencies
### Internal
- `models.BookSnapshot`: Orderbook state data
- `models.Trade`: Individual trade execution data
- `models.OrderbookLevel`: Price level information
### External
- **Python Standard Library**: `typing` for type hints
- **No external packages required**
## Performance Characteristics
### Computational Complexity
- **OBI Calculation**: O(n) where n = number of price levels
- **Volume Delta**: O(m) where m = number of trades
- **CVD Calculation**: O(1) - simple addition
- **Best Bid/Ask**: O(n) for min/max operations
### Memory Usage
- **Static Methods**: No instance state, minimal memory overhead
- **Calculations**: Process data in-place without copying
- **Results**: Lightweight `Metric` objects with slots optimization
### Typical Performance
```python
# Benchmark results (approximate)
# Snapshot with 50 price levels:  ~0.1 ms per OBI calculation
# Timestamp with 20 trades:       ~0.05 ms per volume delta
# CVD update:                     ~0.001 ms per calculation
# Complete metric processing:     ~0.2 ms per snapshot
```
## Error Handling
### Edge Cases Handled
```python
# Empty orderbook
empty_snapshot = BookSnapshot(bids={}, asks={})
obi = MetricCalculator.calculate_obi(empty_snapshot)  # Returns 0.0

# No trades
empty_trades = []
vd = MetricCalculator.calculate_volume_delta(empty_trades)  # Returns 0.0

# Zero-volume scenario
zero_vol_snapshot = BookSnapshot(
    bids={50000.0: OrderbookLevel(price=50000.0, size=0.0, ...)},
    asks={50001.0: OrderbookLevel(price=50001.0, size=0.0, ...)},
)
obi = MetricCalculator.calculate_obi(zero_vol_snapshot)  # Returns 0.0
```
### Validation
- **OBI Range**: Results automatically bounded to [-1, 1]
- **Division by Zero**: Handled gracefully with 0.0 return
- **Invalid Data**: Empty collections handled without errors
## Testing
### Test Coverage
- **Unit Tests**: `tests/test_metric_calculator.py`
- **Integration Tests**: Included in storage and strategy tests
- **Edge Cases**: Empty data, zero volume, boundary conditions
### Running Tests
```bash
# Run metric calculator tests specifically
uv run pytest tests/test_metric_calculator.py -v
# Run all tests with metrics
uv run pytest -k "metric" -v
# Performance tests
uv run pytest tests/test_metric_calculator.py::test_calculate_obi_performance
```
## Known Issues
### Current Limitations
- **Precision**: Floating-point arithmetic limitations for very small numbers
- **Scale**: No optimization for extremely large orderbooks (>10k levels)
- **Currency**: No multi-currency support (assumes single denomination)
### Planned Enhancements
- **Decimal Precision**: Consider `decimal.Decimal` for high-precision calculations
- **Vectorization**: NumPy integration for batch calculations
- **Additional Metrics**: Volume Profile, Liquidity metrics, Delta Flow
---
The metrics calculation system provides a robust foundation for financial analysis with clean interfaces, comprehensive error handling, and optimal performance for high-frequency trading data.

# Module: metrics_calculator
## Purpose
The `metrics_calculator` module handles calculation and management of trading metrics including Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD). It provides windowed aggregation with throttled updates for real-time visualization.
## Public Interface
### Classes
- `MetricsCalculator(window_seconds: int = 60, emit_every_n_updates: int = 25)`: Main metrics calculation engine
### Methods
- `update_cvd_from_trade(side: str, size: float) -> None`: Update CVD from individual trade data
- `update_obi_metrics(timestamp: str, total_bids: float, total_asks: float) -> None`: Update OBI metrics from orderbook volumes
- `finalize_metrics() -> None`: Emit final metrics bar at processing end
### Properties
- `cvd_cumulative: float`: Current cumulative volume delta value
### Private Methods
- `_emit_metrics_bar() -> None`: Emit current metrics to visualization layer
## Usage Examples
```python
from metrics_calculator import MetricsCalculator
# Initialize calculator
calc = MetricsCalculator(window_seconds=60, emit_every_n_updates=25)
# Update CVD from trades
calc.update_cvd_from_trade("buy", 1.5) # +1.5 CVD
calc.update_cvd_from_trade("sell", 1.0) # -1.0 CVD, net +0.5
# Update OBI from orderbook
total_bids, total_asks = 150.0, 120.0
calc.update_obi_metrics("1640995200000", total_bids, total_asks)
# Access current CVD
current_cvd = calc.cvd_cumulative # 0.5
# Finalize at end of processing
calc.finalize_metrics()
```
## Metrics Definitions
### Cumulative Volume Delta (CVD)
- **Formula**: CVD = Σ(buy_volume - sell_volume)
- **Interpretation**: Positive = more buying pressure, Negative = more selling pressure
- **Accumulation**: Running total across all processed trades
- **Update Frequency**: Every trade
### Order Book Imbalance (OBI)
- **Formula**: OBI = total_bid_volume - total_ask_volume
- **Interpretation**: Positive = more bid liquidity, Negative = more ask liquidity
- **Aggregation**: OHLC-style bars per time window (open, high, low, close)
- **Update Frequency**: Throttled per orderbook update
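The two definitions above can be sketched directly; this is an illustrative standalone snippet, not the module's internal code:

```python
# Minimal sketch of the CVD and OBI formulas described above.

def cvd_delta(side: str, size: float) -> float:
    """Signed contribution of a single trade to CVD."""
    if side == "buy":
        return size
    if side == "sell":
        return -size
    return 0.0  # unknown sides contribute nothing (the module logs a warning)

def obi(total_bids: float, total_asks: float) -> float:
    """Order Book Imbalance: total bid liquidity minus total ask liquidity."""
    return total_bids - total_asks

# Mirrors the usage example above: buy 1.5, sell 1.0 -> net CVD of +0.5
cvd = sum(cvd_delta(s, v) for s, v in [("buy", 1.5), ("sell", 1.0)])
print(cvd)                 # 0.5
print(obi(150.0, 120.0))   # 30.0 (more bid liquidity)
```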
## Dependencies
### Internal
- `viz_io.upsert_metric_bar`: Output interface for visualization
### External
- `logging`: Warning messages for unknown trade sides
- `typing`: Type annotations
## Windowed Aggregation
### OBI Windows
- **Window Size**: Configurable via `window_seconds` (default: 60)
- **Window Alignment**: Aligned to epoch time boundaries
- **OHLC Tracking**: Maintains open, high, low, close values per window
- **Rollover**: Automatic window transitions with final bar emission
### Throttling Mechanism
- **Purpose**: Reduce I/O overhead during high-frequency updates
- **Trigger**: Every N updates (configurable via `emit_every_n_updates`)
- **Behavior**: Emits intermediate updates for real-time visualization
- **Final Emission**: Guaranteed on window rollover and finalization
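Epoch-aligned windows and counter-based throttling can be sketched as follows; the names here are illustrative stand-ins, not the module's actual internals:

```python
# Hedged sketch of the windowing and throttling behavior described above.

def window_start(ts_ms: int, window_seconds: int = 60) -> int:
    """Align a millisecond timestamp to the start of its epoch-aligned window."""
    window_ms = window_seconds * 1000
    return (ts_ms // window_ms) * window_ms

class ThrottledEmitter:
    """Signals an emit every N calls, like `emit_every_n_updates`."""
    def __init__(self, emit_every_n_updates: int = 25):
        self.n = emit_every_n_updates
        self._since_last = 0

    def should_emit(self) -> bool:
        self._since_last += 1
        if self._since_last >= self.n:
            self._since_last = 0
            return True
        return False

# 37.123s into the minute aligns back to the minute boundary
print(window_start(1640995237123))  # 1640995200000
```

A window rollover is detected when `window_start(ts)` differs from the stored start; the final bar for the old window is then emitted regardless of the throttle counter.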
## State Management
### CVD State
- `cvd_cumulative: float`: Running total across all trades
- **Persistence**: Maintained throughout processor lifetime
- **Updates**: Incremental addition/subtraction per trade
### OBI State
- `metrics_window_start: int`: Current window start timestamp
- `metrics_bar: dict`: Current OBI OHLC values
- `_metrics_since_last_emit: int`: Throttling counter
## Output Format
### Metrics Bar Structure
```python
{
'obi_open': float, # First OBI value in window
'obi_high': float, # Maximum OBI in window
'obi_low': float, # Minimum OBI in window
'obi_close': float, # Latest OBI value
}
```
### Visualization Integration
- Emitted via `viz_io.upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value)`
- Compatible with existing OHLC visualization infrastructure
- Real-time updates during active processing
## Performance Characteristics
- **Low Memory**: Maintains only current window state
- **Throttled I/O**: Configurable update frequency prevents excessive writes
- **Efficient Updates**: O(1) operations for trade and OBI updates
- **Window Management**: Automatic transitions without manual intervention
## Configuration
### Constructor Parameters
- `window_seconds: int`: Time window for OBI aggregation (default: 60)
- `emit_every_n_updates: int`: Throttling factor for intermediate updates (default: 25)
### Tuning Guidelines
- **Higher throttling**: Reduces I/O load, delays real-time updates
- **Lower throttling**: More responsive visualization, higher I/O overhead
- **Window size**: Affects granularity of OBI trends (shorter = more detail)
## Testing
```bash
uv run pytest test_metrics_calculator.py -v
```
Test coverage includes:
- CVD accumulation accuracy across multiple trades
- OBI window rollover and OHLC tracking
- Throttling behavior verification
- Edge cases (unknown trade sides, empty windows)
- Integration with visualization output
## Known Limitations
- CVD calculation assumes binary buy/sell classification
- No support for partial fills or complex order types
- OBI calculation treats all liquidity equally (no price weighting)
- Window boundaries aligned to absolute timestamps (no sliding windows)

# Module: ohlc_processor
## Purpose
The `ohlc_processor` module serves as the main coordinator for trade data processing, orchestrating OHLC aggregation, orderbook management, and metrics calculation. It has been refactored into a modular architecture using composition with specialized helper modules.
## Public Interface
### Classes
- `OHLCProcessor(window_seconds: int = 60, depth_levels_per_side: int = 50)`: Main orchestrator class that coordinates trade processing using composition
### Methods
- `process_trades(trades: list[tuple]) -> None`: Aggregate trades into OHLC bars and update CVD metrics
- `update_orderbook(ob_update: OrderbookUpdate) -> None`: Apply orderbook updates and calculate OBI metrics
- `finalize() -> None`: Emit final OHLC bar and metrics data
- `cvd_cumulative` (property): Access to cumulative volume delta value
### Composed Modules
- `OrderbookManager`: Handles in-memory orderbook state and depth snapshots
- `MetricsCalculator`: Manages OBI and CVD metric calculations
- `level_parser` functions: Parse and normalize orderbook level data
## Usage Examples
```python
from ohlc_processor import OHLCProcessor
from db_interpreter import DBInterpreter
# Initialize processor with 1-minute windows and 50 depth levels
processor = OHLCProcessor(window_seconds=60, depth_levels_per_side=50)
# Process streaming data
for ob_update, trades in DBInterpreter(db_path).stream():
# Aggregate trades into OHLC bars
processor.process_trades(trades)
# Update orderbook and emit depth snapshots
processor.update_orderbook(ob_update)
# Finalize processing
processor.finalize()
```
### Advanced Configuration
```python
# Custom window size and depth levels
processor = OHLCProcessor(
window_seconds=30, # 30-second bars
depth_levels_per_side=25 # Top 25 levels per side
)
```
## Dependencies
### Internal Modules
- `orderbook_manager.OrderbookManager`: In-memory orderbook state management
- `metrics_calculator.MetricsCalculator`: OBI and CVD metrics calculation
- `level_parser`: Orderbook level parsing utilities
- `viz_io`: JSON output for visualization
- `db_interpreter.OrderbookUpdate`: Input data structures
### External
- `typing`: Type annotations
- `logging`: Debug and operational logging
## Modular Architecture
The processor now follows a clean composition pattern:
1. **Main Coordinator** (`OHLCProcessor`):
- Orchestrates trade and orderbook processing
- Maintains OHLC bar state and window management
- Delegates specialized tasks to composed modules
2. **Orderbook Management** (`OrderbookManager`):
- Maintains in-memory price→size mappings
- Applies partial updates and handles deletions
- Provides sorted top-N level extraction
3. **Metrics Calculation** (`MetricsCalculator`):
- Tracks CVD from trade flow (buy/sell volume delta)
- Calculates OBI from orderbook volume imbalance
- Manages windowed metrics aggregation with throttling
4. **Level Parsing** (`level_parser` module):
- Normalizes JSON and Python literal level representations
- Handles zero-size levels for orderbook deletions
- Provides robust error handling for malformed data
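The composition pattern above can be sketched with stub helpers; the class bodies here are stand-ins with just enough behavior to show how the coordinator delegates, not the real implementations:

```python
# Runnable sketch of the coordinator/helper composition described above.

class OrderbookManager:
    def __init__(self, depth_levels_per_side: int):
        self.depth_levels_per_side = depth_levels_per_side

class MetricsCalculator:
    def __init__(self, window_seconds: int):
        self.window_seconds = window_seconds
        self.cvd_cumulative = 0.0

    def update_cvd_from_trade(self, side: str, size: float) -> None:
        self.cvd_cumulative += size if side == "buy" else -size

class OHLCProcessor:
    """Coordinator: owns the composed helpers and delegates specialized work."""
    def __init__(self, window_seconds: int = 60, depth_levels_per_side: int = 50):
        self._orderbook = OrderbookManager(depth_levels_per_side)
        self._metrics = MetricsCalculator(window_seconds)

    @property
    def cvd_cumulative(self) -> float:
        # the public property simply forwards to the composed calculator
        return self._metrics.cvd_cumulative
```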
## Performance Characteristics
- **Throttled Updates**: Prevents excessive I/O during high-frequency periods
- **Memory Efficient**: Maintains only current window and top-N depth levels
- **Incremental Processing**: Applies only changed orderbook levels
- **Atomic Operations**: Thread-safe updates to shared data structures
## Testing
Run module tests:
```bash
uv run pytest test_ohlc_processor.py -v
```
Test coverage includes:
- OHLC calculation accuracy across window boundaries
- Volume accumulation correctness
- High/low price tracking
- Orderbook update application
- Depth snapshot generation
- OBI metric calculation
## Known Issues
- Orderbook level parsing assumes well-formed JSON or Python literals
- Memory usage scales with number of active price levels
- Clock skew between trades and orderbook updates not handled
## Configuration Options
- `window_seconds`: Time window size for OHLC aggregation (default: 60)
- `depth_levels_per_side`: Number of top price levels to maintain (default: 50)
- `UPSERT_THROTTLE_MS`: Minimum interval between upsert operations (internal)
- `DEPTH_EMIT_THROTTLE_MS`: Minimum interval between depth emissions (internal)

# Module: orderbook_manager
## Purpose
The `orderbook_manager` module provides in-memory orderbook state management with partial update capabilities. It maintains separate bid and ask sides and supports efficient top-level extraction for visualization.
## Public Interface
### Classes
- `OrderbookManager(depth_levels_per_side: int = 50)`: Main orderbook state manager
### Methods
- `apply_updates(bids_updates: List[Tuple[float, float]], asks_updates: List[Tuple[float, float]]) -> None`: Apply partial updates to both sides
- `get_total_volume() -> Tuple[float, float]`: Get total bid and ask volumes
- `get_top_levels() -> Tuple[List[List[float]], List[List[float]]]`: Get sorted top levels for both sides
### Private Methods
- `_apply_partial_updates(side_map: Dict[float, float], updates: List[Tuple[float, float]]) -> None`: Apply updates to one side
- `_build_top_levels(side_map: Dict[float, float], limit: int, reverse: bool) -> List[List[float]]`: Extract sorted top levels
## Usage Examples
```python
from orderbook_manager import OrderbookManager
# Initialize manager
manager = OrderbookManager(depth_levels_per_side=25)
# Apply orderbook updates
bids = [(50000.0, 1.5), (49999.0, 2.0)]
asks = [(50001.0, 1.2), (50002.0, 0.8)]
manager.apply_updates(bids, asks)
# Get volume totals for OBI calculation
total_bids, total_asks = manager.get_total_volume()
obi = total_bids - total_asks
# Get top levels for depth visualization
bids_sorted, asks_sorted = manager.get_top_levels()
# Handle deletions (size = 0)
deletions = [(50000.0, 0.0)] # Remove price level
manager.apply_updates(deletions, [])
```
## Dependencies
### External
- `typing`: Type annotations for Dict, List, Tuple
## State Management
### Internal State
- `_book_bids: Dict[float, float]`: Price → size mapping for bid side
- `_book_asks: Dict[float, float]`: Price → size mapping for ask side
- `depth_levels_per_side: int`: Configuration for top-N extraction
### Update Semantics
- **Size = 0**: Remove price level (deletion)
- **Size > 0**: Upsert price level with new size
- **Size < 0**: Ignored (invalid update)
### Sorting Behavior
- **Bids**: Descending by price (highest price first)
- **Asks**: Ascending by price (lowest price first)
- **Top-N**: Limited by `depth_levels_per_side` parameter
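The update semantics and sorting rules above reduce to a few lines over a dict-based side; this is an illustrative sketch, not the module's code:

```python
# Minimal sketch of the documented update and sorting semantics.

def apply_partial_updates(side_map: dict, updates: list) -> None:
    for price, size in updates:
        if size == 0:
            side_map.pop(price, None)   # size = 0 -> delete the price level
        elif size > 0:
            side_map[price] = size      # size > 0 -> upsert the level
        # size < 0 is ignored as an invalid update

def top_levels(side_map: dict, limit: int, reverse: bool) -> list:
    # bids use reverse=True (highest first), asks reverse=False (lowest first)
    return [[p, side_map[p]] for p in sorted(side_map, reverse=reverse)[:limit]]

bids = {}
apply_partial_updates(bids, [(50000.0, 1.5), (49999.0, 2.0), (50000.0, 0.0)])
print(top_levels(bids, 50, reverse=True))  # [[49999.0, 2.0]] after the deletion
```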
## Performance Characteristics
- **Memory Efficient**: Only stores non-zero price levels
- **Fast Updates**: O(1) upsert/delete operations using dict
- **Efficient Sorting**: Only sorts when extracting top levels
- **Bounded Output**: Limits result size for visualization performance
## Use Cases
### OBI Calculation
```python
total_bids, total_asks = manager.get_total_volume()
order_book_imbalance = total_bids - total_asks
```
### Depth Visualization
```python
bids, asks = manager.get_top_levels()
depth_payload = {"bids": bids, "asks": asks}
```
### Incremental Updates
```python
# Typical orderbook update cycle
updates = parse_orderbook_changes(raw_data)
manager.apply_updates(updates['bids'], updates['asks'])
```
## Testing
```bash
uv run pytest test_orderbook_manager.py -v
```
Test coverage includes:
- Partial update application correctness
- Deletion handling (size = 0)
- Volume calculation accuracy
- Top-level sorting and limiting
- Edge cases (empty books, single levels)
- Performance with large orderbooks
## Configuration
- `depth_levels_per_side`: Controls output size for visualization (default: 50)
- Affects memory usage and sorting performance
- Higher values provide more market depth detail
- Lower values improve processing speed
## Known Limitations
- No built-in validation of price/size values
- Memory usage scales with number of unique price levels
- No historical state tracking (current snapshot only)
- No support for spread calculation or market data statistics

docs/modules/viz_io.md
# Module: viz_io
## Purpose
The `viz_io` module provides atomic inter-process communication (IPC) between the data processing pipeline and the visualization frontend. It manages JSON file-based data exchange with atomic writes to prevent race conditions and data corruption.
## Public Interface
### Functions
- `add_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Append new OHLC bar to rolling dataset
- `upsert_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Update existing bar or append new one
- `clear_data()`: Reset OHLC dataset to empty state
- `add_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Append OBI metric bar
- `upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Update existing OBI bar or append new one
- `clear_metrics()`: Reset metrics dataset to empty state
- `set_depth_data(bids, asks)`: Update current orderbook depth snapshot
### Constants
- `DATA_FILE`: Path to OHLC data JSON file
- `DEPTH_FILE`: Path to depth data JSON file
- `METRICS_FILE`: Path to metrics data JSON file
- `MAX_BARS`: Maximum number of bars to retain (1000)
## Usage Examples
### Basic OHLC Operations
```python
import viz_io
# Add a new OHLC bar
viz_io.add_ohlc_bar(
timestamp=1640995200000, # Unix timestamp in milliseconds
open_price=50000.0,
high_price=50100.0,
low_price=49900.0,
close_price=50050.0,
volume=125.5
)
# Update the current bar (if timestamp matches) or add new one
viz_io.upsert_ohlc_bar(
timestamp=1640995200000,
open_price=50000.0,
high_price=50150.0, # Updated high
low_price=49850.0, # Updated low
close_price=50075.0, # Updated close
volume=130.2 # Updated volume
)
```
### Orderbook Depth Management
```python
# Set current depth snapshot
bids = [[49990.0, 1.5], [49985.0, 2.1], [49980.0, 0.8]]
asks = [[50010.0, 1.2], [50015.0, 1.8], [50020.0, 2.5]]
viz_io.set_depth_data(bids, asks)
```
### Metrics Operations
```python
# Add Order Book Imbalance metrics
viz_io.add_metric_bar(
timestamp=1640995200000,
obi_open=0.15,
obi_high=0.22,
obi_low=0.08,
obi_close=0.18
)
```
## Dependencies
### Internal
- None (standalone utility module)
### External
- `json`: JSON serialization/deserialization
- `pathlib`: File path handling
- `typing`: Type annotations
- `tempfile`: Atomic write operations
## Data Formats
### OHLC Data (`ohlc_data.json`)
```json
[
[1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
[1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]
```
Format: `[timestamp, open, high, low, close, volume]`
### Depth Data (`depth_data.json`)
```json
{
"bids": [[49990.0, 1.5], [49985.0, 2.1]],
"asks": [[50010.0, 1.2], [50015.0, 1.8]]
}
```
Format: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`
### Metrics Data (`metrics_data.json`)
```json
[
[1640995200000, 0.15, 0.22, 0.08, 0.18],
[1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
Format: `[timestamp, obi_open, obi_high, obi_low, obi_close]`
## Atomic Write Operations
All write operations use atomic file replacement to prevent partial reads:
1. Write data to temporary file
2. Flush and sync to disk
3. Atomically rename temporary file to target file
This ensures the visualization frontend always reads complete, valid JSON data.
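The three steps above follow the standard write-temp-then-rename pattern; this is a hedged sketch of that pattern, and the module's actual helper may differ in naming and error handling:

```python
# Sketch of an atomic JSON write: temp file in the same directory,
# flush + fsync, then an atomic rename over the target.
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, payload) -> None:
    # temp file must live on the same filesystem for os.replace to be atomic
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes hit disk before the rename
        os.replace(tmp, path)      # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on failure
        raise
```

Readers that open the target path therefore see either the old file or the new one in full, never a partially written JSON document.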
## Performance Characteristics
- **Bounded Memory**: OHLC and metrics datasets limited to 1000 bars max
- **Atomic Operations**: No partial reads possible during writes
- **Rolling Window**: Automatic trimming of old data maintains constant memory usage
- **Simple Lookups**: Timestamp-based upserts use linear list scanning (acceptable at the 1000-bar cap)
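The timestamp-keyed upsert with a rolling cap can be sketched as follows; this illustrates the behavior described above and is not the module's actual code:

```python
# Sketch of upsert-or-append with MAX_BARS trimming, per the docs above.

MAX_BARS = 1000

def upsert_bar(bars: list, bar: list) -> None:
    ts = bar[0]
    for i, existing in enumerate(bars):
        if existing[0] == ts:
            bars[i] = bar          # same timestamp -> replace in place
            return
    bars.append(bar)               # new timestamp -> append...
    del bars[:-MAX_BARS]           # ...and keep only the newest MAX_BARS
```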
## Testing
Run module tests:
```bash
uv run pytest test_viz_io.py -v
```
Test coverage includes:
- Atomic write operations
- Data format validation
- Rolling window behavior
- Upsert logic correctness
- File corruption prevention
- Concurrent read/write scenarios
## Known Issues
- File I/O may block briefly during atomic writes
- JSON parsing errors not propagated to callers
- Limited to 1000 bars maximum (configurable via MAX_BARS)
- No compression for large datasets
## Thread Safety
All operations are thread-safe for single writer, multiple reader scenarios:
- Writer: Data processing pipeline (single thread)
- Readers: Visualization frontend (polling)
- Atomic file operations prevent corruption during concurrent access