PRD: OBI and CVD Metrics Integration
Introduction/Overview
This feature integrates Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) calculations into the orderflow backtest system. Currently, the system stores all snapshots in memory during processing, which consumes excessive memory. The goal is to compute OBI and CVD metrics during the build_booktick_from_db execution, store these metrics persistently in the database, and visualize them alongside OHLC and volume data.
Goals
- Memory Optimization: Reduce memory usage by storing only essential data (OBI/CVD metrics, best bid/ask) instead of full snapshot history
- Metric Calculation: Implement per-snapshot OBI and CVD calculations with maximum granularity
- Persistent Storage: Store calculated metrics in the database to avoid recalculation
- Enhanced Visualization: Display OBI and CVD curves beneath volume graphs with shared time axis
- Incremental CVD: Support incremental CVD calculation that can be reset at user-defined points
User Stories
- As a trader, I want to see OBI and CVD metrics calculated for each orderbook snapshot so that I can analyze market sentiment with maximum granularity.
- As a system user, I want metrics to be stored persistently in the database so that I don't need to recalculate them when re-analyzing the same dataset.
- As a data analyst, I want to visualize OBI and CVD curves alongside OHLC and volume data so that I can correlate price movements with orderbook imbalances and volume deltas.
- As a performance-conscious user, I want the system to use less memory during processing so that I can analyze larger datasets (months to years of data).
- As a researcher, I want incremental CVD calculation so that I can track cumulative volume changes from any chosen starting point in my analysis.
Functional Requirements
Database Schema Updates
- Create metrics table with the following structure:
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER,
    timestamp TEXT,
    obi REAL,
    cvd REAL,
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
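The table definition above could be applied from Python roughly as follows. This is a minimal sketch using the standard sqlite3 module: the helper name create_metrics_table is hypothetical, and the one-column book table is only a stand-in, because the real book schema is not shown in this PRD. Note that SQLite does not enforce foreign keys unless PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

METRICS_DDL = """
CREATE TABLE IF NOT EXISTS metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER,
    timestamp TEXT,
    obi REAL,
    cvd REAL,
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
"""


def create_metrics_table(conn: sqlite3.Connection) -> None:
    """Create the metrics table (hypothetical helper, not part of the codebase)."""
    # Foreign keys are off by default in SQLite; enable them per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(METRICS_DDL)
    conn.commit()


conn = sqlite3.connect(":memory:")
# Stand-in for the existing book table; its real columns are not listed here.
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY)")
create_metrics_table(conn)

# Column names as reported by SQLite, in declaration order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(metrics)")]
```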
OBI Calculation
- Calculate OBI per snapshot using the formula:
OBI = (Vb - Va) / (Vb + Va)
where:
- Vb = total volume on bid side
- Va = total volume on ask side
- Handle edge cases where Vb + Va = 0 by setting OBI = 0.0
- Store OBI values in the metrics table for each processed snapshot
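The OBI rule, including the zero-volume edge case, can be sketched as a small pure function. The function name and the choice to pass pre-summed side volumes are assumptions for illustration; how the bid and ask totals are extracted from a snapshot depends on the existing Book model.

```python
def order_book_imbalance(bid_volume: float, ask_volume: float) -> float:
    """Return OBI = (Vb - Va) / (Vb + Va), or 0.0 when both sides are empty."""
    total = bid_volume + ask_volume
    if total == 0:
        # Edge case from the requirements: no resting volume on either side.
        return 0.0
    return (bid_volume - ask_volume) / total


balanced_to_bid = order_book_imbalance(30.0, 10.0)   # (30 - 10) / 40
empty_book = order_book_imbalance(0.0, 0.0)          # edge case
ask_only = order_book_imbalance(0.0, 5.0)            # fully ask-sided book
```

For non-negative volumes the result always lies in [-1, 1]: a book with volume only on the bid side gives 1.0, and one with volume only on the ask side gives -1.0.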
CVD Calculation
- Calculate Volume Delta per timestamp by summing all trades at each snapshot timestamp:
- Buy trades (side = "buy"): add to positive volume
- Sell trades (side = "sell"): add to negative volume
- VD = Buy Volume - Sell Volume
- Calculate Cumulative Volume Delta as running sum:
CVD_t = CVD_{t-1} + VD_t
- Support CVD reset functionality to allow starting cumulative calculation from any point
- Handle snapshots with no trades by carrying forward the previous CVD value
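One way the CVD rules above might fit together is sketched below. The CvdAccumulator class and the (side, size) trade tuple are hypothetical names chosen for illustration; the actual trade representation in the codebase may differ.

```python
from typing import Iterable, Tuple

Trade = Tuple[str, float]  # (side, size); an assumed shape for illustration


def volume_delta(trades: Iterable[Trade]) -> float:
    """VD = buy volume - sell volume for the trades at one snapshot timestamp."""
    buy = 0.0
    sell = 0.0
    for side, size in trades:
        if side == "buy":
            buy += size
        elif side == "sell":
            sell += size
    return buy - sell


class CvdAccumulator:
    """Running sum CVD_t = CVD_{t-1} + VD_t with an explicit reset."""

    def __init__(self, start: float = 0.0) -> None:
        self.value = start

    def update(self, trades: Iterable[Trade]) -> float:
        # With no trades VD is 0, so the previous CVD is carried forward.
        self.value += volume_delta(trades)
        return self.value

    def reset(self, start: float = 0.0) -> None:
        # Restart the cumulative sum from a user-chosen point.
        self.value = start


acc = CvdAccumulator()
per_snapshot_trades = [[("buy", 2.0), ("sell", 0.5)], [], [("sell", 3.0)]]
series = [acc.update(trades) for trades in per_snapshot_trades]
acc.reset()
after_reset = acc.update([("buy", 1.0)])
```

Here series is [1.5, 1.5, -1.5]: the second snapshot has no trades and repeats the previous value, and after_reset is 1.0 because the running sum restarts from zero.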
Storage System Updates
- Modify Storage class to integrate metric calculations during build_booktick_from_db
- Update Book model to store only essential data: OBI/CVD time series and best bid/ask levels
- Remove full snapshot retention from memory after metric calculation
- Add metric persistence to SQLite database during processing
Strategy Integration
- Enhance DefaultStrategy to calculate both OBI and CVD metrics
- Return time-series data structures compatible with visualization system
- Integrate metric calculation into the existing on_booktick workflow
Visualization Enhancements
- Add OBI and CVD plotting to the visualizer beneath volume graphs
- Implement shared X-axis for time alignment across OHLC, volume, OBI, and CVD charts
- Support 6-hour bar aggregation as the initial time resolution
- Use standard line styling for OBI and CVD curves
- Make time resolution configurable for future flexibility
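The configurable bar resolution could be sketched as a bucketing step applied before plotting. Several points here are assumptions, not requirements from this document: timestamps are treated as integer epoch seconds, points arrive in time order, and each bar keeps the last value seen in its bucket (a "close" style aggregation). The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

SIX_HOURS = 6 * 60 * 60  # initial resolution, in seconds


def bucket_start(ts: int, resolution: int = SIX_HOURS) -> int:
    """Floor an epoch-second timestamp to the start of its bar."""
    return ts - ts % resolution


def last_value_per_bucket(
    points: Iterable[Tuple[int, float]], resolution: int = SIX_HOURS
) -> List[Tuple[int, float]]:
    """Keep the last value in each bar; assumes points are in time order."""
    buckets: Dict[int, float] = {}
    for ts, value in points:
        # Later points in the same bar overwrite earlier ones.
        buckets[bucket_start(ts, resolution)] = value
    return sorted(buckets.items())


points = [(0, 0.1), (3600, 0.2), (21600, -0.3), (43199, 0.4), (43200, 0.5)]
bars = last_value_per_bucket(points)
hourly = last_value_per_bucket(points, resolution=3600)
```

With the default resolution, bars is [(0, 0.2), (21600, 0.4), (43200, 0.5)]. Because every series is keyed by the same bucket start, OHLC, volume, OBI, and CVD bars line up on a shared time axis.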
Non-Goals (Out of Scope)
- Real-time streaming - This feature focuses on historical data processing
- Advanced visualization features - Complex styling, indicators, or interactive elements beyond basic line charts
- Alternative CVD calculation methods - Only implementing the standard buy/sell volume delta approach
- Multi-threading optimization - Simple sequential processing for initial implementation
- Data compression - No advanced compression techniques for stored metrics
- Export functionality - No CSV/JSON export of calculated metrics
Technical Considerations
Database Performance
- Add indexes on metrics.timestamp and metrics.snapshot_id for efficient querying
- Consider batch insertions for metric data to improve write performance
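The two points above might look like this with the standard sqlite3 module. The index names and helper functions are hypothetical, and the metrics table is created here without its foreign key only so the sketch runs on its own.

```python
import sqlite3
from typing import Iterable, Tuple

# (snapshot_id, timestamp, obi, cvd, best_bid, best_ask)
MetricRow = Tuple[int, str, float, float, float, float]


def create_metric_indexes(conn: sqlite3.Connection) -> None:
    """Index the two columns named in the requirements (index names assumed)."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_metrics_timestamp ON metrics(timestamp)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_metrics_snapshot_id ON metrics(snapshot_id)"
    )


def insert_metrics_batch(conn: sqlite3.Connection, rows: Iterable[MetricRow]) -> None:
    """Insert many rows in one transaction instead of committing per row."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (id INTEGER PRIMARY KEY AUTOINCREMENT, snapshot_id INTEGER, "
    "timestamp TEXT, obi REAL, cvd REAL, best_bid REAL, best_ask REAL)"
)
create_metric_indexes(conn)
insert_metrics_batch(
    conn,
    [
        (1, "1700000000", 0.25, 1.5, 100.0, 100.5),
        (2, "1700000001", -0.10, 1.5, 100.0, 100.5),
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
index_names = {
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'metrics'"
    )
}
```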
Memory Management
- Process snapshots sequentially and discard after metric calculation
- Maintain only calculated time-series data in memory
- Keep best bid/ask data for potential future analysis needs
Data Integrity
- Ensure metric calculations are atomic with snapshot processing
- Add foreign key constraints to maintain referential integrity
- Implement transaction boundaries for consistent data state
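A transaction boundary around each batch is one way to keep the stored state consistent. In the sketch below, store_batch is a hypothetical helper and the simplified metrics table is a stand-in; the range check on OBI is an illustrative failure trigger, not a confirmed requirement. If any row in the batch fails, the connection's context manager rolls the whole batch back, so no partial batch is left in the table.

```python
import sqlite3
from typing import Iterable, Tuple


def store_batch(
    conn: sqlite3.Connection, rows: Iterable[Tuple[int, float, float]]
) -> bool:
    """Write (snapshot_id, obi, cvd) rows all-or-nothing; return True on success."""
    try:
        with conn:  # one transaction per batch: commit on success, rollback on error
            for snapshot_id, obi, cvd in rows:
                if not -1.0 <= obi <= 1.0:
                    raise ValueError(f"OBI out of range for snapshot {snapshot_id}")
                conn.execute(
                    "INSERT INTO metrics (snapshot_id, obi, cvd) VALUES (?, ?, ?)",
                    (snapshot_id, obi, cvd),
                )
    except ValueError:
        return False
    return True


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the metrics table.
conn.execute(
    "CREATE TABLE metrics (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "snapshot_id INTEGER, obi REAL, cvd REAL)"
)

# The second row is invalid, so the first insert is rolled back as well.
bad_ok = store_batch(conn, [(1, 0.2, 1.0), (2, 1.7, 2.0)])
count_after_bad = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]

good_ok = store_batch(conn, [(1, 0.2, 1.0), (2, -0.4, 0.5)])
count_after_good = conn.execute("SELECT COUNT(*) FROM metrics").fetchone()[0]
```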
Integration Points
- Modify SQLiteOrderflowRepository to support metrics table operations
- Update Storage._create_snapshots_from_rows to include metric calculation
- Extend Visualizer to handle additional metric data series
Success Metrics
- Memory Usage Reduction: Achieve at least 70% reduction in peak memory usage during processing
- Processing Speed: Maintain or improve current processing speed (rows/sec) despite additional calculations
- Data Accuracy: Stored OBI/CVD values match manually calculated values for every snapshot in the test datasets
- Visualization Quality: Successfully display OBI and CVD curves with proper time alignment
- Storage Efficiency: Metrics table size should be manageable relative to source data (< 20% overhead)
Open Questions
- Index Strategy: Should we add additional database indexes for time-range queries on metrics?
- CVD Starting Value: Should CVD start from 0 for each database file, or allow continuation from previous sessions?
- Error Recovery: How should the system handle partial metric calculations if processing is interrupted?
- Validation: Do we need validation checks to ensure OBI values stay within [-1, 1] range?
- Performance Monitoring: Should we add timing metrics to track calculation performance per snapshot?
Implementation Priority
Phase 1: Core Functionality
- Database schema updates
- Basic OBI/CVD calculation
- Metric storage integration
Phase 2: Memory Optimization
- Remove full snapshot retention
- Implement essential data-only storage
Phase 3: Visualization
- Add metric plotting to visualizer
- Implement time axis alignment
- Support 6-hour bar aggregation