# PRD: OBI and CVD Metrics Integration

## Introduction/Overview

This feature integrates Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) calculations into the orderflow backtest system. Currently, the system stores all snapshots in memory during processing, which consumes excessive memory. The goal is to compute OBI and CVD metrics during the `build_booktick_from_db` execution, store these metrics persistently in the database, and visualize them alongside OHLC and volume data.

## Goals

1. **Memory Optimization**: Reduce memory usage by storing only essential data (OBI/CVD metrics, best bid/ask) instead of full snapshot history
2. **Metric Calculation**: Implement per-snapshot OBI and CVD calculations with maximum granularity
3. **Persistent Storage**: Store calculated metrics in the database to avoid recalculation
4. **Enhanced Visualization**: Display OBI and CVD curves beneath volume graphs with shared time axis
5. **Incremental CVD**: Support incremental CVD calculation that can be reset at user-defined points

## User Stories

1. **As a trader**, I want to see OBI and CVD metrics calculated for each orderbook snapshot so that I can analyze market sentiment with maximum granularity.

2. **As a system user**, I want metrics to be stored persistently in the database so that I don't need to recalculate them when re-analyzing the same dataset.

3. **As a data analyst**, I want to visualize OBI and CVD curves alongside OHLC and volume data so that I can correlate price movements with orderbook imbalances and volume deltas.

4. **As a performance-conscious user**, I want the system to use less memory during processing so that I can analyze larger datasets (months to years of data).

5. **As a researcher**, I want incremental CVD calculation so that I can track cumulative volume changes from any chosen starting point in my analysis.

## Functional Requirements

### Database Schema Updates
1. **Create metrics table** with the following structure:
   ```sql
   CREATE TABLE metrics (
       id INTEGER PRIMARY KEY AUTOINCREMENT,
       snapshot_id INTEGER,
       timestamp TEXT,
       obi REAL,
       cvd REAL,
       best_bid REAL,
       best_ask REAL,
       FOREIGN KEY (snapshot_id) REFERENCES book(id)
   );
   ```

### OBI Calculation
2. **Calculate OBI per snapshot** using the formula: `OBI = (Vb - Va) / (Vb + Va)` where:
   - Vb = total volume on bid side
   - Va = total volume on ask side
3. **Handle edge cases** where Vb + Va = 0 by setting OBI = 0.0
4. **Store OBI values** in the metrics table for each processed snapshot
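
The OBI formula and its zero-volume edge case can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `order_book_imbalance` and the `(price, size)` tuple layout for book levels are assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical level layout: (price, size). The real Book model may differ.
Level = Tuple[float, float]


def order_book_imbalance(bids: Iterable[Level], asks: Iterable[Level]) -> float:
    """Return OBI = (Vb - Va) / (Vb + Va), or 0.0 when Vb + Va = 0."""
    bid_volume = sum(size for _, size in bids)  # Vb
    ask_volume = sum(size for _, size in asks)  # Va
    total = bid_volume + ask_volume
    if total == 0:
        # Edge case from requirement 3: empty book on both sides.
        return 0.0
    return (bid_volume - ask_volume) / total
```

With non-negative sizes the result always falls in [-1, 1]: +1 means volume rests only on the bid side, -1 only on the ask side.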

### CVD Calculation
5. **Calculate Volume Delta per timestamp** by summing all trades at each snapshot timestamp:
   - Buy trades (side = "buy"): add to buy volume
   - Sell trades (side = "sell"): add to sell volume
   - VD = Buy Volume - Sell Volume
6. **Calculate Cumulative Volume Delta** as running sum: `CVD_t = CVD_{t-1} + VD_t`
7. **Support CVD reset functionality** to allow starting cumulative calculation from any point
8. **Handle snapshots with no trades** by carrying forward the previous CVD value
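
Requirements 5 to 8 can be sketched in a few lines. The names below (`volume_delta_by_timestamp`, `cumulative_volume_delta`, `reset_at`) and the `(timestamp, side, size)` trade layout are hypothetical. The reset semantics are also an assumption: here a reset sets CVD to zero before the delta of that snapshot is added.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical trade layout: (timestamp, side, size).
Trade = Tuple[str, str, float]


def volume_delta_by_timestamp(trades: Iterable[Trade]) -> Dict[str, float]:
    """Sum signed trade sizes per timestamp: VD = buy volume - sell volume."""
    deltas: Dict[str, float] = {}
    for timestamp, side, size in trades:
        if side == "buy":
            signed = size
        elif side == "sell":
            signed = -size
        else:
            raise ValueError(f"unknown trade side: {side!r}")
        deltas[timestamp] = deltas.get(timestamp, 0.0) + signed
    return deltas


def cumulative_volume_delta(
    snapshot_timestamps: Iterable[str],
    deltas: Dict[str, float],
    reset_at: Optional[Iterable[str]] = None,
) -> List[Tuple[str, float]]:
    """Return (timestamp, CVD) pairs with CVD_t = CVD_{t-1} + VD_t."""
    reset_points = set(reset_at or ())
    cvd = 0.0
    series: List[Tuple[str, float]] = []
    for timestamp in snapshot_timestamps:
        if timestamp in reset_points:
            cvd = 0.0  # assumed semantics: restart accumulation here
        # A snapshot without trades contributes 0, so the value carries forward.
        cvd += deltas.get(timestamp, 0.0)
        series.append((timestamp, cvd))
    return series
```

Keeping the per-timestamp deltas separate from the running sum means a reset only changes the accumulation step, so the same deltas can be re-accumulated from any starting point.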

### Storage System Updates
9. **Modify Storage class** to integrate metric calculations during `build_booktick_from_db`
10. **Update Book model** to store only essential data: OBI/CVD time series and best bid/ask levels
11. **Remove full snapshot retention** from memory after metric calculation
12. **Add metric persistence** to SQLite database during processing

### Strategy Integration
13. **Enhance DefaultStrategy** to calculate both OBI and CVD metrics
14. **Return time-series data structures** compatible with visualization system
15. **Integrate metric calculation** into the existing `on_booktick` workflow

### Visualization Enhancements
16. **Add OBI and CVD plotting** to the visualizer beneath volume graphs
17. **Implement shared X-axis** for time alignment across OHLC, volume, OBI, and CVD charts
18. **Support 6-hour bar aggregation** as the initial time resolution
19. **Use standard line styling** for OBI and CVD curves
20. **Make time resolution configurable** for future flexibility
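
One way to get 6-hour bars with a configurable resolution is to bucket each metric point by the start of its bar. The sketch below is an illustration only. The function names are hypothetical, timestamps are assumed to be timezone-aware UTC `datetime` objects, and the choice of the last value in each bar as its representative is an assumption, since this document does not specify an aggregation rule.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, resolution: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bar."""
    width = int(resolution.total_seconds())
    elapsed = int((ts - EPOCH).total_seconds())
    return EPOCH + timedelta(seconds=elapsed - elapsed % width)


def aggregate_last(
    points: Iterable[Tuple[datetime, float]],
    resolution: timedelta = timedelta(hours=6),
) -> List[Tuple[datetime, float]]:
    """Keep the last value seen in each bar; points are assumed time-sorted."""
    bars: Dict[datetime, float] = {}
    for ts, value in points:
        bars[bucket_start(ts, resolution)] = value  # later points overwrite earlier ones
    return sorted(bars.items())
```

If OHLC, volume, OBI, and CVD all use the same bucketing function, their bars share identical timestamps, which makes a shared X-axis straightforward.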

## Non-Goals (Out of Scope)

1. **Real-time streaming** - This feature focuses on historical data processing
2. **Advanced visualization features** - Complex styling, indicators, or interactive elements beyond basic line charts
3. **Alternative CVD calculation methods** - Only implementing the standard buy/sell volume delta approach
4. **Multi-threading optimization** - Simple sequential processing for initial implementation
5. **Data compression** - No advanced compression techniques for stored metrics
6. **Export functionality** - No CSV/JSON export of calculated metrics

## Technical Considerations

### Database Performance
- Add indexes on `metrics.timestamp` and `metrics.snapshot_id` for efficient querying
- Consider batch insertions for metric data to improve write performance
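
Both points can be sketched with the standard-library `sqlite3` module. The index names, function names, and the default `batch_size` below are assumptions for illustration. The column list follows the `metrics` schema defined earlier in this document.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Tuple

# Row layout: (snapshot_id, timestamp, obi, cvd, best_bid, best_ask)
MetricRow = Tuple[int, str, float, float, float, float]

INSERT_METRIC = (
    "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)


def create_metrics_indexes(conn: sqlite3.Connection) -> None:
    """Create the two suggested indexes; the index names are hypothetical."""
    conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_timestamp ON metrics(timestamp)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_snapshot_id ON metrics(snapshot_id)")


def insert_metrics_in_batches(
    conn: sqlite3.Connection, rows: Iterable[MetricRow], batch_size: int = 1000
) -> int:
    """Insert rows in chunks, one transaction per chunk; return the row count."""
    iterator = iter(rows)
    inserted = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return inserted
        with conn:  # commits the chunk on success, rolls it back on error
            conn.executemany(INSERT_METRIC, batch)
        inserted += len(batch)
```

Chunking keeps at most `batch_size` rows in memory at a time while avoiding a commit per row, which is usually the dominant write cost in SQLite.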

### Memory Management
- Process snapshots sequentially and discard after metric calculation
- Maintain only calculated time-series data in memory
- Keep best bid/ask data for potential future analysis needs
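
A generator is one simple way to process snapshots sequentially without retaining them. The sketch below is illustrative only: `MetricPoint`, `stream_metrics`, and the `(timestamp, bids, asks)` snapshot layout are hypothetical and do not reflect the existing `Storage` or `Book` classes.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional, Tuple

Level = Tuple[float, float]  # hypothetical (price, size) layout
Snapshot = Tuple[str, List[Level], List[Level]]  # hypothetical (timestamp, bids, asks)


class MetricPoint(NamedTuple):
    """The only per-snapshot data kept after processing."""
    timestamp: str
    obi: float
    best_bid: Optional[float]
    best_ask: Optional[float]


def stream_metrics(snapshots: Iterable[Snapshot]) -> Iterator[MetricPoint]:
    """Yield one small MetricPoint per snapshot; the snapshot itself is not stored."""
    for timestamp, bids, asks in snapshots:
        bid_volume = sum(size for _, size in bids)
        ask_volume = sum(size for _, size in asks)
        total = bid_volume + ask_volume
        obi = (bid_volume - ask_volume) / total if total else 0.0
        best_bid = max((price for price, _ in bids), default=None)
        best_ask = min((price for price, _ in asks), default=None)
        yield MetricPoint(timestamp, obi, best_bid, best_ask)
```

Because each snapshot goes out of scope as soon as its `MetricPoint` is yielded, peak memory is bounded by one snapshot plus whatever metric points the caller chooses to keep.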

### Data Integrity
- Ensure metric calculations are atomic with snapshot processing
- Add foreign key constraints to maintain referential integrity
- Implement transaction boundaries for consistent data state
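
The points above can be illustrated with `sqlite3`. Note that SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is issued on each connection. The function names below are hypothetical, and returning `False` on a constraint violation is just one possible error-handling policy.

```python
import sqlite3
from typing import Iterable, Tuple

MetricRow = Tuple[int, str, float, float, float, float]

INSERT_METRIC = (
    "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)


def open_connection(path: str) -> sqlite3.Connection:
    """Open a connection with foreign key enforcement enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    return conn


def store_metrics_atomically(conn: sqlite3.Connection, rows: Iterable[MetricRow]) -> bool:
    """Write all rows in one transaction; on a constraint error, write none."""
    try:
        with conn:  # commit on success, rollback if an exception escapes
            conn.executemany(INSERT_METRIC, rows)
    except sqlite3.IntegrityError:
        return False
    return True
```

With this shape, a row that references a missing `book.id` causes the whole batch to roll back, so the metrics table never holds a partially written batch.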

### Integration Points
- Modify `SQLiteOrderflowRepository` to support metrics table operations
- Update `Storage._create_snapshots_from_rows` to include metric calculation
- Extend `Visualizer` to handle additional metric data series

## Success Metrics

1. **Memory Usage Reduction**: Achieve at least 70% reduction in peak memory usage during processing
2. **Processing Speed**: Maintain or improve current processing speed (rows/sec) despite additional calculations
3. **Data Accuracy**: Stored OBI/CVD values match manually calculated values for every row of the test datasets
4. **Visualization Quality**: Successfully display OBI and CVD curves with proper time alignment
5. **Storage Efficiency**: The metrics table adds less than 20% to the size of the source data

## Open Questions

1. **Index Strategy**: Should we add further database indexes for time-range queries on metrics?
2. **CVD Starting Value**: Should CVD start from 0 for each database file, or allow continuation from previous sessions?
3. **Error Recovery**: How should the system handle partial metric calculations if processing is interrupted?
4. **Validation**: Do we need validation checks to ensure OBI values stay within the [-1, 1] range?
5. **Performance Monitoring**: Should we add timing metrics to track calculation performance per snapshot?

## Implementation Priority

**Phase 1: Core Functionality**
- Database schema updates
- Basic OBI/CVD calculation
- Metric storage integration

**Phase 2: Memory Optimization**
- Remove full snapshot retention
- Implement essential data-only storage

**Phase 3: Visualization**
- Add metric plotting to visualizer
- Implement time axis alignment
- Support 6-hour bar aggregation