# Module: db_interpreter
## Purpose
The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing.
## Public Interface
### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp
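A minimal sketch of how these containers might be defined; only the field names above come from the module, while the field order, types, and list defaults are assumptions:
```python
from dataclasses import dataclass, field

@dataclass
class OrderbookLevel:
    price: float
    size: float

@dataclass
class OrderbookUpdate:
    # Window bounds; assumed half-open [timestamp, end_timestamp)
    timestamp: int
    end_timestamp: int
    bids: list[OrderbookLevel] = field(default_factory=list)
    asks: list[OrderbookLevel] = field(default_factory=list)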
### Constructor
- `DBInterpreter(db_path: Path)`: Initializes a read-only SQLite connection with optimized PRAGMA settings
### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface; yields `(OrderbookUpdate, trades)` pairs, where `trades` holds the trade rows falling within that update's time window
## Usage Examples
```python
from pathlib import Path
from db_interpreter import DBInterpreter

# Initialize the interpreter against a day's database file
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)

# Stream orderbook and trade data window by window
for ob_update, trades in interpreter.stream():
    # Process the orderbook update
    print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
    print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")

    # Process the trades that fall inside this window
    for trade in trades:
        trade_id, price, size, side, timestamp_ms = trade[1:6]
        print(f"Trade: {side} {size} @ {price}")
```
## Dependencies
### Internal
- None (standalone module)
### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging
## Performance Characteristics
- **Batch sizes**: `BOOK_BATCH=2048` and `TRADE_BATCH=4096` rows per query, balancing query overhead against memory footprint
- **SQLite optimizations**: Read-only, immutable connection mode with large mmap and cache sizes
- **Memory efficient**: The streaming iterator pattern avoids loading the entire dataset into memory
- **Temporal windowing**: One-row lookahead for precise time boundary calculation (see the sketch below)
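The lookahead itself is simple: hold one book row back, and use the next row's timestamp to close the held row's window. A simplified sketch under an assumed `(timestamp, bids, asks)` row layout, with the module's actual batching and SQL omitted:
```python
from typing import Iterator

BookRow = tuple[int, list, list]  # (timestamp, bids, asks) -- assumed layout

def windowed(rows: Iterator[BookRow]) -> Iterator[tuple[int, int, list, list]]:
    """Yield (timestamp, end_timestamp, bids, asks) windows."""
    prev: BookRow | None = None
    for row in rows:
        if prev is not None:
            ts, bids, asks = prev
            # The next row's timestamp closes the current window
            yield ts, row[0], bids, asks
        prev = row
    if prev is not None:
        ts, bids, asks = prev
        # The final row has no successor; close its window at its own
        # timestamp (the real module may use a sentinel instead)
        yield ts, ts, bids, asks
```
Trades with timestamps in `[timestamp, end_timestamp)` can then be assigned unambiguously to exactly one window.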
## Testing
Run module tests:
```bash
uv run pytest test_db_interpreter.py -v
```
Test coverage includes:
- Batch reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data
## Known Issues
- Requires a specific database schema (`book` and `trades` tables)
- Python-literal string parsing assumes well-formed input (a defensive variant is sketched below)
- Large databases may warrant memory monitoring during streaming
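For the Python-literal parsing issue, a defensive variant could wrap `ast.literal_eval` in error handling so one bad row does not abort the whole stream; the `parse_levels` helper and the stored string format here are assumptions, not the module's actual code:
```python
import ast
import logging

logger = logging.getLogger(__name__)

def parse_levels(raw: str) -> list[tuple[float, float]]:
    """Parse a Python-literal string of (price, size) pairs.

    Returns an empty list (and logs a warning) on malformed input
    instead of raising.
    """
    try:
        levels = ast.literal_eval(raw)
        return [(float(p), float(s)) for p, s in levels]
    except (ValueError, SyntaxError, TypeError) as exc:
        logger.warning("Skipping malformed level string %r: %s", raw[:80], exc)
        return []
```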
## Configuration
- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096)
- SQLite PRAGMA settings tuned for read-only sequential access (sketched below)
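A sketch of the kind of read-only, immutable connection setup described above; the specific PRAGMA values here are illustrative assumptions, not the module's actual settings:
```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: Path) -> sqlite3.Connection:
    # immutable=1 tells SQLite the file cannot change, which skips
    # locking and permits more aggressive caching
    conn = sqlite3.connect(f"file:{db_path}?mode=ro&immutable=1", uri=True)
    conn.execute("PRAGMA mmap_size = 268435456")  # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA cache_size = -65536")    # 64 MiB page cache (negative = KiB)
    conn.execute("PRAGMA query_only = ON")
    return conn
```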