# Module: db_interpreter

## Purpose
The `db_interpreter` module provides efficient streaming access to SQLite databases that contain orderbook and trade data. It handles batched reads, temporal windowing, and normalization of the stored rows into data structures for downstream processing.

## Public Interface

### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with `bids`, `asks`, `timestamp`, and `end_timestamp`

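As a rough illustration of how these two containers might be shaped, the sketch below defines them as dataclasses. Only `OrderbookLevel(price, size)` and the four `OrderbookUpdate` field names come from the list above. The field types on `OrderbookUpdate` (lists of levels, integer millisecond timestamps) and the defaults are assumptions, not the module's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class OrderbookLevel:
    """A single price level in the orderbook."""
    price: float
    size: float


@dataclass
class OrderbookUpdate:
    """One windowed orderbook snapshot (field types are assumed)."""
    bids: list[OrderbookLevel] = field(default_factory=list)
    asks: list[OrderbookLevel] = field(default_factory=list)
    timestamp: int = 0       # assumed: window start, in milliseconds
    end_timestamp: int = 0   # assumed: window end, in milliseconds


update = OrderbookUpdate(
    bids=[OrderbookLevel(price=100.0, size=1.5)],
    asks=[OrderbookLevel(price=100.5, size=2.0)],
    timestamp=1_000,
    end_timestamp=2_000,
)
# Best ask minus best bid for this snapshot.
spread = update.asks[0].price - update.bids[0].price
```

Using `default_factory` for the list fields avoids sharing one mutable list between instances.
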
### Functions
- `DBInterpreter(db_path: Path)`: Class constructor that opens a read-only SQLite connection and applies optimized PRAGMA settings

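One way such a connection could be opened with the standard `sqlite3` module is sketched below. The helper name `open_readonly` and the specific PRAGMA values are assumptions chosen for illustration; the module's real settings are not shown in this document.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open an SQLite file read-only and immutable via a URI filename."""
    # mode=ro forbids writes; immutable=1 tells SQLite the file will not
    # change, so it can skip locking and change detection.
    uri = f"{db_path.resolve().as_uri()}?mode=ro&immutable=1"
    conn = sqlite3.connect(uri, uri=True)
    # Assumed values: a 1 GiB memory map and a 256 MiB page cache.
    conn.execute("PRAGMA mmap_size = 1073741824")
    conn.execute("PRAGMA cache_size = -262144")  # negative values are in KiB
    return conn
```

Note that `immutable=1` is only safe when no other process writes to the file while it is open, which fits archived daily databases but not a live one.
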
### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface. Yields each orderbook update together with the trades that fall inside its temporal window

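The pairing of updates with trades can be pictured with the simplified sketch below, which works on plain in-memory rows rather than a database. The function name `window_trades`, the `(timestamp_ms, payload)` row shape, the half-open `[start, end)` boundary, the open-ended final window, and the dropping of trades that precede the first book row are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional

BookRow = tuple[int, str]    # (timestamp_ms, payload): hypothetical row shape
TradeRow = tuple[int, str]   # (timestamp_ms, payload): hypothetical row shape
Window = tuple[BookRow, Optional[int], list[TradeRow]]


def window_trades(book_rows: Iterable[BookRow],
                  trades: Iterable[TradeRow]) -> Iterator[Window]:
    """Pair each book row with the trades in [row_ts, next_row_ts).

    Both inputs must already be sorted by timestamp.
    """
    book_iter = iter(book_rows)
    trade_iter = iter(trades)
    current = next(book_iter, None)
    pending = next(trade_iter, None)
    while current is not None:
        upcoming = next(book_iter, None)  # one-row lookahead
        end_ts = upcoming[0] if upcoming is not None else None
        # Skip trades that precede this window's start.
        while pending is not None and pending[0] < current[0]:
            pending = next(trade_iter, None)
        bucket = []
        # Collect trades up to (not including) the window end; the last
        # window has no end, so it takes every remaining trade.
        while pending is not None and (end_ts is None or pending[0] < end_ts):
            bucket.append(pending)
            pending = next(trade_iter, None)
        yield current, end_ts, bucket
        current = upcoming


book = [(100, "a"), (200, "b"), (300, "c")]
trade_rows = [(50, "early"), (100, "t1"), (150, "t2"), (200, "t3"), (350, "t4")]
windows = list(window_trades(book, trade_rows))
```

Because both iterators advance only forward, each row is visited once and memory use stays proportional to the largest single window.
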
## Usage Examples

```python
from pathlib import Path
from db_interpreter import DBInterpreter

# Initialize interpreter
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)

# Stream orderbook and trade data
for ob_update, trades in interpreter.stream():
    # Process orderbook update
    print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
    print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")

    # Process trades in this window
    for trade in trades:
        trade_id, price, size, side, timestamp_ms = trade[1:6]
        print(f"Trade: {side} {size} @ {price}")
```

## Dependencies

### Internal
- None (standalone module)

### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging

All of the modules listed above are part of the Python standard library, so they need no separate installation.

## Performance Characteristics

- **Batch sizes**: `BOOK_BATCH=2048` and `TRADE_BATCH=4096` cap the rows fetched per query, which keeps memory use bounded
- **SQLite optimizations**: Read-only, immutable mode with large mmap and cache sizes
- **Memory efficient**: The streaming iterator pattern avoids loading the entire dataset into memory
- **Temporal windowing**: One-row lookahead reads the next orderbook row so that each window's time boundary is computed precisely

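The batching and streaming points above can be sketched with `cursor.fetchmany`, which returns at most the requested number of rows per call. The helper name `iter_rows` and the demo table columns are hypothetical; only the `BOOK_BATCH` value is taken from this document.

```python
import sqlite3
from typing import Iterator

BOOK_BATCH = 2048  # default batch size quoted in this document


def iter_rows(conn: sqlite3.Connection, query: str,
              batch_size: int) -> Iterator[tuple]:
    """Yield rows one at a time while fetching them in fixed-size batches."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty list signals the end of the result set
            break
        yield from batch


# Demo against an in-memory table named "book" (columns are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, ts INTEGER)")
conn.executemany("INSERT INTO book (ts) VALUES (?)", [(i,) for i in range(5000)])
rows = list(iter_rows(conn, "SELECT id, ts FROM book ORDER BY id", BOOK_BATCH))
```

The demo collects everything into a list only to make the result easy to inspect; a real consumer would iterate over the generator directly so that no more than one batch is held at a time.
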
## Testing

Run module tests:
```bash
uv run pytest test_db_interpreter.py -v
```

Test coverage includes:
- Batch reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data

## Known Issues

- Requires a specific database schema (`book` and `trades` tables)
- Python-literal string parsing assumes well-formed input
- Large databases may require memory monitoring during streaming

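Callers who want to guard against the first two issues could add checks along the lines of the sketch below. The helper names `missing_tables` and `parse_levels` are hypothetical, and the assumed serialized format (a Python-literal list of `(price, size)` pairs) is a guess at what the stored strings look like, not a documented fact.

```python
import ast
import sqlite3

REQUIRED_TABLES = {"book", "trades"}  # table names taken from the list above


def missing_tables(conn: sqlite3.Connection) -> set[str]:
    """Return the required tables that are absent from the database."""
    found = {
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    return REQUIRED_TABLES - found


def parse_levels(raw: str) -> list[tuple[float, float]]:
    """Parse an assumed list of (price, size) pairs, or raise ValueError."""
    try:
        # literal_eval only accepts Python literals, so it never runs code.
        value = ast.literal_eval(raw)
        return [(float(price), float(size)) for price, size in value]
    except (ValueError, TypeError, SyntaxError) as exc:
        raise ValueError(f"malformed level string: {raw!r}") from exc


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER)")
absent = missing_tables(conn)
levels = parse_levels("[(100.5, 2), (101, 0.25)]")
```

Converting every parsing failure into a single `ValueError` gives the caller one exception type to catch and log when a row is malformed.
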
## Configuration

- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096)
- SQLite PRAGMA settings optimized for read-only sequential access