# Module: db_interpreter
## Purpose
The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing.
## Public Interface
### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp
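A minimal sketch of how these containers might be defined; only the field names above come from the module, while the field order, types, and list defaults are assumptions:
```python
from dataclasses import dataclass, field

@dataclass
class OrderbookLevel:
    price: float
    size: float

@dataclass
class OrderbookUpdate:
    # Window bounds; assumed half-open [timestamp, end_timestamp)
    timestamp: int
    end_timestamp: int
    bids: list[OrderbookLevel] = field(default_factory=list)
    asks: list[OrderbookLevel] = field(default_factory=list)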
### Constructor
- `DBInterpreter(db_path: Path)`: Initializes a read-only SQLite connection with optimized PRAGMA settings
### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface; yields `(OrderbookUpdate, trades)` pairs, where `trades` holds the trade rows falling within that update's time window
## Usage Examples
```python
from pathlib import Path
from db_interpreter import DBInterpreter

# Initialize the interpreter against a day's database file
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)

# Stream orderbook and trade data window by window
for ob_update, trades in interpreter.stream():
    # Process the orderbook update
    print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
    print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")

    # Process the trades that fall inside this window
    for trade in trades:
        trade_id, price, size, side, timestamp_ms = trade[1:6]
        print(f"Trade: {side} {size} @ {price}")
```
## Dependencies
### Internal
- None (standalone module)
### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging
## Performance Characteristics
- **Batch sizes**: `BOOK_BATCH=2048` and `TRADE_BATCH=4096` rows per query, balancing query overhead against memory footprint
- **SQLite optimizations**: Read-only, immutable connection mode with large mmap and cache sizes
- **Memory efficient**: The streaming iterator pattern avoids loading the entire dataset into memory
- **Temporal windowing**: One-row lookahead for precise time boundary calculation (see the sketch below)
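The lookahead itself is simple: hold one book row back, and use the next row's timestamp to close the held row's window. A simplified sketch under an assumed `(timestamp, bids, asks)` row layout, with the module's actual batching and SQL omitted:
```python
from typing import Iterator

BookRow = tuple[int, list, list]  # (timestamp, bids, asks) -- assumed layout

def windowed(rows: Iterator[BookRow]) -> Iterator[tuple[int, int, list, list]]:
    """Yield (timestamp, end_timestamp, bids, asks) windows."""
    prev: BookRow | None = None
    for row in rows:
        if prev is not None:
            ts, bids, asks = prev
            # The next row's timestamp closes the current window
            yield ts, row[0], bids, asks
        prev = row
    if prev is not None:
        ts, bids, asks = prev
        # The final row has no successor; close its window at its own
        # timestamp (the real module may use a sentinel instead)
        yield ts, ts, bids, asks
```
Trades with timestamps in `[timestamp, end_timestamp)` can then be assigned unambiguously to exactly one window.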
## Testing
Run module tests:
```bash
uv run pytest test_db_interpreter.py -v
```
Test coverage includes:
- Batch reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data
## Known Issues
- Requires a specific database schema (`book` and `trades` tables)
- Python-literal string parsing assumes well-formed input (a defensive variant is sketched below)
- Large databases may warrant memory monitoring during streaming
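For the Python-literal parsing issue, a defensive variant could wrap `ast.literal_eval` in error handling so one bad row does not abort the whole stream; the `parse_levels` helper and the stored string format here are assumptions, not the module's actual code:
```python
import ast
import logging

logger = logging.getLogger(__name__)

def parse_levels(raw: str) -> list[tuple[float, float]]:
    """Parse a Python-literal string of (price, size) pairs.

    Returns an empty list (and logs a warning) on malformed input
    instead of raising.
    """
    try:
        levels = ast.literal_eval(raw)
        return [(float(p), float(s)) for p, s in levels]
    except (ValueError, SyntaxError, TypeError) as exc:
        logger.warning("Skipping malformed level string %r: %s", raw[:80], exc)
        return []
```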
## Configuration
- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096)
- SQLite PRAGMA settings tuned for read-only sequential access (sketched below)
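A sketch of the kind of read-only, immutable connection setup described above; the specific PRAGMA values here are illustrative assumptions, not the module's actual settings:
```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: Path) -> sqlite3.Connection:
    # immutable=1 tells SQLite the file cannot change, which skips
    # locking and permits more aggressive caching
    conn = sqlite3.connect(f"file:{db_path}?mode=ro&immutable=1", uri=True)
    conn.execute("PRAGMA mmap_size = 268435456")  # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA cache_size = -65536")    # 64 MiB page cache (negative = KiB)
    conn.execute("PRAGMA query_only = ON")
    return conn
```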