orderflow_backtest/docs/modules/db_interpreter.md

Module: db_interpreter

Purpose

The db_interpreter module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing.

Public Interface

Classes

  • OrderbookLevel(price: float, size: float): Dataclass representing a single price level in the orderbook
  • OrderbookUpdate: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp
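The two containers above might be defined roughly as follows; this is a sketch, and any field defaults or ordering beyond the attributes listed above are assumptions:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class OrderbookLevel:
    """A single price level in the orderbook."""
    price: float
    size: float

@dataclass
class OrderbookUpdate:
    """Orderbook data for one temporal window.

    Timestamps are assumed here to be epoch milliseconds.
    """
    timestamp: int
    end_timestamp: int
    bids: list[OrderbookLevel] = field(default_factory=list)
    asks: list[OrderbookLevel] = field(default_factory=list)
```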

Constructor

  • DBInterpreter(db_path: Path): Constructor that initializes read-only SQLite connection with optimized PRAGMA settings

Methods

  • stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]: Primary streaming interface that yields orderbook updates with associated trades in temporal windows

Usage Examples

from pathlib import Path
from db_interpreter import DBInterpreter

# Initialize interpreter
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)

# Stream orderbook and trade data
for ob_update, trades in interpreter.stream():
    # Process orderbook update
    print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
    print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")
    
    # Process trades in this window
    for trade in trades:
        trade_id, price, size, side, timestamp_ms = trade[1:6]
        print(f"Trade: {side} {size} @ {price}")

Dependencies

Internal

  • None (standalone module)

External

  • sqlite3: Database connectivity
  • pathlib: Path handling
  • dataclasses: Data structure definitions
  • typing: Type annotations
  • logging: Debug and error logging

Performance Characteristics

  • Batch sizes: BOOK_BATCH=2048 book rows and TRADE_BATCH=4096 trade rows per fetch, balancing query overhead against memory footprint
  • SQLite optimizations: Read-only, immutable mode, large mmap and cache sizes
  • Memory efficient: Streaming iterator pattern prevents loading entire dataset
  • Temporal windowing: One-row lookahead for precise time boundary calculation
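The one-row lookahead can be illustrated generically: each window is closed by peeking at the next row's timestamp. The row shape and function name below are illustrative, not the module's actual internals:

```python
from collections.abc import Iterable, Iterator

def windows_with_lookahead(
    rows: Iterable[tuple[int, str]],
) -> Iterator[tuple[int, int, str]]:
    """Yield (start_ts, end_ts, payload) triples, where end_ts is the
    timestamp of the *next* row -- a one-row lookahead closes each window."""
    it = iter(rows)
    try:
        prev = next(it)
    except StopIteration:
        return  # no rows at all
    for row in it:
        yield (prev[0], row[0], prev[1])
        prev = row
    # The last row has no successor, so its window closes at its own timestamp.
    yield (prev[0], prev[0], prev[1])
```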

Testing

Run module tests:

uv run pytest test_db_interpreter.py -v

Test coverage includes:

  • Batch reading correctness
  • Temporal window boundary handling
  • Trade-to-window assignment accuracy
  • End-of-stream behavior
  • Error handling for malformed data
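A boundary-handling test in this style might check that a trade timestamped exactly at a window's end lands in the next window, assuming half-open [start, end) windows. The helper and test names here are hypothetical:

```python
def assign_trades(windows, trades):
    """Assign each (ts, ...) trade to its half-open window [start, end)."""
    out = {w: [] for w in windows}
    for trade in trades:
        ts = trade[0]
        for start, end in windows:
            if start <= ts < end:
                out[(start, end)].append(trade)
                break
    return out

def test_boundary_trade_goes_to_next_window():
    windows = [(0, 10), (10, 20)]
    trades = [(9, "in-first"), (10, "boundary")]
    assigned = assign_trades(windows, trades)
    assert assigned[(0, 10)] == [(9, "in-first")]
    assert assigned[(10, 20)] == [(10, "boundary")]
```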

Known Issues

  • Requires specific database schema (book and trades tables)
  • Python-literal string parsing assumes well-formed input
  • Large databases may require memory monitoring during streaming
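If the stored book sides are Python-literal strings (e.g. "[(100.0, 2.5), ...]"), a defensive parse with ast.literal_eval could reject malformed input instead of assuming it is well formed. The column format here is an assumption:

```python
import ast

def parse_levels(raw: str) -> list[tuple[float, float]]:
    """Parse a Python-literal list of (price, size) pairs, rejecting
    anything that is not a well-formed list of 2-element pairs."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"malformed level string: {raw!r}") from exc
    if not isinstance(value, list):
        raise ValueError(f"expected a list, got {type(value).__name__}")
    levels = []
    for item in value:
        if not (isinstance(item, (tuple, list)) and len(item) == 2):
            raise ValueError(f"expected a (price, size) pair, got {item!r}")
        price, size = item
        levels.append((float(price), float(size)))
    return levels
```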

Configuration

  • BOOK_BATCH: Number of orderbook rows to fetch per query (default: 2048)
  • TRADE_BATCH: Number of trade rows to fetch per query (default: 4096)
  • SQLite PRAGMA settings optimized for read-only sequential access
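
Opening the database read-only in immutable mode with such PRAGMAs might look like the sketch below; the specific mmap and cache sizes are assumptions, not the module's actual values:

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database read-only with settings suited to
    sequential scans of a file that will not change underneath us."""
    # immutable=1 tells SQLite the file cannot change, so it skips locking.
    uri = f"file:{db_path}?mode=ro&immutable=1"
    conn = sqlite3.connect(uri, uri=True)
    conn.execute("PRAGMA mmap_size = 268435456")  # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA cache_size = -65536")    # negative value = KiB (64 MiB)
    return conn
```

Because the connection is read-only, any write attempt raises sqlite3.OperationalError rather than silently modifying the data file.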