WIP UI rework with qt6

2025-09-10 15:39:16 +08:00
parent 36385af6f3
commit ebf232317c
63 changed files with 4005 additions and 5221 deletions


@@ -1,550 +1,23 @@
# API Documentation
# API Documentation (Current Implementation)
## Overview
This document provides comprehensive API documentation for the Orderflow Backtest System, including public interfaces, data models, and usage examples.
This document describes the public interfaces of the current system: SQLite streaming, OHLC/depth aggregation, JSON-based IPC, and the Dash visualizer. Metrics (OBI/CVD), repository/storage layers, and strategy APIs are not part of the current implementation.
## Core Data Models
## Input Database Schema (Required)
### OrderbookLevel
Represents a single price level in the orderbook.
```python
@dataclass(slots=True)
class OrderbookLevel:
    price: float            # Price level
    size: float             # Total size at this price
    liquidation_count: int  # Number of liquidations
    order_count: int        # Number of resting orders
```
**Example:**
```python
level = OrderbookLevel(
    price=50000.0,
    size=10.5,
    liquidation_count=0,
    order_count=3
)
```
### Trade
Represents a single trade execution.
```python
@dataclass(slots=True)
class Trade:
    id: int          # Unique trade identifier
    trade_id: float  # Exchange trade ID
    price: float     # Execution price
    size: float      # Trade size
    side: str        # "buy" or "sell"
    timestamp: int   # Unix timestamp
```
**Example:**
```python
trade = Trade(
    id=1,
    trade_id=123456.0,
    price=50000.0,
    size=0.5,
    side="buy",
    timestamp=1640995200
)
```
### BookSnapshot
Complete orderbook state at a specific timestamp.
```python
@dataclass
class BookSnapshot:
    id: int                            # Snapshot identifier
    timestamp: int                     # Unix timestamp
    bids: Dict[float, OrderbookLevel]  # Bid side levels
    asks: Dict[float, OrderbookLevel]  # Ask side levels
    trades: List[Trade]                # Associated trades
```
**Example:**
```python
snapshot = BookSnapshot(
    id=1,
    timestamp=1640995200,
    bids={
        50000.0: OrderbookLevel(50000.0, 10.0, 0, 1),
        49999.0: OrderbookLevel(49999.0, 5.0, 0, 1)
    },
    asks={
        50001.0: OrderbookLevel(50001.0, 3.0, 0, 1),
        50002.0: OrderbookLevel(50002.0, 2.0, 0, 1)
    },
    trades=[]
)
```
### Metric
Calculated financial metrics for a snapshot.
```python
@dataclass(slots=True)
class Metric:
    snapshot_id: int        # Reference to source snapshot
    timestamp: int          # Unix timestamp
    obi: float              # Order Book Imbalance [-1, 1]
    cvd: float              # Cumulative Volume Delta
    best_bid: float | None  # Best bid price
    best_ask: float | None  # Best ask price
```
**Example:**
```python
metric = Metric(
    snapshot_id=1,
    timestamp=1640995200,
    obi=0.333,
    cvd=150.5,
    best_bid=50000.0,
    best_ask=50001.0
)
```
## MetricCalculator API
Static class providing financial metric calculations.
### calculate_obi()
```python
@staticmethod
def calculate_obi(snapshot: BookSnapshot) -> float:
    """
    Calculate Order Book Imbalance.

    Formula: OBI = (Vb - Va) / (Vb + Va)

    Args:
        snapshot: BookSnapshot with bids and asks

    Returns:
        float: OBI value between -1 and 1

    Example:
        >>> obi = MetricCalculator.calculate_obi(snapshot)
        >>> print(f"OBI: {obi:.3f}")
        OBI: 0.333
    """
```
### calculate_volume_delta()
```python
@staticmethod
def calculate_volume_delta(trades: List[Trade]) -> float:
    """
    Calculate Volume Delta for trades.

    Formula: VD = Buy Volume - Sell Volume

    Args:
        trades: List of Trade objects

    Returns:
        float: Net volume delta

    Example:
        >>> vd = MetricCalculator.calculate_volume_delta(trades)
        >>> print(f"Volume Delta: {vd}")
        Volume Delta: 7.5
    """
```
### calculate_cvd()
```python
@staticmethod
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
    """
    Calculate Cumulative Volume Delta.

    Formula: CVD_t = CVD_{t-1} + VD_t

    Args:
        previous_cvd: Previous CVD value
        volume_delta: Current volume delta

    Returns:
        float: New CVD value

    Example:
        >>> cvd = MetricCalculator.calculate_cvd(100.0, 7.5)
        >>> print(f"CVD: {cvd}")
        CVD: 107.5
    """
```
### get_best_bid_ask()
```python
@staticmethod
def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]:
    """
    Extract best bid and ask prices.

    Args:
        snapshot: BookSnapshot with bids and asks

    Returns:
        tuple: (best_bid, best_ask) or (None, None)

    Example:
        >>> best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)
        >>> print(f"Spread: {best_ask - best_bid}")
        Spread: 1.0
    """
```
## Repository APIs
### SQLiteOrderflowRepository
Repository for orderbook, trades data and metrics.
#### connect()
```python
def connect(self) -> sqlite3.Connection:
    """
    Create optimized SQLite connection.

    Returns:
        sqlite3.Connection: Configured database connection

    Example:
        >>> repo = SQLiteOrderflowRepository(db_path)
        >>> with repo.connect() as conn:
        ...     # Use connection
    """
```
#### load_trades_by_timestamp()
```python
def load_trades_by_timestamp(self, conn: sqlite3.Connection) -> Dict[int, List[Trade]]:
    """
    Load all trades grouped by timestamp.

    Args:
        conn: Active database connection

    Returns:
        Dict[int, List[Trade]]: Trades grouped by timestamp

    Example:
        >>> trades_by_ts = repo.load_trades_by_timestamp(conn)
        >>> trades_at_1000 = trades_by_ts.get(1000, [])
    """
```
#### iterate_book_rows()
```python
def iterate_book_rows(self, conn: sqlite3.Connection) -> Iterator[Tuple[int, str, str, int]]:
    """
    Memory-efficient iteration over orderbook rows.

    Args:
        conn: Active database connection

    Yields:
        Tuple[int, str, str, int]: (id, bids_text, asks_text, timestamp)

    Example:
        >>> for row_id, bids, asks, ts in repo.iterate_book_rows(conn):
        ...     # Process row
    """
```
#### create_metrics_table()
```python
def create_metrics_table(self, conn: sqlite3.Connection) -> None:
    """
    Create metrics table with indexes.

    Args:
        conn: Active database connection

    Raises:
        sqlite3.Error: If table creation fails

    Example:
        >>> repo.create_metrics_table(conn)
        >>> # Metrics table now available
    """
```
#### insert_metrics_batch()
```python
def insert_metrics_batch(self, conn: sqlite3.Connection, metrics: List[Metric]) -> None:
    """
    Insert metrics in batch for performance.

    Args:
        conn: Active database connection
        metrics: List of Metric objects to insert

    Example:
        >>> metrics = [Metric(...), Metric(...)]
        >>> repo.insert_metrics_batch(conn, metrics)
        >>> conn.commit()
    """
```
#### load_metrics_by_timerange()
```python
def load_metrics_by_timerange(
    self,
    conn: sqlite3.Connection,
    start_timestamp: int,
    end_timestamp: int
) -> List[Metric]:
    """
    Load metrics within time range.

    Args:
        conn: Active database connection
        start_timestamp: Start time (inclusive)
        end_timestamp: End time (inclusive)

    Returns:
        List[Metric]: Metrics ordered by timestamp

    Example:
        >>> metrics = repo.load_metrics_by_timerange(conn, 1000, 2000)
        >>> print(f"Loaded {len(metrics)} metrics")
    """
```
## Storage API
### Storage
High-level data processing orchestrator.
#### __init__()
```python
def __init__(self, instrument: str) -> None:
    """
    Initialize storage for specific instrument.

    Args:
        instrument: Trading pair identifier (e.g., "BTC-USDT")

    Example:
        >>> storage = Storage("BTC-USDT")
    """
```
#### build_booktick_from_db()
```python
def build_booktick_from_db(self, db_path: Path, db_date: datetime) -> None:
    """
    Process database and calculate metrics.

    This is the main processing pipeline that:
    1. Loads orderbook and trades data
    2. Calculates OBI and CVD metrics per snapshot
    3. Stores metrics in database
    4. Populates book with snapshots

    Args:
        db_path: Path to SQLite database file
        db_date: Date for this database (informational)

    Example:
        >>> storage.build_booktick_from_db(Path("data.db"), datetime.now())
        >>> print(f"Processed {len(storage.book.snapshots)} snapshots")
    """
```
## Strategy API
### DefaultStrategy
Trading strategy with metrics analysis capabilities.
#### __init__()
```python
def __init__(self, instrument: str) -> None:
    """
    Initialize strategy for instrument.

    Args:
        instrument: Trading pair identifier

    Example:
        >>> strategy = DefaultStrategy("BTC-USDT")
    """
```
#### set_db_path()
```python
def set_db_path(self, db_path: Path) -> None:
    """
    Configure database path for metrics access.

    Args:
        db_path: Path to database with metrics

    Example:
        >>> strategy.set_db_path(Path("data.db"))
    """
```
#### load_stored_metrics()
```python
def load_stored_metrics(self, start_timestamp: int, end_timestamp: int) -> List[Metric]:
    """
    Load stored metrics for analysis.

    Args:
        start_timestamp: Start of time range
        end_timestamp: End of time range

    Returns:
        List[Metric]: Metrics for specified range

    Example:
        >>> metrics = strategy.load_stored_metrics(1000, 2000)
        >>> latest_obi = metrics[-1].obi
    """
```
#### get_metrics_summary()
```python
def get_metrics_summary(self, metrics: List[Metric]) -> dict:
    """
    Generate statistical summary of metrics.

    Args:
        metrics: List of metrics to analyze

    Returns:
        dict: Statistical summary with keys:
            - obi_min, obi_max, obi_avg
            - cvd_start, cvd_end, cvd_change
            - total_snapshots

    Example:
        >>> summary = strategy.get_metrics_summary(metrics)
        >>> print(f"OBI range: {summary['obi_min']:.3f} to {summary['obi_max']:.3f}")
    """
```
## Visualizer API
### Visualizer
Multi-chart visualization system.
#### __init__()
```python
def __init__(self, window_seconds: int = 60, max_bars: int = 200) -> None:
    """
    Initialize visualizer with chart parameters.

    Args:
        window_seconds: OHLC aggregation window
        max_bars: Maximum bars to display

    Example:
        >>> visualizer = Visualizer(window_seconds=300, max_bars=1000)
    """
```
#### set_db_path()
```python
def set_db_path(self, db_path: Path) -> None:
    """
    Configure database path for metrics loading.

    Args:
        db_path: Path to database with metrics

    Example:
        >>> visualizer.set_db_path(Path("data.db"))
    """
```
#### update_from_book()
```python
def update_from_book(self, book: Book) -> None:
    """
    Update charts with book data and stored metrics.

    Creates 4-subplot layout:
    1. OHLC candlesticks
    2. Volume bars
    3. OBI line chart
    4. CVD line chart

    Args:
        book: Book with snapshots for OHLC calculation

    Example:
        >>> visualizer.update_from_book(storage.book)
        >>> # Charts updated with latest data
    """
```
#### show()
```python
def show(self) -> None:
    """
    Display interactive chart window.

    Example:
        >>> visualizer.show()
        >>> # Interactive Qt5 window opens
    """
```
## Database Schema
### Input Tables (Required)
These tables must exist in the SQLite database files:
#### book table
### book table
```sql
CREATE TABLE book (
id INTEGER PRIMARY KEY,
instrument TEXT,
bids TEXT NOT NULL, -- JSON array: [[price, size, liq_count, order_count], ...]
asks TEXT NOT NULL, -- JSON array: [[price, size, liq_count, order_count], ...]
bids TEXT NOT NULL, -- Python-literal: [[price, size, ...], ...]
asks TEXT NOT NULL, -- Python-literal: [[price, size, ...], ...]
timestamp TEXT NOT NULL
);
```
#### trades table
### trades table
```sql
CREATE TABLE trades (
id INTEGER PRIMARY KEY,
@@ -557,129 +30,122 @@ CREATE TABLE trades (
);
```
### Output Table (Auto-created)
## Data Access: db_interpreter.py
This table is automatically created by the system:
### Classes
- `OrderbookLevel` (dataclass): represents a price level.
- `OrderbookUpdate`: windowed book update with `bids`, `asks`, `timestamp`, `end_timestamp`.
#### metrics table
```sql
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER NOT NULL,
    timestamp TEXT NOT NULL,
    obi REAL NOT NULL,  -- Order Book Imbalance [-1, 1]
    cvd REAL NOT NULL,  -- Cumulative Volume Delta
    best_bid REAL,      -- Best bid price
    best_ask REAL,      -- Best ask price
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```
### DBInterpreter
```python
class DBInterpreter:
    def __init__(self, db_path: Path): ...

    def stream(self) -> Iterator[tuple[OrderbookUpdate, list[tuple]]]:
        """
        Stream orderbook rows with one-row lookahead and trades in timestamp order.

        Yields pairs of (OrderbookUpdate, trades_in_window), where each trade tuple is
        (id, trade_id, price, size, side, timestamp_ms) and timestamp_ms ∈ [timestamp, end_timestamp).
        """
```
- Read-only SQLite connection with PRAGMA tuning (immutable, query_only, mmap, cache).
- Batch sizes: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
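The connection setup can be sketched as follows (a minimal example; the PRAGMA values are illustrative, and the actual tuning lives in `db_interpreter.py`):

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path: Path) -> sqlite3.Connection:
    # immutable=1 opens the file read-only without taking locks,
    # which is safe because backtest databases never change underneath us.
    conn = sqlite3.connect(f"file:{db_path}?immutable=1", uri=True)
    conn.execute("PRAGMA query_only = ON")       # belt-and-braces: reject writes
    conn.execute("PRAGMA mmap_size = 268435456") # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA cache_size = -65536")   # 64 MiB page cache
    return conn
```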
## Processing: ohlc_processor.py
### OHLCProcessor
```python
class OHLCProcessor:
def __init__(self, window_seconds: int = 60, depth_levels_per_side: int = 50): ...
def process_trades(self, trades: list[tuple]) -> None:
"""Aggregate trades into OHLC bars per window; throttled upserts for UI responsiveness."""
def update_orderbook(self, ob_update: OrderbookUpdate) -> None:
"""Maintain in-memory price→size maps, apply partial updates, and emit top-N depth snapshots periodically."""
def finalize(self) -> None:
"""Emit the last OHLC bar if present."""
```
- Internal helpers for parsing levels from JSON or Python-literal strings and for applying deletions (size==0).
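A hedged sketch of what such helpers could look like (`parse_levels` and `apply_update` are hypothetical names; the real helpers are internal to `ohlc_processor.py`):

```python
import ast
import json

def parse_levels(text: str) -> list:
    """Parse a bids/asks column that may be JSON or a Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to Python-literal syntax, e.g. "[(50000.0, 1.5)]"
        return ast.literal_eval(text)

def apply_update(side: dict[float, float], levels: list) -> None:
    """Apply partial updates in place; size == 0 deletes the level."""
    for price, size, *_ in levels:
        if size == 0:
            side.pop(price, None)
        else:
            side[price] = size
```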
## Inter-Process Communication: viz_io.py
### Files
- `ohlc_data.json`: rolling array of OHLC bars (max 1000).
- `depth_data.json`: latest depth snapshot (bids/asks), top-N per side.
- `metrics_data.json`: rolling array of OBI OHLC bars (max 1000).
### Functions
```python
def add_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ...
def upsert_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ...
def clear_data() -> None: ...
def add_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ...
def upsert_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ...
def clear_metrics() -> None: ...
```
- Atomic writes via temp file replace to prevent partial reads.
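The temp-file-then-replace pattern can be sketched like this (function name hypothetical; `viz_io.py` implements the same idea):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, payload: object) -> None:
    """Write JSON atomically: dump to a temp file in the same directory,
    then os.replace() it over the target. Readers see either the old
    complete file or the new complete file, never a partial write."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```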
## Visualization: app.py (Dash)
- Three visuals: an OHLC+Volume chart and a cumulative Depth chart in the Plotly dark theme, plus an OBI candlestick subplot beneath Volume.
- Polling interval: 500 ms. Tolerates JSON decode races by falling back to cached last values.
### Callback Contract
```python
@app.callback(
    [Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')],
    [Input('interval-update', 'n_intervals')]
)
```
- Reads `ohlc_data.json` (list of `[ts, open, high, low, close, volume]`).
- Reads `depth_data.json` (`{"bids": [[price, size], ...], "asks": [[price, size], ...]}`).
- Reads `metrics_data.json` (list of `[ts, obi_o, obi_h, obi_l, obi_c]`).
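On the reader side, tolerance to decode races can be sketched as follows (names hypothetical; the Dash callbacks cache their own last-good figures):

```python
import json

_last_good: dict[str, object] = {}  # per-path cache of the last valid payload

def read_json_tolerant(path: str, default: object) -> object:
    """Read a JSON IPC file; on a missing file or a decode race
    (file replaced mid-read), fall back to the last good value."""
    try:
        with open(path) as f:
            data = json.load(f)
        _last_good[path] = data
        return data
    except (FileNotFoundError, json.JSONDecodeError):
        return _last_good.get(path, default)
```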
## CLI Orchestration: main.py
### Typer Entry Point
```python
def main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None:
    """Stream DBs, process OHLC/depth, and launch the Dash visualizer in a separate process."""
```
- Discovers databases under `../data/OKX` matching the instrument and date range.
- Launches UI: `uv run python app.py`.
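Discovery might be sketched like this, assuming per-day files named like `BTC-USDT-25-06-09.db` (YY-MM-DD); the actual logic in `main.py` may differ:

```python
from datetime import date, datetime
from pathlib import Path

def discover_databases(data_dir: Path, instrument: str,
                       start: date, end: date) -> list[Path]:
    """Collect <instrument>-YY-MM-DD.db files whose date falls in [start, end)."""
    matches = []
    for db_path in sorted(data_dir.glob(f"{instrument}-*.db")):
        date_part = db_path.stem.removeprefix(f"{instrument}-")
        try:
            file_date = datetime.strptime(date_part, "%y-%m-%d").date()
        except ValueError:
            continue  # skip files whose suffix is not a date
        if start <= file_date < end:
            matches.append(db_path)
    return matches
```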
## Usage Examples
### Complete Processing Workflow
```python
from pathlib import Path
from datetime import datetime
from storage import Storage
from strategies import DefaultStrategy
from visualizer import Visualizer
# Initialize components
storage = Storage("BTC-USDT")
strategy = DefaultStrategy("BTC-USDT")
visualizer = Visualizer(window_seconds=60, max_bars=500)
# Process database
db_path = Path("data/BTC-USDT-25-06-09.db")
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)
# Build book and calculate metrics
storage.build_booktick_from_db(db_path, datetime.now())
# Analyze metrics
strategy.on_booktick(storage.book)
# Update visualization
visualizer.update_from_book(storage.book)
visualizer.show()
```
### Run processing + UI
```bash
uv run python main.py BTC-USDT 2025-07-01 2025-08-01 --window-seconds 60
# Open http://localhost:8050
```
### Metrics Analysis
```python
# Load and analyze stored metrics
strategy = DefaultStrategy("BTC-USDT")
strategy.set_db_path(Path("data.db"))

# Get metrics for specific time range
metrics = strategy.load_stored_metrics(1640995200, 1640998800)

# Analyze metrics
summary = strategy.get_metrics_summary(metrics)
print(f"OBI Range: {summary['obi_min']:.3f} to {summary['obi_max']:.3f}")
print(f"CVD Change: {summary['cvd_change']:.1f}")

# Find significant imbalances
significant_obi = [m for m in metrics if abs(m.obi) > 0.2]
print(f"Found {len(significant_obi)} snapshots with >20% imbalance")
```
### Custom Metric Calculations
```python
from models import MetricCalculator

# Calculate metrics for single snapshot
obi = MetricCalculator.calculate_obi(snapshot)
best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)

# Calculate CVD over time
cvd = 0.0
for trades in trades_by_timestamp.values():
    volume_delta = MetricCalculator.calculate_volume_delta(trades)
    cvd = MetricCalculator.calculate_cvd(cvd, volume_delta)
print(f"CVD: {cvd:.1f}")
```
### Process trades and update depth in a loop (conceptual)
```python
from db_interpreter import DBInterpreter
from ohlc_processor import OHLCProcessor

processor = OHLCProcessor(window_seconds=60)
for ob_update, trades in DBInterpreter(db_path).stream():
    processor.process_trades(trades)
    processor.update_orderbook(ob_update)
processor.finalize()
```
## Error Handling
- Reader/writer coordination via atomic JSON writes prevents partial reads.
- The visualizer caches the last valid data if JSON decoding fails mid-write and logs a warning.
- Visualizer start failures do not stop processing; the error is logged and processing continues.
### Common Error Scenarios
#### Database Connection Issues
```python
try:
    repo = SQLiteOrderflowRepository(db_path)
    with repo.connect() as conn:
        metrics = repo.load_metrics_by_timerange(conn, start, end)
except sqlite3.Error as e:
    logging.error(f"Database error: {e}")
    metrics = []  # Fallback to empty list
```
#### Missing Metrics Table
```python
repo = SQLiteOrderflowRepository(db_path)
with repo.connect() as conn:
    if not repo.table_exists(conn, "metrics"):
        repo.create_metrics_table(conn)
        logging.info("Created metrics table")
```
#### Empty Data Handling
```python
# All methods handle empty data gracefully
obi = MetricCalculator.calculate_obi(empty_snapshot) # Returns 0.0
vd = MetricCalculator.calculate_volume_delta([]) # Returns 0.0
summary = strategy.get_metrics_summary([]) # Returns {}
```
---
This API documentation provides complete coverage of the public interfaces for the Orderflow Backtest System. For implementation details and architecture information, see the additional documentation in the `docs/` directory.
## Notes
- Metrics computation includes a simplified OBI (Order Book Imbalance) calculated as `bid_total - ask_total`. Repository/storage layers and strategy APIs are intentionally kept minimal.


@@ -5,42 +5,52 @@ All notable changes to the Orderflow Backtest System are documented in this file
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.0.0] - 2024-Current
## [Unreleased]
### Added
- **OBI Metrics Calculation**: Order Book Imbalance calculation with formula `(Vb - Va) / (Vb + Va)`
- **CVD Metrics Calculation**: Cumulative Volume Delta with incremental calculation and reset functionality
- **Persistent Metrics Storage**: SQLite-based storage for calculated metrics to avoid recalculation
- **Memory Optimization**: >70% reduction in peak memory usage through streaming processing
- **Enhanced Visualization**: Multi-subplot charts with OHLC, Volume, OBI, and CVD displays
- **MetricCalculator Class**: Static methods for financial metrics computation
- **Batch Processing**: High-performance batch inserts (1000 records per operation)
- **Time-Range Queries**: Efficient metrics retrieval for specified time periods
- **Strategy Enhancement**: Metrics analysis capabilities in `DefaultStrategy`
- **Comprehensive Testing**: 27 tests across 6 test files with full integration coverage
- Comprehensive documentation structure with module-specific guides
- Architecture Decision Records (ADRs) for major technical decisions
- CONTRIBUTING.md with development guidelines and standards
- Enhanced module documentation in `docs/modules/` directory
- Dependency documentation with security and performance considerations
### Changed
- **Storage Architecture**: Modified `Storage.build_booktick_from_db()` to integrate metrics calculation
- **Visualization Separation**: Moved visualization from strategy to main application for better separation of concerns
- **Strategy Interface**: Simplified `DefaultStrategy` constructor (removed `enable_visualization` parameter)
- **Main Application Flow**: Enhanced orchestration with per-database visualization updates
- **Database Schema**: Auto-creation of metrics table with proper indexes and foreign key constraints
- **Memory Management**: Stream processing instead of keeping full snapshot history
- Documentation structure reorganized to follow documentation standards
- Improved code documentation requirements with examples
- Enhanced testing guidelines with coverage requirements
### Improved
- **Performance**: Batch database operations and optimized SQLite PRAGMAs
- **Scalability**: Support for months to years of high-frequency trading data
- **Code Quality**: All functions <50 lines, all files <250 lines
- **Documentation**: Comprehensive module and API documentation
- **Error Handling**: Graceful degradation and comprehensive logging
- **Type Safety**: Full type annotations throughout codebase
## [2.0.0] - 2024-12-Present
### Added
- **Simplified Pipeline Architecture**: Streamlined SQLite → OHLC/Depth → JSON → Dash pipeline
- **JSON-based IPC**: Atomic file-based communication between processor and visualizer
- **Real-time Visualization**: Dash web application with 500ms polling updates
- **OHLC Aggregation**: Configurable time window aggregation with throttled updates
- **Orderbook Depth**: Real-time depth snapshots with top-N level management
- **OBI Metrics**: Order Book Imbalance calculation with candlestick visualization
- **Atomic JSON Operations**: Race-condition-free data exchange via temp files
- **CLI Orchestration**: Typer-based command interface with process management
- **Performance Optimizations**: Batch reading with optimized SQLite PRAGMA settings
### Changed
- **Architecture Simplification**: Removed complex repository/storage layers
- **Data Flow**: Direct streaming from database to visualization via JSON
- **Error Handling**: Graceful degradation with cached data fallbacks
- **Process Management**: Separate visualization process launched automatically
- **Memory Efficiency**: Bounded datasets prevent unlimited memory growth
### Technical Details
- **New Tables**: `metrics` table with indexes on timestamp and snapshot_id
- **New Models**: `Metric` dataclass for calculated values
- **Processing Pipeline**: Snapshot → Calculate → Store → Discard workflow
- **Query Interface**: Time-range based metrics retrieval
- **Visualization Layout**: 4-subplot layout with shared time axis
- **Database Access**: Read-only SQLite with immutable mode and mmap optimization
- **Batch Sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal performance
- **JSON Formats**: Standardized schemas for OHLC, depth, and metrics data
- **Chart Architecture**: Multi-subplot layout with shared time axis
- **IPC Files**: `ohlc_data.json`, `depth_data.json`, `metrics_data.json`
### Removed
- Complex metrics storage and repository patterns
- Strategy framework components
- In-memory snapshot retention
- Multi-database orchestration complexity
## [1.0.0] - Previous Version


@@ -2,162 +2,52 @@
## Current State
The Orderflow Backtest System has successfully implemented a comprehensive OBI (Order Book Imbalance) and CVD (Cumulative Volume Delta) metrics calculation and visualization system. The project is in a production-ready state with full feature completion.
The project implements a modular, efficient orderflow processing pipeline:
- Stream orderflow from SQLite (`DBInterpreter.stream`).
- Process trades and orderbook updates through modular `OHLCProcessor` architecture.
- Exchange data with the UI via atomic JSON files (`viz_io`).
- Render OHLC+Volume, Depth, and Metrics charts with a Dash app (`app.py`).
## Recent Achievements
The system features a clean composition-based architecture with specialized modules for different concerns, providing OBI/CVD metrics alongside OHLC data.
### ✅ Completed Features (Latest Implementation)
- **Metrics Calculation Engine**: Complete OBI and CVD calculation with per-snapshot granularity
- **Persistent Storage**: Metrics stored in SQLite database to avoid recalculation
- **Memory Optimization**: >70% memory usage reduction through efficient data management
- **Visualization System**: Multi-subplot charts (OHLC, Volume, OBI, CVD) with shared time axis
- **Strategy Framework**: Enhanced trading strategy system with metrics analysis
- **Clean Architecture**: Proper separation of concerns between data, analysis, and visualization
## Recent Work
### 📊 System Metrics
- **Performance**: Batch processing of 1000 records per operation
- **Memory**: >70% reduction in peak memory usage
- **Test Coverage**: 27 comprehensive tests across 6 test files
- **Code Quality**: All functions <50 lines, all files <250 lines
- **Modular Refactoring**: Extracted `ohlc_processor.py` into focused modules:
- `level_parser.py`: Orderbook level parsing utilities (85 lines)
- `orderbook_manager.py`: In-memory orderbook state management (90 lines)
- `metrics_calculator.py`: OBI and CVD metrics calculation (112 lines)
- **Architecture Compliance**: Reduced main processor from 440 to 248 lines (250-line target achieved)
- Maintained full backward compatibility and functionality
- Implemented read-only, batched SQLite streaming with PRAGMA tuning.
- Added robust JSON IPC with atomic writes and tolerant UI reads.
- Built a responsive Dash visualization polling at 500ms.
- Unified CLI using Typer, with UV for process management.
## Architecture Decisions
## Conventions
### Key Design Patterns
1. **Repository Pattern**: Clean separation between data access and business logic
2. **Dataclass Models**: Lightweight, type-safe data structures with slots optimization
3. **Batch Processing**: High-performance database operations for large datasets
4. **Separation of Concerns**: Strategy, Storage, and Visualization as independent components
- Python 3.12+, UV for dependency and command execution.
- **Modular Architecture**: Composition over inheritance, single-responsibility modules
- **File Size Limits**: ≤250 lines per file, ≤50 lines per function (enforced)
- Type hints throughout; concise, focused functions and classes.
- Error handling with meaningful logs; avoid bare exceptions.
- Prefer explicit JSON structures for IPC; keep payloads small and bounded.
### Technology Stack
- **Language**: Python 3.12+ with type hints
- **Database**: SQLite with optimized PRAGMAs for performance
- **Package Management**: UV for fast dependency resolution
- **Testing**: Pytest with comprehensive unit and integration tests
- **Visualization**: Matplotlib with Qt5Agg backend
## Priorities
## Current Development Priorities
- Improve configurability: database path discovery, CLI flags for paths and UI options.
- Add tests for `DBInterpreter.stream` and `OHLCProcessor` (run with `uv run pytest`).
- Performance tuning for large DBs while keeping UI responsive.
- Documentation kept in sync with code; architecture reflects current design.
### ✅ Completed (Production Ready)
1. **Core Metrics System**: OBI and CVD calculation infrastructure
2. **Database Integration**: Persistent storage and retrieval system
3. **Visualization Framework**: Multi-chart display with proper time alignment
4. **Memory Optimization**: Efficient processing of large datasets
5. **Code Quality**: Comprehensive testing and documentation
## Roadmap (Future Work)
### 🔄 Maintenance Phase
- **Documentation**: Comprehensive docs completed
- **Testing**: Full test coverage maintained
- **Performance**: Monitoring and optimization as needed
- **Bug Fixes**: Address any issues discovered in production use
- Enhance OBI metrics with additional derived calculations (e.g., normalized OBI).
- Optional repository layer abstraction and a storage orchestrator.
- Extend visualization with additional subplots and interactivity.
- Strategy module for analytics and alerting on derived metrics.
## Known Patterns and Conventions
## Tooling
### Code Style
- **Functions**: Maximum 50 lines, single responsibility
- **Files**: Maximum 250 lines, clear module boundaries
- **Naming**: Descriptive names, no abbreviations except domain terms (OBI, CVD)
- **Error Handling**: Comprehensive try-catch with logging, graceful degradation
### Database Patterns
- **Parameterized Queries**: All SQL uses proper parameterization for security
- **Batch Operations**: Process records in batches of 1000 for performance
- **Indexing**: Strategic indexes on timestamp and foreign key columns
- **Transactions**: Proper transaction boundaries for data consistency
### Testing Patterns
- **Unit Tests**: Each module has comprehensive unit test coverage
- **Integration Tests**: End-to-end workflow testing
- **Mock Objects**: External dependencies mocked for isolated testing
- **Test Data**: Temporary databases with realistic test data
## Integration Points
### External Dependencies
- **SQLite**: Primary data storage (read and write operations)
- **Matplotlib**: Chart rendering and visualization
- **Qt5Agg**: GUI backend for interactive charts
- **Pytest**: Testing framework
### Internal Module Dependencies
```
main.py → storage.py    → repositories/ → models.py
        → strategies.py → models.py
        → visualizer.py → repositories/
```
## Performance Characteristics
### Optimizations Implemented
- **Memory Management**: Metrics storage instead of full snapshot retention
- **Database Performance**: Optimized SQLite PRAGMAs and batch processing
- **Query Efficiency**: Indexed queries with proper WHERE clauses
- **Cache Usage**: Price caching in orderbook parser for repeated calculations
### Scalability Notes
- **Dataset Size**: Tested with 600K+ snapshots and 300K+ trades per day
- **Time Range**: Supports months to years of historical data
- **Processing Speed**: ~1000 rows/second with full metrics calculation
- **Storage Overhead**: Metrics table adds <20% to original database size
## Security Considerations
### Implemented Safeguards
- **SQL Injection Prevention**: All queries use parameterized statements
- **Input Validation**: Database paths and table names validated
- **Error Information**: No sensitive data exposed in error messages
- **Access Control**: Database file permissions respected
## Future Considerations
### Potential Enhancements
- **Real-time Processing**: Streaming data support for live trading
- **Additional Metrics**: Volume Profile, Delta Flow, Liquidity metrics
- **Export Capabilities**: CSV/JSON export for external analysis
- **Interactive Charts**: Enhanced user interaction with visualization
- **Configuration System**: Configurable batch sizes and processing parameters
### Scalability Options
- **Database Upgrade**: PostgreSQL for larger datasets if needed
- **Parallel Processing**: Multi-threading for CPU-intensive calculations
- **Caching Layer**: Redis for frequently accessed metrics
- **API Interface**: REST API for external system integration
## Development Environment
### Requirements
- Python 3.12+
- UV package manager
- SQLite database files with required schema
- Qt5 for visualization (Linux/macOS)
### Setup Commands
```bash
# Install dependencies
uv sync
# Run full test suite
uv run pytest
# Process sample data
uv run python main.py BTC-USDT 2025-07-01 2025-08-01
```
## Documentation Status
### ✅ Complete Documentation
- README.md with comprehensive overview
- Module-level documentation for all components
- API documentation with examples
- Architecture decision records
- Code-level documentation with docstrings
### 📊 Quality Metrics
- **Code Coverage**: 27 tests across 6 test files
- **Documentation Coverage**: All public interfaces documented
- **Example Coverage**: Working examples for all major features
- **Error Documentation**: All error conditions documented
---
*Last Updated: Current as of OBI/CVD metrics system completion*
*Next Review: As needed for maintenance or feature additions*
- Package management and commands: UV (e.g., `uv sync`, `uv run ...`).
- Visualization server: Dash on `http://localhost:8050`.
- Linting/testing: Pytest (e.g., `uv run pytest`).


@@ -2,50 +2,25 @@
## Overview
This directory contains documentation for the current Orderflow Backtest System, which streams historical orderflow from SQLite, aggregates OHLC bars, maintains a lightweight depth snapshot, and renders charts via a Dash web application.
## Documentation Structure
### 📚 Main Documentation
- **[CONTEXT.md](./CONTEXT.md)**: Current project state, architecture decisions, and development patterns
- **[architecture.md](./architecture.md)**: System architecture, component relationships, and data flow
- **[API.md](./API.md)**: Public interfaces, classes, and function documentation
### 📦 Module Documentation
- **[modules/metrics.md](./modules/metrics.md)**: OBI and CVD calculation system
- **[modules/storage.md](./modules/storage.md)**: Data processing and persistence layer
- **[modules/visualization.md](./modules/visualization.md)**: Chart rendering and display system
- **[modules/repositories.md](./modules/repositories.md)**: Database access and operations
### 🏗️ Architecture Decisions
- **[decisions/ADR-001-metrics-storage.md](./decisions/ADR-001-metrics-storage.md)**: Persistent metrics storage decision
- **[decisions/ADR-002-visualization-separation.md](./decisions/ADR-002-visualization-separation.md)**: Separation of concerns for visualization
### 📋 Development Guides
- **[CONTRIBUTING.md](./CONTRIBUTING.md)**: Development workflow and contribution guidelines
- **[CHANGELOG.md](./CHANGELOG.md)**: Version history and changes
- `architecture.md`: System architecture, component relationships, and data flow (SQLite → Streaming → OHLC/Depth → JSON → Dash)
- `API.md`: Public interfaces for DB streaming, OHLC/depth processing, JSON IPC, Dash visualization, and CLI
- `CONTEXT.md`: Project state, conventions, and development priorities
- `decisions/`: Architecture decision records
## Quick Navigation
| Topic | Documentation |
|-------|---------------|
| **Getting Started** | [README.md](../README.md) |
| **System Architecture** | [architecture.md](./architecture.md) |
| **Metrics Calculation** | [modules/metrics.md](./modules/metrics.md) |
| **Database Schema** | [API.md](./API.md#database-schema) |
| **Development Setup** | [CONTRIBUTING.md](./CONTRIBUTING.md) |
| **API Reference** | [API.md](./API.md) |
| Getting Started | See the usage examples in `API.md` |
| System Architecture | `architecture.md` |
| Database Schema | `API.md#input-database-schema-required` |
| Development Setup | Project root `README` and `pyproject.toml` |
## Documentation Standards
This documentation follows the project's documentation standards defined in `.cursor/rules/documentation.mdc`. All documentation includes:
- Clear purpose and scope
- Code examples with working implementations
- API documentation with request/response formats
- Error handling and edge cases
- Dependencies and requirements
## Maintenance
Documentation is updated with every significant code change and reviewed during the development process. See [CONTRIBUTING.md](./CONTRIBUTING.md) for details on documentation maintenance procedures.
## Notes
- Metrics (OBI/CVD), repository/storage layers, and strategy components have been removed from the current codebase and are planned as future enhancements.
- Use UV for package management and running commands. Example: `uv run python main.py ...`.


@@ -2,303 +2,155 @@
## Overview
The current system is a streamlined, high-performance pipeline that streams orderflow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and visualizer uses atomic JSON files for simplicity and robustness.
## High-Level Architecture
```
┌─────────────────┐   ┌────────────────┐   ┌────────────────┐   ┌─────────────────┐
│  SQLite Files   │ → │ DB Interpreter │ → │   OHLC/Depth   │ → │ Dash Visualizer │
│  (book, trades) │   │  (stream rows) │   │   Processor    │   │    (app.py)     │
└─────────────────┘   └────────────────┘   └───────┬────────┘   └────────▲────────┘
                                                   │                     │
                                                   ▼                     │
                                         Atomic JSON (IPC) ──────────────┘
                                         ohlc_data.json, depth_data.json,
                                         metrics_data.json

                                         Dash serves the charts to the Browser UI.
```
## Components
### Data Layer
### Data Access (`db_interpreter.py`)
#### Models (`models.py`)
**Purpose**: Core data structures and calculation logic
- `OrderbookLevel`: dataclass representing one price level.
- `OrderbookUpdate`: container for a book row window with `bids`, `asks`, `timestamp`, and `end_timestamp`.
- `DBInterpreter`:
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]` streams the book table with lookahead and the trades table in timestamp order.
- Efficient read-only connection with PRAGMA tuning: immutable mode, query_only, temp_store=MEMORY, mmap_size, cache_size.
- Batching constants: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`.
- Each yielded `trades` element is a tuple `(id, trade_id, price, size, side, timestamp_ms)` that falls within `[book.timestamp, next_book.timestamp)`.
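As a rough sketch of how a caller might consume the stream (the trade-tuple layout is the one documented above; the tallying logic and `consume` helper are purely illustrative):

```python
def consume(stream):
    """Walk (OrderbookUpdate, trades) pairs and tally buy/sell volume.

    Each trade is (id, trade_id, price, size, side, timestamp_ms), already
    bucketed into the window of the accompanying book update.
    """
    buy_vol = sell_vol = 0.0
    for ob_update, trades in stream:
        for _id, _trade_id, _price, size, side, _ts_ms in trades:
            if side == "buy":
                buy_vol += size
            else:
                sell_vol += size
    return buy_vol, sell_vol
```

In the real pipeline, `main.py` plays this role, forwarding each pair to the OHLC/depth processor instead of tallying.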
```python
# Core data models
OrderbookLevel # Single price level (price, size, order_count, liquidation_count)
Trade # Individual trade execution (price, size, side, timestamp)
BookSnapshot # Complete orderbook state at timestamp
Book # Container for snapshot sequence
Metric # Calculated OBI/CVD values
# Calculation engine
MetricCalculator # Static methods for OBI/CVD computation
```

### Processing (Modular Architecture)
#### Main Coordinator (`ohlc_processor.py`)
- `OHLCProcessor(window_seconds=60, depth_levels_per_side=50)`: Orchestrates trade processing using composition
- `process_trades(trades)`: aggregates trades into OHLC bars and delegates CVD updates
- `update_orderbook(ob_update)`: coordinates orderbook updates and OBI metric calculation
- `finalize()`: finalizes both OHLC bars and metrics data
- `cvd_cumulative` (property): provides access to cumulative volume delta
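A minimal stand-in for the windowed aggregation that `process_trades` performs (the real class also coordinates depth and CVD; this sketch only shows the bucketing, using the `[ts, open, high, low, close, volume]` bar layout from the IPC schema):

```python
def aggregate_ohlc(trades, window_seconds=60):
    """Bucket (id, trade_id, price, size, side, timestamp_ms) trades into
    [ts, open, high, low, close, volume] bars keyed by window start (ms)."""
    bars = {}
    for _id, _tid, price, size, _side, ts_ms in trades:
        bucket = (ts_ms // 1000 // window_seconds) * window_seconds * 1000
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = [bucket, price, price, price, price, size]
        else:
            bar[2] = max(bar[2], price)   # high
            bar[3] = min(bar[3], price)   # low
            bar[4] = price                # close (last trade wins)
            bar[5] += size                # volume
    return [bars[k] for k in sorted(bars)]
```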
**Relationships**:
- `Book` contains multiple `BookSnapshot` instances
- `BookSnapshot` contains dictionaries of `OrderbookLevel` and lists of `Trade`
- `Metric` stores calculated values for each `BookSnapshot`
- `MetricCalculator` operates on snapshots to produce metrics
#### Orderbook Management (`orderbook_manager.py`)
- `OrderbookManager`: Handles in-memory orderbook state with partial updates
- Maintains separate bid/ask price→size dictionaries
- Supports deletions via zero-size updates
- Provides sorted top-N level extraction for visualization
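The update semantics can be sketched as follows (illustrative helper names; the real class keeps separate bid/ask price→size dictionaries as described):

```python
def apply_updates(book_side, updates):
    """Apply [price, size] updates in place; size == 0 deletes the level."""
    for price, size in updates:
        if size == 0:
            book_side.pop(price, None)
        else:
            book_side[price] = size

def top_n(book_side, n, descending):
    """Return the best n [price, size] levels, best price first.

    Bids use descending=True (highest bid first), asks descending=False.
    """
    prices = sorted(book_side, reverse=descending)[:n]
    return [[p, book_side[p]] for p in prices]
```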
#### Repositories (`repositories/`)
**Purpose**: Database access and persistence layer
#### Metrics Calculation (`metrics_calculator.py`)
- `MetricsCalculator`: Manages OBI and CVD metrics with windowed aggregation
- Tracks CVD from trade flow (buy vs sell volume delta)
- Calculates OBI from orderbook volume imbalance
- Provides throttled updates and OHLC-style metric bars
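The two metrics follow their conventional definitions; this sketch omits the windowing and throttling the real class performs:

```python
def obi(bid_volume, ask_volume):
    """Order Book Imbalance in [-1, 1]; defined as 0 when both sides are empty."""
    total = bid_volume + ask_volume
    return 0.0 if total == 0 else (bid_volume - ask_volume) / total

def cvd_step(cvd, size, side):
    """Advance Cumulative Volume Delta by one trade: +size for buys, -size for sells."""
    return cvd + size if side == "buy" else cvd - size
```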
```python
# Repository
SQLiteOrderflowRepository:
- connect() # Optimized SQLite connection
- load_trades_by_timestamp() # Efficient trade loading
- iterate_book_rows() # Memory-efficient snapshot streaming
- count_rows() # Performance monitoring
- create_metrics_table() # Schema creation
- insert_metrics_batch() # High-performance batch inserts
- load_metrics_by_timerange() # Time-range queries
- table_exists() # Schema validation
```
#### Level Parsing (`level_parser.py`)
- Utility functions for normalizing orderbook level data:
- `normalize_levels()`: parses levels, filtering zero/negative sizes
- `parse_levels_including_zeros()`: preserves zeros for deletion operations
- Supports JSON and Python literal formats with robust error handling
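An illustrative version of the parsing behavior described above (the function body is a sketch, not the module's actual code):

```python
import ast
import json

def normalize_levels(raw):
    """Parse '[[price, size], ...]' text into (price, size) float tuples.

    Accepts JSON or Python-literal input, drops zero/negative sizes, and
    returns an empty list on unparseable input rather than raising.
    """
    try:
        levels = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        try:
            levels = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            return []
    out = []
    for lvl in levels:
        price, size = float(lvl[0]), float(lvl[1])
        if size > 0:
            out.append((price, size))
    return out
```

Indexing only `lvl[0]` and `lvl[1]` also tolerates the 4-element `[price, size, liq_count, order_count]` form used in the book table.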
**Design Patterns**:
- **Repository Pattern**: Clean separation between data access and business logic
- **Batch Processing**: Process 1000 records per database operation
- **Connection Management**: Caller manages connection lifecycle
- **Performance Optimization**: SQLite PRAGMAs for high-speed operations
### Inter-Process Communication (`viz_io.py`)
### Processing Layer
- File paths (relative to project root):
- `ohlc_data.json`: rolling list of OHLC bars (max 1000).
- `depth_data.json`: latest depth snapshot (bids/asks).
- `metrics_data.json`: rolling list of OBI/TOT OHLC bars (max 1000).
- Atomic writes via temp files prevent partial reads by the Dash app.
- API:
- `add_ohlc_bar(...)`: append a new bar; trim to last 1000.
- `upsert_ohlc_bar(...)`: replace last bar if timestamp matches; else append; trim.
- `clear_data()`: reset OHLC data to an empty list.
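The upsert-and-trim behavior can be sketched as follows (assumes the `[ts, open, high, low, close, volume]` bar layout and the 1000-bar cap; atomic file writing is shown separately in ADR-002):

```python
MAX_BARS = 1000  # rolling window size documented above

def upsert_ohlc_bar(bars, bar):
    """Replace the last bar when timestamps match, else append; keep last MAX_BARS."""
    if bars and bars[-1][0] == bar[0]:
        bars[-1] = bar
    else:
        bars.append(bar)
    del bars[:-MAX_BARS]  # no-op until the cap is exceeded
    return bars
```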
#### Storage (`storage.py`)
**Purpose**: Orchestrates data loading, processing, and metrics calculation
### Visualization (`app.py`)
```python
class Storage:
- build_booktick_from_db() # Main processing pipeline
- _create_snapshots_and_metrics() # Per-snapshot processing
- _snapshot_from_row() # Individual snapshot creation
```
- Dash application with two graphs plus OBI subplot:
- OHLC + Volume subplot with shared x-axis.
- OBI candlestick subplot (blue tones) sharing x-axis.
- Depth (cumulative) chart for bids and asks.
- Polling interval (500 ms) callback reads JSON files and updates figures resiliently:
- Caches last good values to tolerate in-flight writes/decoding errors.
- Builds figures with Plotly dark theme.
- Exposed on `http://localhost:8050` by default (`host=0.0.0.0`).
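The cumulative depth curves are typically derived from the snapshot like this (a sketch; the actual Plotly wiring in `app.py` is not shown):

```python
def cumulative_depth(levels, descending):
    """Turn [[price, size], ...] into (prices, cumulative_sizes), best price first.

    Bids use descending=True so size accumulates away from the best bid;
    asks use descending=False.
    """
    ordered = sorted(levels, key=lambda lvl: lvl[0], reverse=descending)
    prices, cumul, running = [], [], 0.0
    for price, size in ordered:
        running += size
        prices.append(price)
        cumul.append(running)
    return prices, cumul
```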
**Processing Pipeline**:
1. **Initialize**: Create metrics repository and table if needed
2. **Load Trades**: Group trades by timestamp for efficient access
3. **Stream Processing**: Process snapshots one-by-one to minimize memory
4. **Calculate Metrics**: OBI and CVD calculation per snapshot
5. **Batch Persistence**: Store metrics in batches of 1000
6. **Memory Management**: Discard full snapshots after metric extraction
### CLI Orchestration (`main.py`)
#### Strategy Framework (`strategies.py`)
**Purpose**: Trading analysis and signal generation
```python
class DefaultStrategy:
- set_db_path() # Configure database access
- compute_OBI() # Real-time OBI calculation (fallback)
- load_stored_metrics() # Retrieve persisted metrics
- get_metrics_summary() # Statistical analysis
- on_booktick() # Main analysis entry point
```
**Analysis Capabilities**:
- **Stored Metrics**: Primary analysis using persisted data
- **Real-time Fallback**: Live calculation for compatibility
- **Statistical Summaries**: Min/max/average OBI, CVD changes
- **Alert System**: Configurable thresholds for significant imbalances
### Presentation Layer
#### Visualization (`visualizer.py`)
**Purpose**: Multi-chart rendering and display
```python
class Visualizer:
- set_db_path() # Configure metrics access
- update_from_book() # Main rendering pipeline
- _load_stored_metrics() # Retrieve metrics for chart range
- _draw() # Multi-subplot rendering
- show() # Display interactive charts
```
**Chart Layout**:
```
┌─────────────────────────────────────┐
│ OHLC Candlesticks │ ← Price action
├─────────────────────────────────────┤
│ Volume Bars │ ← Trading volume
├─────────────────────────────────────┤
│ OBI Line Chart │ ← Order book imbalance
├─────────────────────────────────────┤
│ CVD Line Chart │ ← Cumulative volume delta
└─────────────────────────────────────┘
```
**Features**:
- **Shared Time Axis**: Synchronized X-axis across all subplots
- **Auto-scaling**: Y-axis optimization for each metric type
- **Performance**: Efficient rendering of large datasets
- **Interactive**: Qt5Agg backend for zooming and panning
- Typer CLI entrypoint:
- Arguments: `instrument`, `start_date`, `end_date` (UTC, `YYYY-MM-DD`), options: `--window-seconds`.
- Discovers SQLite files under `../data/OKX` matching the instrument.
- Launches Dash visualizer as a separate process: `uv run python app.py`.
- Streams databases sequentially: for each book row, processes trades and updates orderbook.
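The shape of such a Typer entrypoint, using the documented arguments and option (defaults, help text, and the echoed line are illustrative):

```python
import typer

app = typer.Typer()

@app.command()
def main(
    instrument: str = typer.Argument(..., help="e.g. BTC-USDT"),
    start_date: str = typer.Argument(..., help="UTC, YYYY-MM-DD"),
    end_date: str = typer.Argument(..., help="UTC, YYYY-MM-DD"),
    window_seconds: int = typer.Option(60, "--window-seconds"),
) -> None:
    """Discover databases under ../data/OKX and stream them sequentially."""
    typer.echo(f"{instrument} {start_date}..{end_date} window={window_seconds}s")
```

In a script, `app()` is invoked under a `if __name__ == "__main__":` guard.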
## Data Flow
### Processing Flow
```
1. SQLite DB → Repository → Raw Data
2. Raw Data → Storage → BookSnapshot
3. BookSnapshot → MetricCalculator → OBI/CVD
4. Metrics → Repository → Database Storage
5. Stored Metrics → Strategy → Analysis
6. Stored Metrics → Visualizer → Charts
```
1. Discover and open SQLite database(s) for the requested instrument.
2. Stream `book` rows with one-row lookahead to form time windows.
3. Stream `trades` in timestamp order and bucket into the active window.
4. For each window:
- Aggregate trades into OHLC using `OHLCProcessor.process_trades`.
- Apply partial depth updates via `OHLCProcessor.update_orderbook` and emit periodic snapshots.
5. Persist current OHLC bar(s) and depth snapshots to JSON via atomic writes.
6. Dash app polls JSON and renders charts.
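Steps 2 and 3 above (lookahead windowing) can be sketched with plain timestamp sequences (illustrative function; the real implementation streams batched rows from SQLite):

```python
def bucket_trades(book_timestamps, trade_timestamps):
    """Yield (book_ts, [trade_ts, ...]) windows from two sorted sequences.

    Each window covers [book_ts, next_book_ts); the final window is open-ended.
    """
    i = 0
    for idx, ts in enumerate(book_timestamps):
        end = book_timestamps[idx + 1] if idx + 1 < len(book_timestamps) else float("inf")
        window = []
        while i < len(trade_timestamps) and trade_timestamps[i] < end:
            if trade_timestamps[i] >= ts:
                window.append(trade_timestamps[i])
            i += 1
        yield ts, window
```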
### Memory Management Flow
```
Traditional: DB → All Snapshots in Memory → Analysis (High Memory)
Optimized: DB → Process Snapshot → Calculate Metrics → Store → Discard (Low Memory)
```
## IPC JSON Schemas
- OHLC (`ohlc_data.json`): array of bars; each bar is `[ts, open, high, low, close, volume]`.
- Depth (`depth_data.json`): object with bids/asks arrays: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`.
- Metrics (`metrics_data.json`): array of bars; each bar is `[ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close]`.

## Database Schema

### Input Schema (Required)

```sql
-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,       -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,       -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,       -- "buy" or "sell"
    timestamp TEXT
);
```
### Output Schema (Auto-created)
```sql
-- Calculated metrics
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER,
timestamp TEXT,
obi REAL, -- Order Book Imbalance [-1, 1]
cvd REAL, -- Cumulative Volume Delta
best_bid REAL,
best_ask REAL,
FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```

## Configuration
- `OHLCProcessor(window_seconds, depth_levels_per_side)` controls aggregation granularity and depth snapshot size.
- Visualizer interval (`500 ms`) balances UI responsiveness and CPU usage.
- Paths: JSON files (`ohlc_data.json`, `depth_data.json`) are colocated with the code and written atomically.
- CLI parameters select instrument and time range; databases expected under `../data/OKX`.
## Performance Characteristics
### Memory Optimization
- **Before**: Store all snapshots in memory (~1GB for 600K snapshots)
- **After**: Store only metrics data (~300MB for same dataset)
- **Reduction**: >70% memory usage decrease
- Read-only SQLite tuned for fast sequential scans: immutable URI, query_only, large mmap and cache.
- Batching minimizes cursor churn and Python overhead.
- JSON IPC uses atomic replace to avoid contention; OHLC list is bounded to 1000 entries.
- Processor throttles intra-window OHLC upserts and depth emissions to reduce I/O.
### Processing Performance
- **Batch Size**: 1000 records per database operation
- **Processing Speed**: ~1000 snapshots/second on modern hardware
- **Database Overhead**: <20% storage increase for metrics table
- **Query Performance**: Sub-second retrieval for typical time ranges
## Error Handling
### Scalability Limits
- **Single File**: 1M+ snapshots per database file
- **Time Range**: Months to years of historical data
- **Memory Peak**: <2GB for year-long datasets
- **Disk Space**: Original size + 20% for metrics
## Integration Points
### External Interfaces
```python
# Main application entry point
main.py:
- CLI argument parsing
- Database file discovery
- Component orchestration
- Progress monitoring
# Plugin interfaces
Strategy.on_booktick(book: Book) # Strategy integration point
Visualizer.update_from_book(book) # Visualization integration
```
### Internal Interfaces
```python
# Repository interfaces
Repository.connect() -> Connection
Repository.load_data() -> TypedData
Repository.store_data(data) -> None

# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) -> float
MetricCalculator.calculate_cvd(prev_cvd, trades) -> float
```
- Visualizer tolerates JSON decode races by reusing last good values and logging warnings.
- Processor guards depth parsing and writes; logs at debug/info levels.
- Visualizer startup is wrapped; if it fails, processing continues without UI.
## Security Considerations
### Data Protection
- **SQL Injection**: All queries use parameterized statements
- **File Access**: Validates database file paths and permissions
- **Error Handling**: No sensitive data in error messages
- **Input Validation**: Sanitizes all external inputs
- SQLite connections are read-only and immutable; no write queries executed.
- File writes are confined to project directory; no paths derived from untrusted input.
- Logs avoid sensitive data; only operational metadata.
### Access Control
- **Database**: Respects file system permissions
- **Memory**: No sensitive data persistence beyond processing
- **Logging**: Configurable log levels without data exposure
## Testing Guidance
- Unit tests (run with `uv run pytest`):
  - `OHLCProcessor`: window boundary handling, high/low tracking, volume accumulation, upsert behavior.
  - Depth maintenance: deletions (size==0), top-N sorting, throttling.
  - `DBInterpreter.stream`: correct trade-window assignment, end-of-stream handling.
- Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server.

## Configuration Management

### Performance Tuning
```python
# Storage configuration
BATCH_SIZE = 1000 # Records per database operation
LOG_FREQUENCY = 20 # Progress reports per processing run
# SQLite optimization
PRAGMA journal_mode = OFF   # Maximum write performance
PRAGMA synchronous = OFF    # Disable synchronous writes
PRAGMA cache_size = 100000  # Large memory cache
```

## Roadmap (Optional Enhancements)
### Visualization Settings
```python
# Chart configuration
WINDOW_SECONDS = 60 # OHLC aggregation window
MAX_BARS = 500 # Maximum bars displayed
FIGURE_SIZE = (12, 10) # Chart dimensions
```
## Error Handling Strategy
### Graceful Degradation
- **Database Errors**: Continue with reduced functionality
- **Calculation Errors**: Skip problematic snapshots with logging
- **Visualization Errors**: Display available data, note issues
- **Memory Pressure**: Adjust batch sizes automatically
### Recovery Mechanisms
- **Partial Processing**: Resume from last successful batch
- **Data Validation**: Verify metrics calculations before storage
- **Rollback Support**: Transaction boundaries for data consistency
- Metrics: add OBI/CVD computation and persist metrics to a dedicated table.
- Repository Pattern: extract DB access into a repository module with typed methods.
- Orchestrator: introduce a `Storage` pipeline module coordinating batch processing and persistence.
- Strategy Layer: compute signals/alerts on stored metrics.
- Visualization: add OBI/CVD subplots and richer interactions.
---
This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a robust foundation for incremental enhancements while maintaining clean separation of concerns.


@@ -1,120 +0,0 @@
# ADR-001: Persistent Metrics Storage
## Status
Accepted
## Context
The original orderflow backtest system kept all orderbook snapshots in memory during processing, leading to excessive memory usage (>1GB for typical datasets). With the addition of OBI and CVD metrics calculation, we needed to decide how to handle the computed metrics and manage memory efficiently.
## Decision
We will implement persistent storage of calculated metrics in the SQLite database with the following approach:
1. **Metrics Table**: Create a dedicated `metrics` table to store OBI, CVD, and related data
2. **Streaming Processing**: Process snapshots one-by-one, calculate metrics, store results, then discard snapshots
3. **Batch Operations**: Use batch inserts (1000 records) for optimal database performance
4. **Query Interface**: Provide time-range queries for metrics retrieval and analysis
## Consequences
### Positive
- **Memory Reduction**: >70% reduction in peak memory usage during processing
- **Avoid Recalculation**: Metrics calculated once and reused for multiple analysis runs
- **Scalability**: Can process months/years of data without memory constraints
- **Performance**: Batch database operations provide high throughput
- **Persistence**: Metrics survive between application runs
- **Analysis Ready**: Stored metrics enable complex time-series analysis
### Negative
- **Storage Overhead**: Metrics table adds ~20% to database size
- **Complexity**: Additional database schema and management code
- **Dependencies**: Tighter coupling between processing and database layer
- **Migration**: Existing databases need schema updates for metrics table
## Alternatives Considered
### Option 1: Keep All Snapshots in Memory
**Rejected**: Unsustainable memory usage for large datasets. Would limit analysis to small time ranges.
### Option 2: Calculate Metrics On-Demand
**Rejected**: Recalculating metrics for every analysis run is computationally expensive and time-consuming.
### Option 3: External Metrics Database
**Rejected**: Adds deployment complexity. SQLite co-location provides better performance and simpler management.
### Option 4: Compressed In-Memory Cache
**Rejected**: Still faces fundamental memory scaling issues. Compression/decompression adds CPU overhead.
## Implementation Details
### Database Schema
```sql
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL,
timestamp TEXT NOT NULL,
obi REAL NOT NULL,
cvd REAL NOT NULL,
best_bid REAL,
best_ask REAL,
FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```
### Processing Pipeline
1. Create metrics table if not exists
2. Stream through orderbook snapshots
3. For each snapshot:
- Calculate OBI and CVD metrics
- Batch store metrics (1000 records per commit)
- Discard snapshot from memory
4. Provide query interface for time-range retrieval
### Memory Management
- **Before**: Store all snapshots → Calculate on demand → High memory usage
- **After**: Stream snapshots → Calculate immediately → Store metrics → Low memory usage
## Migration Strategy
### Backward Compatibility
- Existing databases continue to work without metrics table
- System auto-creates metrics table on first processing run
- Fallback to real-time calculation if metrics unavailable
### Performance Impact
- **Processing Time**: Slight increase due to database writes (~10%)
- **Query Performance**: Significant improvement for repeated analysis
- **Overall**: Net positive performance for typical usage patterns
## Monitoring and Validation
### Success Metrics
- **Memory Usage**: Target >70% reduction in peak memory usage
- **Processing Speed**: Maintain >500 snapshots/second processing rate
- **Storage Efficiency**: Metrics table <25% of total database size
- **Query Performance**: <1 second retrieval for typical time ranges
### Validation Methods
- Memory profiling during large dataset processing
- Performance benchmarks vs. original system
- Storage overhead analysis across different dataset sizes
- Query performance testing with various time ranges
## Future Considerations
### Potential Enhancements
- **Compression**: Consider compression for metrics storage if overhead becomes significant
- **Partitioning**: Time-based partitioning for very large datasets
- **Caching**: In-memory cache for frequently accessed metrics
- **Export**: Direct export capabilities for external analysis tools
### Scalability Options
- **Database Upgrade**: PostgreSQL if SQLite becomes limiting factor
- **Parallel Processing**: Multi-threaded metrics calculation
- **Distributed Storage**: For institutional-scale datasets
---
This decision provides a solid foundation for efficient, scalable metrics processing while maintaining simplicity and performance characteristics suitable for the target use cases.


@@ -0,0 +1,122 @@
# ADR-001: SQLite Database Choice
## Status
Accepted
## Context
The orderflow backtest system needs to efficiently store and stream large volumes of historical orderbook and trade data. Key requirements include:
- Fast sequential read access for time-series data
- Minimal setup and maintenance overhead
- Support for concurrent reads from visualization layer
- Ability to handle databases ranging from 100MB to 10GB+
- No network dependencies for data access
## Decision
We will use SQLite as the primary database for storing historical orderbook and trade data.
## Consequences
### Positive
- **Zero configuration**: No database server setup or administration required
- **Excellent read performance**: Optimized for sequential scans with proper PRAGMA settings
- **Built-in Python support**: No external dependencies or connection libraries needed
- **File portability**: Database files can be easily shared and archived
- **ACID compliance**: Ensures data integrity during writes (for data ingestion)
- **Small footprint**: Minimal memory and storage overhead
- **Fast startup**: No connection pooling or server initialization delays
### Negative
- **Single writer limitation**: Cannot handle concurrent writes (acceptable for read-only backtest)
- **Limited scalability**: Not suitable for high-concurrency production trading systems
- **No network access**: Cannot query databases remotely (acceptable for local analysis)
- **File locking**: Potential issues with file system sharing (mitigated by read-only access)
## Implementation Details
### Schema Design
```sql
-- Orderbook snapshots with timestamp windows
CREATE TABLE book (
id INTEGER PRIMARY KEY,
instrument TEXT,
bids TEXT NOT NULL, -- JSON array of [price, size] pairs
asks TEXT NOT NULL, -- JSON array of [price, size] pairs
timestamp TEXT NOT NULL
);
-- Individual trade records
CREATE TABLE trades (
id INTEGER PRIMARY KEY,
instrument TEXT,
trade_id TEXT,
price REAL NOT NULL,
size REAL NOT NULL,
side TEXT NOT NULL, -- "buy" or "sell"
timestamp TEXT NOT NULL
);
-- Indexes for efficient time-based queries
CREATE INDEX idx_book_timestamp ON book(timestamp);
CREATE INDEX idx_trades_timestamp ON trades(timestamp);
```
### Performance Optimizations
```python
# Read-only connection with optimized PRAGMA settings
connection_uri = f"file:{db_path}?immutable=1&mode=ro"
conn = sqlite3.connect(connection_uri, uri=True)
conn.execute("PRAGMA query_only = 1")
conn.execute("PRAGMA temp_store = MEMORY")
conn.execute("PRAGMA mmap_size = 268435456") # 256MB
conn.execute("PRAGMA cache_size = 10000")
```
## Alternatives Considered
### PostgreSQL
- **Rejected**: Requires server setup and maintenance
- **Pros**: Better concurrent access, richer query features
- **Cons**: Overkill for read-only use case, deployment complexity
### Parquet Files
- **Rejected**: Limited query capabilities for time-series data
- **Pros**: Excellent compression, columnar format
- **Cons**: No indexes, complex range queries, requires additional libraries
### MongoDB
- **Rejected**: Document structure not optimal for time-series data
- **Pros**: Flexible schema, good aggregation pipeline
- **Cons**: Requires server, higher memory usage, learning curve
### CSV Files
- **Rejected**: Poor query performance for large datasets
- **Pros**: Simple format, universal compatibility
- **Cons**: No indexing, slow filtering, type conversion overhead
### InfluxDB
- **Rejected**: Overkill for historical data analysis
- **Pros**: Optimized for time-series, good compression
- **Cons**: Additional service dependency, learning curve
## Migration Path
If scalability becomes an issue in the future:
1. **Phase 1**: Implement database abstraction layer in `db_interpreter`
2. **Phase 2**: Add PostgreSQL adapter for production workloads
3. **Phase 3**: Implement data partitioning for very large datasets
4. **Phase 4**: Consider distributed storage for multi-terabyte datasets
## Monitoring
Track the following metrics to validate this decision:
- Database file sizes and growth rates
- Query performance for different date ranges
- Memory usage during streaming operations
- Time to process complete backtests
## Review Date
This decision should be reviewed if:
- Database files consistently exceed 50GB
- Query performance degrades below 1000 rows/second
- Concurrent access requirements change
- Network-based data sharing becomes necessary


@@ -0,0 +1,162 @@
# ADR-002: JSON File-Based Inter-Process Communication
## Status
Accepted
## Context
The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include:
- Real-time data updates from processor to visualization
- Tolerance for timing mismatches between writer and reader
- Simple implementation without external dependencies
- Support for different update frequencies (OHLC bars vs. orderbook depth)
- Graceful handling of process crashes or restarts
## Decision
We will use JSON files with atomic write operations for inter-process communication between the data processor and Dash visualization frontend.
## Consequences
### Positive
- **Simplicity**: No message queues, sockets, or complex protocols
- **Fault tolerance**: File-based communication survives process restarts
- **Debugging friendly**: Data files can be inspected manually
- **No dependencies**: Built-in JSON support, no external libraries
- **Atomic operations**: Temp file + rename prevents partial reads
- **Language agnostic**: Any process can read/write JSON files
- **Bounded memory**: Rolling data windows prevent unlimited growth
### Negative
- **File I/O overhead**: Disk writes may be slower than in-memory communication
- **Polling required**: Reader must poll for updates (500ms interval)
- **Limited throughput**: Not suitable for high-frequency (microsecond) updates
- **No acknowledgments**: Writer cannot confirm reader has processed data
- **File system dependency**: Performance varies by storage type
## Implementation Details
### File Structure
```
ohlc_data.json # Rolling array of OHLC bars (max 1000)
depth_data.json # Current orderbook depth snapshot
metrics_data.json # Rolling array of OBI/CVD metrics (max 1000)
```
### Atomic Write Pattern
```python
import json
import os
from pathlib import Path
from typing import Any

def atomic_write(file_path: Path, data: Any) -> None:
    """Write data atomically to prevent partial reads."""
    temp_path = file_path.with_suffix('.tmp')
    with open(temp_path, 'w') as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    temp_path.replace(file_path)  # Atomic on POSIX systems
```
### Data Formats
```python
# OHLC format: [timestamp_ms, open, high, low, close, volume]
ohlc_data = [
[1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
[1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]
# Depth format: top-N levels per side
depth_data = {
"bids": [[49990.0, 1.5], [49985.0, 2.1]],
"asks": [[50010.0, 1.2], [50015.0, 1.8]]
}
# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close]
metrics_data = [
[1640995200000, 0.15, 0.22, 0.08, 0.18],
[1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
### Error Handling
```python
import json
import logging

_LAST_DATA: list = []  # cache of the last successful read

# Reader pattern with graceful fallback
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # Cache successful read
except (FileNotFoundError, json.JSONDecodeError) as e:
    logging.warning(f"Using cached data: {e}")
    new_data = _LAST_DATA  # Fall back to cached data
```
## Performance Characteristics
### Write Performance
- **Small files**: < 1MB typical, writes complete in < 10ms
- **Atomic operations**: Add ~2-5ms overhead for temp file creation
- **Throttling**: Updates limited to prevent excessive I/O
### Read Performance
- **Parse time**: < 5ms for typical JSON file sizes
- **Polling overhead**: 500ms interval balances responsiveness and CPU usage
- **Error recovery**: Cached data eliminates visual glitches
### Memory Usage
- **Bounded datasets**: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file
- **JSON overhead**: ~2x memory during parsing
- **Total footprint**: < 500KB for all IPC data
## Alternatives Considered
### Redis Pub/Sub
- **Rejected**: Additional service dependency, overkill for simple use case
- **Pros**: True real-time updates, built-in data structures
- **Cons**: External dependency, memory overhead, configuration complexity
### ZeroMQ
- **Rejected**: Additional library dependency, more complex than needed
- **Pros**: High performance, flexible patterns
- **Cons**: Learning curve, binary dependency, networking complexity
### Named Pipes/Unix Sockets
- **Rejected**: Platform-specific, more complex error handling
- **Pros**: Better performance, no file I/O
- **Cons**: Platform limitations, harder debugging, process lifetime coupling
### SQLite as Message Queue
- **Rejected**: Overkill for simple data exchange
- **Pros**: ACID transactions, complex queries possible
- **Cons**: Schema management, locking considerations, overhead
### HTTP API
- **Rejected**: Too much overhead for local communication
- **Pros**: Standard protocol, language agnostic
- **Cons**: Network stack overhead, port management, authentication
## Future Considerations
### Scalability Limits
Current approach suitable for:
- Update frequencies: 1-10 Hz
- Data volumes: < 10MB total
- Process counts: 1 writer, few readers
### Migration Path
If performance becomes insufficient:
1. **Phase 1**: Add compression (gzip) to reduce I/O
2. **Phase 2**: Implement shared memory for high-frequency data
3. **Phase 3**: Consider message queue for complex routing
4. **Phase 4**: Migrate to streaming protocol for real-time requirements
## Monitoring
Track these metrics to validate the approach:
- File write latency and frequency
- JSON parse times in visualization
- Error rates for partial reads
- Memory usage growth over time
## Review Triggers
Reconsider this decision if:
- Update frequency requirements exceed 10 Hz
- File I/O becomes a performance bottleneck
- Multiple visualization clients need the same data
- Complex message routing becomes necessary
- Platform portability becomes a concern

# ADR-002: Separation of Visualization from Strategy
## Status
Accepted
## Context
The original system embedded visualization functionality within the `DefaultStrategy` class, creating tight coupling between trading analysis logic and chart rendering. This design had several issues:
1. **Mixed Responsibilities**: Strategy classes handled both trading logic and GUI operations
2. **Testing Complexity**: Strategy tests required mocking GUI components
3. **Deployment Flexibility**: Strategies couldn't run in headless environments
4. **Timing Control**: Visualization timing was tied to strategy execution rather than application flow
The user specifically requested to display visualizations after processing each database file, requiring better control over visualization timing.
## Decision
We will separate visualization from strategy components with the following architecture:
1. **Remove Visualization from Strategy**: Strategy classes focus solely on trading analysis
2. **Main Application Control**: `main.py` orchestrates visualization timing and updates
3. **Independent Configuration**: Strategy and Visualizer get database paths independently
4. **Clean Interfaces**: No direct dependencies between strategy and visualization components
## Consequences
### Positive
- **Single Responsibility**: Strategy focuses on trading logic, Visualizer on charts
- **Better Testability**: Strategy tests run without GUI dependencies
- **Flexible Deployment**: Strategies can run in headless/server environments
- **Timing Control**: Visualization updates precisely when needed (after each DB)
- **Maintainability**: Changes to visualization don't affect strategy logic
- **Performance**: No GUI overhead during strategy analysis
### Negative
- **Increased Complexity**: Main application handles more orchestration logic
- **Coordination Required**: Must ensure strategy and visualizer get same database path
- **Breaking Change**: Existing strategy initialization code needs updates
## Alternatives Considered
### Option 1: Keep Visualization in Strategy
**Rejected**: Violates single responsibility principle. Makes testing difficult and deployment inflexible.
### Option 2: Observer Pattern
**Rejected**: Adds unnecessary complexity for this use case. Direct control in main.py is simpler and more explicit.
### Option 3: Visualization Service
**Rejected**: Over-engineering for current requirements. May be considered for future multi-strategy scenarios.
## Implementation Details
### Before (Coupled Design)
```python
class DefaultStrategy:
def __init__(self, instrument: str, enable_visualization: bool = True):
self.visualizer = Visualizer(...) if enable_visualization else None
def on_booktick(self, book: Book):
# Trading analysis
# ...
# Visualization update
if self.visualizer:
self.visualizer.update_from_book(book)
```
### After (Separated Design)
```python
# Strategy focuses on analysis only
class DefaultStrategy:
    def __init__(self, instrument: str):
        ...  # No visualization dependencies

    def on_booktick(self, book: Book):
        ...  # Pure trading analysis, no visualization code
# Main application orchestrates both
def main():
strategy = DefaultStrategy(instrument)
visualizer = Visualizer(...)
for db_path in db_paths:
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)
# Process data
storage.build_booktick_from_db(db_path, db_date)
# Analysis
strategy.on_booktick(storage.book)
# Visualization (controlled timing)
visualizer.update_from_book(storage.book)
# Final display
visualizer.show()
```
### Interface Changes
#### Strategy Interface (Simplified)
```python
class DefaultStrategy:
    def __init__(self, instrument: str): ...             # Removed visualization param
    def set_db_path(self, db_path: Path) -> None: ...    # No visualizer.set_db_path()
    def on_booktick(self, book: Book) -> None: ...       # No visualization calls
```
#### Main Application (Enhanced)
```python
def main():
# Separate initialization
strategy = DefaultStrategy(instrument)
visualizer = Visualizer(window_seconds=60, max_bars=500)
# Independent configuration
for db_path in db_paths:
strategy.set_db_path(db_path)
visualizer.set_db_path(db_path)
# Controlled execution
strategy.on_booktick(storage.book) # Analysis
visualizer.update_from_book(storage.book) # Visualization
```
## Migration Strategy
### Code Changes Required
1. **Strategy Classes**: Remove visualization initialization and calls
2. **Main Application**: Add visualizer creation and orchestration
3. **Tests**: Update strategy tests to remove visualization mocking
4. **Configuration**: Remove visualization parameters from strategy constructors
### Backward Compatibility
- **API Breaking**: Strategy constructor signature changes
- **Functionality Preserved**: All visualization features remain available
- **Test Updates**: Strategy tests become simpler (no GUI mocking needed)
### Migration Steps
1. Update `DefaultStrategy` to remove visualization dependencies
2. Modify `main.py` to create and manage `Visualizer` instance
3. Update all strategy constructor calls to remove `enable_visualization`
4. Update tests to reflect new interfaces
5. Verify visualization timing meets requirements
## Benefits Achieved
### Clean Architecture
- **Strategy**: Pure trading analysis logic
- **Visualizer**: Pure chart rendering logic
- **Main**: Application flow and component coordination
### Improved Testing
```python
# Before: Complex mocking required
def test_strategy():
with patch('visualizer.Visualizer') as mock_viz:
strategy = DefaultStrategy("BTC", enable_visualization=True)
# Complex mock setup...
# After: Simple, direct testing
def test_strategy():
strategy = DefaultStrategy("BTC")
# Direct testing of analysis logic
```
### Flexible Deployment
```python
# Headless server deployment
strategy = DefaultStrategy("BTC")
# No GUI dependencies, can run anywhere
# Development with visualization
strategy = DefaultStrategy("BTC")
visualizer = Visualizer(...)
# Full GUI functionality when needed
```
### Precise Timing Control
```python
# Visualization updates exactly when requested
for db_file in database_files:
process_database(db_file) # Data processing
strategy.analyze(book) # Trading analysis
visualizer.update_from_book(book) # Chart update after each DB
```
## Monitoring and Validation
### Success Criteria
- **Test Simplification**: Strategy tests run without GUI mocking
- **Timing Accuracy**: Visualization updates after each database as requested
- **Performance**: No GUI overhead during pure analysis operations
- **Maintainability**: Visualization changes don't affect strategy code
### Validation Methods
- Run strategy tests in headless environment
- Verify visualization timing matches requirements
- Performance comparison of analysis-only vs. GUI operations
- Code complexity metrics for strategy vs. visualization modules
## Future Considerations
### Potential Enhancements
- **Multiple Visualizers**: Support different chart types or windows
- **Visualization Plugins**: Pluggable chart renderers for different outputs
- **Remote Visualization**: Web-based charts for server deployments
- **Batch Visualization**: Process multiple databases before chart updates
### Extensibility
- **Strategy Plugins**: Easy to add strategies without visualization concerns
- **Visualization Backends**: Swap chart libraries without affecting strategies
- **Analysis Pipeline**: Clear separation enables complex analysis workflows
---
This separation provides a clean, maintainable architecture that supports the requested visualization timing while improving code quality and testability.

# ADR-003: Dash Web Framework for Visualization
## Status
Accepted
## Context
The orderflow backtest system requires a user interface for visualizing OHLC candlestick charts, volume data, orderbook depth, and derived metrics. Key requirements include:
- Real-time chart updates with minimal latency
- Professional financial data visualization capabilities
- Support for multiple chart types (candlesticks, bars, line charts)
- Interactive features (zooming, panning, hover details)
- Dark theme suitable for trading applications
- Python-native solution to avoid JavaScript development
## Decision
We will use Dash (by Plotly) as the web framework for building the visualization frontend, with Plotly.js for chart rendering.
## Consequences
### Positive
- **Python-native**: No JavaScript development required
- **Plotly integration**: Best-in-class financial charting capabilities
- **Reactive architecture**: Automatic UI updates via callback system
- **Professional appearance**: High-quality charts suitable for trading applications
- **Interactive features**: Built-in zooming, panning, hover tooltips
- **Responsive design**: Bootstrap integration for modern layouts
- **Development speed**: Rapid prototyping and iteration
- **WebGL acceleration**: Smooth performance for large datasets
### Negative
- **Performance overhead**: Heavier than custom JavaScript solutions
- **Limited customization**: Constrained by Dash component ecosystem
- **Single-page limitation**: Not suitable for complex multi-page applications
- **Memory usage**: Can be heavy for resource-constrained environments
- **Learning curve**: Callback patterns require understanding of reactive programming
## Implementation Details
### Application Structure
```python
import dash
import dash_bootstrap_components as dbc
from dash import dcc

# Main application with Bootstrap theme
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY])
# Responsive layout with 9:3 ratio for charts:depth
app.layout = dbc.Container([
dbc.Row([
dbc.Col([ # OHLC + Volume + Metrics
dcc.Graph(id='ohlc-chart', style={'height': '100vh'})
], width=9),
dbc.Col([ # Orderbook Depth
dcc.Graph(id='depth-chart', style={'height': '100vh'})
], width=3)
]),
dcc.Interval(id='interval-update', interval=500, n_intervals=0)
])
```
### Chart Architecture
```python
import plotly.graph_objs as go
from plotly.subplots import make_subplots

# Multi-subplot chart with shared x-axis
fig = make_subplots(
rows=3, cols=1,
row_heights=[0.6, 0.2, 0.2], # OHLC, Volume, Metrics
vertical_spacing=0.02,
shared_xaxes=True,
subplot_titles=['Price', 'Volume', 'OBI Metrics']
)
# Candlestick chart with dark theme
fig.add_trace(go.Candlestick(
x=timestamps, open=opens, high=highs, low=lows, close=closes,
increasing_line_color='#00ff00', decreasing_line_color='#ff0000'
), row=1, col=1)
```
### Real-time Updates
```python
from dash import Input, Output

@app.callback(
[Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')],
[Input('interval-update', 'n_intervals')]
)
def update_charts(n_intervals):
# Read data from JSON files with error handling
# Build and return updated figures
return ohlc_fig, depth_fig
```
## Performance Characteristics
### Update Latency
- **Polling interval**: 500ms for near real-time updates
- **Chart render time**: 50-200ms depending on data size
- **Memory usage**: ~100MB for typical chart configurations
- **Browser requirements**: Modern browser with WebGL support
### Scalability Limits
- **Data points**: Up to 10,000 candlesticks without performance issues
- **Update frequency**: Optimal at 1-2 Hz, maximum ~10 Hz
- **Concurrent users**: Single user design (development server)
- **Memory growth**: Linear with data history size
## Alternatives Considered
### Streamlit
- **Rejected**: Less interactive, slower updates, limited charting
- **Pros**: Simpler programming model, good for prototypes
- **Cons**: Poor real-time performance, limited financial chart types
### Flask + Custom JavaScript
- **Rejected**: Requires JavaScript development, more complex
- **Pros**: Complete control, potentially better performance
- **Cons**: Significant development overhead, maintenance burden
### Jupyter Notebooks
- **Rejected**: Not suitable for production deployment
- **Pros**: Great for exploration and analysis
- **Cons**: No real-time updates, not web-deployable
### Bokeh
- **Rejected**: Less mature ecosystem, fewer financial chart types
- **Pros**: Good performance, Python-native
- **Cons**: Smaller community, limited examples for financial data
### Custom React Application
- **Rejected**: Requires separate frontend team, complex deployment
- **Pros**: Maximum flexibility, best performance potential
- **Cons**: High development cost, maintenance overhead
### Desktop GUI (Tkinter/PyQt)
- **Rejected**: Not web-accessible, limited styling options
- **Pros**: No browser dependency, good performance
- **Cons**: Deployment complexity, poor mobile support
## Configuration Options
### Theme and Styling
```python
# Dark theme configuration
dark_theme = {
'plot_bgcolor': '#000000',
'paper_bgcolor': '#000000',
'font_color': '#ffffff',
'grid_color': '#333333'
}
```
### Chart Types
- **Candlestick charts**: OHLC price data with volume
- **Bar charts**: Volume and metrics visualization
- **Line charts**: Cumulative depth and trend analysis
- **Scatter plots**: Trade-by-trade analysis (future)
### Interactive Features
- **Zoom and pan**: Time-based navigation
- **Hover tooltips**: Detailed data on mouse over
- **Crosshairs**: Precise value reading
- **Range selector**: Quick time period selection
## Future Enhancements
### Short-term (1-3 months)
- Add range selector for time navigation
- Implement chart annotation for significant events
- Add export functionality for charts and data
### Medium-term (3-6 months)
- Multi-instrument support with tabs
- Advanced indicators and overlays
- User preference persistence
### Long-term (6+ months)
- Real-time alerts and notifications
- Strategy backtesting visualization
- Portfolio-level analytics
## Monitoring and Metrics
### Performance Monitoring
- Chart render times and update frequencies
- Memory usage growth over time
- Browser compatibility and error rates
- User interaction patterns
### Quality Metrics
- Chart accuracy compared to source data
- Visual responsiveness during heavy updates
- Error recovery from data corruption
## Review Triggers
Reconsider this decision if:
- Update frequency requirements exceed 10 Hz consistently
- Memory usage becomes prohibitive (> 1GB)
- Custom visualization requirements cannot be met
- Multi-user deployment becomes necessary
- Mobile responsiveness becomes a priority
- Integration with external charting libraries is needed
## Migration Path
If replacement becomes necessary:
1. **Phase 1**: Abstract chart building logic from Dash specifics
2. **Phase 2**: Implement alternative frontend while maintaining data formats
3. **Phase 3**: A/B test performance and usability
4. **Phase 4**: Complete migration with feature parity

# Module: app
## Purpose
The `app` module provides a real-time Dash web application for visualizing OHLC candlestick charts, volume data, Order Book Imbalance (OBI) metrics, and orderbook depth. It implements a polling-based architecture that reads JSON data files and renders interactive charts with a dark theme.
## Public Interface
### Functions
- `build_empty_ohlc_fig() -> go.Figure`: Create empty OHLC chart with proper styling
- `build_empty_depth_fig() -> go.Figure`: Create empty depth chart with proper styling
- `build_ohlc_fig(data: List[list], metrics: List[list]) -> go.Figure`: Build complete OHLC+Volume+OBI chart
- `build_depth_fig(depth_data: dict) -> go.Figure`: Build orderbook depth visualization
### Global Variables
- `_LAST_DATA`: Cached OHLC data for error recovery
- `_LAST_DEPTH`: Cached depth data for error recovery
- `_LAST_METRICS`: Cached metrics data for error recovery
### Dash Application
- `app`: Main Dash application instance with Bootstrap theme
- Layout with responsive grid (9:3 ratio for OHLC:Depth charts)
- 500ms polling interval for real-time updates
## Usage Examples
### Running the Application
```bash
# Start the Dash server
uv run python app.py
# Access the web interface
# Open http://localhost:8050 in your browser
```
### Programmatic Usage
```python
from app import build_ohlc_fig, build_depth_fig
# Build charts with sample data
ohlc_data = [[1640995200000, 50000, 50100, 49900, 50050, 125.5]]
metrics_data = [[1640995200000, 0.15, 0.22, 0.08, 0.18]]
depth_data = {
"bids": [[49990, 1.5], [49985, 2.1]],
"asks": [[50010, 1.2], [50015, 1.8]]
}
ohlc_fig = build_ohlc_fig(ohlc_data, metrics_data)
depth_fig = build_depth_fig(depth_data)
```
## Dependencies
### Internal
- `viz_io`: Data file paths and JSON reading
- `viz_io.DATA_FILE`: OHLC data source
- `viz_io.DEPTH_FILE`: Depth data source
- `viz_io.METRICS_FILE`: Metrics data source
### External
- `dash`: Web application framework
- `dash.html`, `dash.dcc`: HTML and core components
- `dash_bootstrap_components`: Bootstrap styling
- `plotly.graph_objs`: Chart objects
- `plotly.subplots`: Multiple subplot support
- `pandas`: Data manipulation (minimal usage)
- `json`: JSON file parsing
- `logging`: Error and debug logging
- `pathlib`: File path handling
## Chart Architecture
### OHLC Chart (Left Panel, 9/12 width)
- **Main subplot**: Candlestick chart with OHLC data
- **Volume subplot**: Bar chart sharing x-axis with main chart
- **OBI subplot**: Order Book Imbalance candlestick chart in blue tones
- **Shared x-axis**: Synchronized zooming and panning across subplots
### Depth Chart (Right Panel, 3/12 width)
- **Cumulative depth**: Stepped line chart showing bid/ask liquidity
- **Color coding**: Green for bids, red for asks
- **Real-time updates**: Reflects current orderbook state
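The cumulative depth line can be derived directly from the `depth_data` format shown earlier. A hypothetical helper (not the module's actual code) might look like:

```python
from typing import Dict, List, Tuple

def cumulative_depth(
    depth_data: Dict[str, List[List[float]]],
) -> Tuple[List[Tuple[float, float]], List[Tuple[float, float]]]:
    """Turn top-N [price, size] levels into cumulative (price, total) steps."""
    # Bids accumulate walking down from the best bid; asks walking up
    bids = sorted(depth_data["bids"], key=lambda lvl: -lvl[0])
    asks = sorted(depth_data["asks"], key=lambda lvl: lvl[0])
    cum_bids: List[Tuple[float, float]] = []
    cum_asks: List[Tuple[float, float]] = []
    total = 0.0
    for price, size in bids:
        total += size
        cum_bids.append((price, total))
    total = 0.0
    for price, size in asks:
        total += size
        cum_asks.append((price, total))
    return cum_bids, cum_asks
```

Plotting each side as a stepped line (green for bids, red for asks) yields the chart described above.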
## Styling and Theme
### Dark Theme Configuration
- Background: Black (`#000000`)
- Text: White (`#ffffff`)
- Grid: Dark gray with transparency
- Candlesticks: Green (up) / Red (down)
- Volume: Gray bars
- OBI: Blue tones for candlesticks
- Depth: Green (bids) / Red (asks)
### Responsive Design
- Bootstrap grid system for layout
- Fluid container for full-width usage
- 100vh height for full viewport coverage
- Configurable chart display modes
## Data Polling and Error Handling
### Polling Strategy
- **Interval**: 500ms for near real-time updates
- **Graceful degradation**: Uses cached data on JSON read errors
- **Atomic reads**: Tolerates partial writes during file updates
- **Logging**: Warnings for data inconsistencies
### Error Recovery
```python
# Pseudocode for error handling pattern
try:
with open(data_file) as f:
new_data = json.load(f)
_LAST_DATA = new_data # Cache successful read
except (FileNotFoundError, json.JSONDecodeError):
logging.warning("Using cached data due to read error")
new_data = _LAST_DATA # Use cached data
```
## Performance Characteristics
- **Client-side rendering**: Plotly.js handles chart rendering
- **Efficient updates**: Only redraws when data changes
- **Memory bounded**: Limited by max bars in data files (1000)
- **Network efficient**: Local file polling (no external API calls)
## Testing
Run application tests:
```bash
uv run pytest test_app.py -v
```
Test coverage includes:
- Chart building functions
- Data loading and caching
- Error handling scenarios
- Layout rendering
- Callback functionality
## Configuration Options
### Server Configuration
- **Host**: `0.0.0.0` (accessible from network)
- **Port**: `8050` (default Dash port)
- **Debug mode**: Disabled in production
### Chart Configuration
- **Update interval**: 500ms (configurable via dcc.Interval)
- **Display mode bar**: Enabled for user interaction
- **Logo display**: Disabled for clean interface
## Known Issues
- High CPU usage during rapid data updates
- Memory usage grows with chart history
- No authentication or access control
- Limited mobile responsiveness for complex charts
## Development Notes
- Uses Flask development server (not suitable for production)
- Callback exceptions suppressed for partial data scenarios
- Bootstrap CSS loaded from CDN
- Chart configurations optimized for financial data visualization

# Module: db_interpreter
## Purpose
The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing.
## Public Interface
### Classes
- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook
- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp
### Functions
- `DBInterpreter(db_path: Path)`: Constructor that initializes read-only SQLite connection with optimized PRAGMA settings
### Methods
- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface that yields orderbook updates with associated trades in temporal windows
## Usage Examples
```python
from pathlib import Path
from db_interpreter import DBInterpreter
# Initialize interpreter
db_path = Path("data/BTC-USDT-2025-01-01.db")
interpreter = DBInterpreter(db_path)
# Stream orderbook and trade data
for ob_update, trades in interpreter.stream():
# Process orderbook update
print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks")
print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}")
# Process trades in this window
for trade in trades:
trade_id, price, size, side, timestamp_ms = trade[1:6]
print(f"Trade: {side} {size} @ {price}")
```
## Dependencies
### Internal
- None (standalone module)
### External
- `sqlite3`: Database connectivity
- `pathlib`: Path handling
- `dataclasses`: Data structure definitions
- `typing`: Type annotations
- `logging`: Debug and error logging
## Performance Characteristics
- **Batch sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal memory usage
- **SQLite optimizations**: Read-only, immutable mode, large mmap and cache sizes
- **Memory efficient**: Streaming iterator pattern prevents loading entire dataset
- **Temporal windowing**: One-row lookahead for precise time boundary calculation
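The one-row-lookahead windowing can be sketched as follows (a simplified illustration, not the module's actual implementation; snapshot timestamps and trades are assumed sorted, and the last window is left open-ended):

```python
from typing import Iterator, List, Tuple

def window_trades(
    book_timestamps: List[int],
    trades: List[tuple],  # (price, size, timestamp_ms), sorted by timestamp
) -> Iterator[Tuple[int, float, List[tuple]]]:
    """Assign trades to [start, end) windows using one-row lookahead."""
    i = 0
    for idx, start in enumerate(book_timestamps):
        # Lookahead: the next snapshot's timestamp closes this window
        if idx + 1 < len(book_timestamps):
            end = float(book_timestamps[idx + 1])
        else:
            end = float('inf')  # final window is open-ended
        bucket = []
        while i < len(trades) and trades[i][2] < end:
            bucket.append(trades[i])
            i += 1
        yield start, end, bucket
```

Because both streams are consumed in timestamp order, each trade is visited exactly once, which is what keeps the generator memory-efficient.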
## Testing
Run module tests:
```bash
uv run pytest test_db_interpreter.py -v
```
Test coverage includes:
- Batch reading correctness
- Temporal window boundary handling
- Trade-to-window assignment accuracy
- End-of-stream behavior
- Error handling for malformed data
## Known Issues
- Requires specific database schema (book and trades tables)
- Python-literal string parsing assumes well-formed input
- Large databases may require memory monitoring during streaming
## Configuration
- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048)
- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096)
- SQLite PRAGMA settings optimized for read-only sequential access

# External Dependencies
## Overview
This document describes all external dependencies used in the orderflow backtest system, their purposes, versions, and justifications for inclusion.
## Production Dependencies
### Core Framework Dependencies
#### Dash (^2.18.2)
- **Purpose**: Web application framework for interactive visualizations
- **Usage**: Real-time chart rendering and user interface
- **Justification**: Mature Python-based framework with excellent Plotly integration
- **Key Features**: Reactive components, built-in server, callback system
#### Dash Bootstrap Components (^1.6.0)
- **Purpose**: Bootstrap CSS framework integration for Dash
- **Usage**: Responsive layout grid and modern UI styling
- **Justification**: Provides professional appearance with minimal custom CSS
#### Plotly (^5.24.1)
- **Purpose**: Interactive charting and visualization library
- **Usage**: OHLC candlesticks, volume bars, depth charts, OBI metrics
- **Justification**: Industry standard for financial data visualization
- **Key Features**: WebGL acceleration, zooming/panning, dark themes
### Data Processing Dependencies
#### Pandas (^2.2.3)
- **Purpose**: Data manipulation and analysis library
- **Usage**: Minimal usage for data structure conversions in visualization
- **Justification**: Standard tool for financial data handling
- **Note**: Usage kept minimal to maintain performance
#### Typer (^0.13.1)
- **Purpose**: Modern CLI framework
- **Usage**: Command-line argument parsing and help generation
- **Justification**: Type-safe, auto-generated help, better UX than argparse
- **Key Features**: Type hints integration, automatic validation
### Data Storage Dependencies
#### SQLite3 (Built-in)
- **Purpose**: Database connectivity for historical data
- **Usage**: Read-only access to orderbook and trade data
- **Justification**: Built into Python, no external dependencies, excellent performance
- **Configuration**: Optimized with immutable mode and mmap
## Development and Testing Dependencies
#### Pytest (^8.3.4)
- **Purpose**: Testing framework
- **Usage**: Unit tests, integration tests, test discovery
- **Justification**: Standard Python testing tool with excellent plugin ecosystem
#### Coverage (^7.6.9)
- **Purpose**: Code coverage measurement
- **Usage**: Test coverage reporting and quality metrics
- **Justification**: Essential for maintaining code quality
## Build and Package Management
#### UV (Package Manager)
- **Purpose**: Fast Python package manager and task runner
- **Usage**: Dependency management, virtual environments, script execution
- **Justification**: Significantly faster than pip/poetry, better lock file format
- **Commands**: `uv sync`, `uv run`, `uv add`
## Python Standard Library Usage
### Core Libraries
- **sqlite3**: Database connectivity
- **json**: JSON serialization for IPC
- **pathlib**: Modern file path handling
- **subprocess**: Process management for visualization
- **logging**: Structured logging throughout application
- **datetime**: Date/time parsing and manipulation
- **dataclasses**: Structured data types
- **typing**: Type annotations and hints
- **tempfile**: Atomic file operations
- **ast**: Safe evaluation of Python literals
### Performance Libraries
- **itertools**: Efficient iteration patterns
- **functools**: Function decoration and caching
- **collections**: Specialized data structures
## Dependency Justifications
### Why Dash Over Alternatives?
- **vs. Streamlit**: Better real-time updates, more control over layout
- **vs. Flask + Custom JS**: Integrated Plotly support, faster development
- **vs. Jupyter**: Better for production deployment, process isolation
### Why SQLite Over Alternatives?
- **vs. PostgreSQL**: No server setup required, excellent read performance
- **vs. Parquet**: Better for time-series queries, built-in indexing
- **vs. CSV**: Proper data types, much faster queries, atomic transactions
### Why UV Over Poetry/Pip?
- **vs. Poetry**: Significantly faster dependency resolution and installation
- **vs. Pip**: Better dependency locking, integrated task runner
- **vs. Pipenv**: More active development, better performance
## Version Pinning Strategy
### Patch Version Pinning
- Core dependencies (Dash, Plotly) pinned to patch versions
- Prevents breaking changes while allowing security updates
### Range Pinning
- Development tools use caret (^) ranges for flexibility
- Testing tools can update more freely
### Lock File Management
- `uv.lock` ensures reproducible builds across environments
- Regular updates scheduled monthly for security patches
## Security Considerations
### Dependency Scanning
- Regular audit of dependencies for known vulnerabilities
- Automated updates for security patches
- Minimal dependency tree to reduce attack surface
### Data Isolation
- Read-only database access prevents data modification
- No external network connections required for core functionality
- All file operations contained within project directory
## Performance Impact
### Bundle Size
- Core runtime: ~50MB with all dependencies
- Dash frontend: Additional ~10MB for JavaScript assets
- SQLite: Zero overhead (built-in)
### Startup Time
- Cold start: ~2-3 seconds for full application
- UV virtual environment activation: ~100ms
- Database connection: ~50ms per file
### Memory Usage
- Base application: ~100MB
- Per 1000 OHLC bars: ~5MB additional
- Plotly charts: ~20MB for complex visualizations
## Maintenance Schedule
### Monthly
- Security update review and application
- Dependency version bump evaluation
### Quarterly
- Major version update consideration
- Performance impact assessment
- Alternative technology evaluation
### Annually
- Complete dependency audit
- Technology stack review
- Migration planning for deprecated packages

# Module: level_parser
## Purpose
The `level_parser` module provides utilities for parsing and normalizing orderbook level data from various string formats. It handles JSON and Python literal representations, converting them into standardized numeric tuples for processing.
## Public Interface
### Functions
- `normalize_levels(levels: Any) -> List[List[float]]`: Parse levels into [[price, size], ...] format, filtering out zero/negative sizes
- `parse_levels_including_zeros(levels: Any) -> List[Tuple[float, float]]`: Parse levels preserving zero sizes for deletion operations
### Private Functions
- `_parse_string_to_list(levels: Any) -> List[Any]`: Core parsing logic trying JSON first, then literal_eval
- `_extract_price_size(item: Any) -> Tuple[Any, Any]`: Extract price/size from dict or list/tuple formats
## Usage Examples
```python
from level_parser import normalize_levels, parse_levels_including_zeros
# Parse standard levels (filters zeros)
levels = normalize_levels('[[50000.0, 1.5], [49999.0, 2.0]]')
# Returns: [[50000.0, 1.5], [49999.0, 2.0]]
# Parse with zero sizes preserved (for deletions)
updates = parse_levels_including_zeros('[[50000.0, 0.0], [49999.0, 1.5]]')
# Returns: [(50000.0, 0.0), (49999.0, 1.5)]
# Supports dict format
dict_levels = normalize_levels('[{"price": 50000.0, "size": 1.5}]')
# Returns: [[50000.0, 1.5]]
# Short key format
short_levels = normalize_levels('[{"p": 50000.0, "s": 1.5}]')
# Returns: [[50000.0, 1.5]]
```
## Dependencies
### External
- `json`: Primary parsing method for level data
- `ast.literal_eval`: Fallback parsing for Python literal formats
- `logging`: Debug logging for parsing issues
- `typing`: Type annotations
## Input Formats Supported
### JSON Array Format
```json
[[50000.0, 1.5], [49999.0, 2.0]]
```
### Dict Format (Full Keys)
```json
[{"price": 50000.0, "size": 1.5}, {"price": 49999.0, "size": 2.0}]
```
### Dict Format (Short Keys)
```json
[{"p": 50000.0, "s": 1.5}, {"p": 49999.0, "s": 2.0}]
```
### Python Literal Format
```python
"[(50000.0, 1.5), (49999.0, 2.0)]"
```
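The parsing path for these formats can be sketched as follows. This is a simplified re-implementation for illustration, not the module's verbatim code:

```python
import json
from ast import literal_eval
from typing import Any


def normalize_levels(levels: Any) -> list[list[float]]:
    # Fast path: JSON; fallback: Python literal syntax (tuples, etc.)
    if isinstance(levels, str):
        try:
            levels = json.loads(levels)
        except (json.JSONDecodeError, ValueError):
            levels = literal_eval(levels)
    out: list[list[float]] = []
    for item in levels or []:
        if isinstance(item, dict):
            # Accept both full ("price"/"size") and short ("p"/"s") keys
            price = item.get("price", item.get("p"))
            size = item.get("size", item.get("s"))
        else:
            price, size = item[0], item[1]
        if price is None or size is None:
            continue
        price, size = float(price), float(size)
        if size > 0:  # zero/negative sizes are filtered here
            out.append([price, size])
    return out
```

`parse_levels_including_zeros` follows the same shape but keeps `size == 0` entries so deletions survive parsing.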
## Error Handling
- **Graceful Degradation**: Returns empty list on parse failures
- **Data Validation**: Filters out invalid price/size pairs
- **Type Safety**: Converts all values to float before processing
- **Debug Logging**: Logs warnings for malformed input without crashing
## Performance Characteristics
- **Fast Path**: JSON parsing prioritized for performance
- **Fallback Support**: ast.literal_eval as backup for edge cases
- **Memory Efficient**: Processes items iteratively, not loading entire dataset
- **Validation**: Minimal overhead with early filtering of invalid data
## Testing
```bash
uv run pytest test_level_parser.py -v
```
Test coverage includes:
- JSON format parsing accuracy
- Dict format (both key styles) parsing
- Python literal fallback parsing
- Zero size preservation vs filtering
- Error handling for malformed input
- Type conversion edge cases
## Known Limitations
- Assumes well-formed numeric data (price/size as numbers)
- Does not validate economic constraints (e.g., positive prices)
- Limited to list/dict input formats
- No support for streaming/incremental parsing

# Module: main
## Purpose
The `main` module provides the command-line interface (CLI) orchestration for the orderflow backtest system. It handles database discovery and process management, and coordinates the streaming pipeline with the visualization frontend, using Typer for argument parsing.
## Public Interface
### Functions
- `main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None`: Primary CLI entrypoint
- `discover_databases(instrument: str, start_date: str, end_date: str) -> list[Path]`: Find matching database files
- `launch_visualizer() -> subprocess.Popen | None`: Start Dash application in separate process
### CLI Arguments
- `instrument`: Trading pair identifier (e.g., "BTC-USDT")
- `start_date`: Start date in YYYY-MM-DD format (UTC)
- `end_date`: End date in YYYY-MM-DD format (UTC)
- `--window-seconds`: OHLC aggregation window size (default: 60)
## Usage Examples
### Command Line Usage
```bash
# Basic usage with default 60-second windows
uv run python main.py BTC-USDT 2025-01-01 2025-01-31
# Custom window size
uv run python main.py ETH-USDT 2025-02-01 2025-02-28 --window-seconds 30
# Single day processing
uv run python main.py SOL-USDT 2025-03-15 2025-03-15
```
### Programmatic Usage
```python
from main import main, discover_databases
# Run processing pipeline
main("BTC-USDT", "2025-01-01", "2025-01-31", window_seconds=120)
# Discover available databases
db_files = discover_databases("ETH-USDT", "2025-02-01", "2025-02-28")
print(f"Found {len(db_files)} database files")
```
## Dependencies
### Internal
- `db_interpreter.DBInterpreter`: Database streaming
- `ohlc_processor.OHLCProcessor`: Trade aggregation and orderbook processing
- `viz_io`: Data clearing functions
### External
- `typer`: CLI framework and argument parsing
- `subprocess`: Process management for visualization
- `pathlib`: File and directory operations
- `datetime`: Date parsing and validation
- `logging`: Operational logging
- `sys`: Exit code management
## Database Discovery Logic
### File Pattern Matching
```python
# Expected directory structure
../data/OKX/{instrument}/{date}/
# Example paths
../data/OKX/BTC-USDT/2025-01-01/trades.db
../data/OKX/ETH-USDT/2025-02-15/trades.db
```
### Discovery Algorithm
1. Parse start and end dates to datetime objects
2. Iterate through date range (inclusive)
3. Construct expected path for each date
4. Verify file existence and readability
5. Return sorted list of valid database paths
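The steps above can be sketched as follows (the `data_root` parameter is added here for testability; the real function resolves `../data/OKX` itself):

```python
from datetime import date, timedelta
from pathlib import Path


def discover_databases(instrument: str, start_date: str, end_date: str,
                       data_root: Path = Path("../data/OKX")) -> list[Path]:
    # Steps 1-2: parse dates and walk the inclusive range
    day, end = date.fromisoformat(start_date), date.fromisoformat(end_date)
    found: list[Path] = []
    while day <= end:
        # Step 3: construct the expected per-date path
        candidate = data_root / instrument / day.isoformat() / "trades.db"
        # Step 4: keep only paths that exist and are regular files
        if candidate.is_file():
            found.append(candidate)
        day += timedelta(days=1)
    # Step 5: return a sorted list of valid database paths
    return sorted(found)
```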
## Process Orchestration
### Visualization Process Management
```python
# Launch Dash app in separate process
viz_process = subprocess.Popen([
"uv", "run", "python", "app.py"
], cwd=project_root)
# Process management
try:
# Main processing loop
process_databases(db_files)
finally:
# Cleanup visualization process
if viz_process:
viz_process.terminate()
viz_process.wait(timeout=5)
```
### Data Processing Pipeline
1. **Initialize**: Clear existing data files
2. **Launch**: Start visualization process
3. **Stream**: Process each database sequentially
4. **Aggregate**: Generate OHLC bars and depth snapshots
5. **Cleanup**: Terminate visualization and finalize
## Error Handling
### Database Access Errors
- **File not found**: Log warning and skip missing databases
- **Permission denied**: Log error and exit with status code 1
- **Corruption**: Log error for specific database and continue with next
### Process Management Errors
- **Visualization startup failure**: Log error but continue processing
- **Process termination**: Graceful shutdown with timeout
- **Resource cleanup**: Ensure child processes are terminated
### Date Validation
- **Invalid format**: Clear error message with expected format
- **Invalid range**: End date must be >= start date
- **Future dates**: Warning for dates beyond data availability
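The validation rules can be condensed into a small helper. This is a hypothetical sketch mirroring the checks above; the real CLI logs the message and exits with status code 1 instead of raising:

```python
from datetime import datetime


def validate_date_range(start_date: str, end_date: str) -> tuple[datetime, datetime]:
    # Parse both dates, rejecting anything that is not YYYY-MM-DD
    try:
        start = datetime.strptime(start_date, "%Y-%m-%d")
        end = datetime.strptime(end_date, "%Y-%m-%d")
    except ValueError:
        raise ValueError("invalid date format, expected YYYY-MM-DD (UTC)") from None
    # End date must be >= start date
    if end < start:
        raise ValueError("end date must be >= start date")
    return start, end
```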
## Performance Characteristics
- **Sequential processing**: Databases processed one at a time
- **Memory efficient**: Streaming approach prevents loading entire datasets
- **Process isolation**: Visualization runs independently
- **Resource cleanup**: Automatic process termination on exit
## Testing
Run module tests:
```bash
uv run pytest test_main.py -v
```
Test coverage includes:
- Database discovery logic
- Date parsing and validation
- Process management
- Error handling scenarios
- CLI argument validation
## Configuration
### Default Settings
- **Data directory**: `../data/OKX` (relative to project root)
- **Visualization command**: `uv run python app.py`
- **Window size**: 60 seconds
- **Process timeout**: 5 seconds for termination
### Environment Variables
- **DATA_PATH**: Override default data directory
- **VISUALIZATION_PORT**: Override Dash port (requires app.py modification)
## Known Issues
- Assumes specific directory structure under `../data/OKX`
- No validation of database schema compatibility
- Limited error recovery for process management
- No progress indication for large datasets
## Development Notes
- Uses Typer for modern CLI interface
- Subprocess management compatible with Unix/Windows
- Logging configured for both development and production use
- Exit codes follow Unix conventions (0=success, 1=error)

# Module: Metrics Calculation System
## Purpose
The metrics calculation system provides high-performance computation of Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) indicators for cryptocurrency trading analysis. It processes orderbook snapshots and trade data to generate financial metrics with per-snapshot granularity.
## Public Interface
### Classes
#### `Metric` (dataclass)
Represents calculated metrics for a single orderbook snapshot.
```python
@dataclass(slots=True)
class Metric:
snapshot_id: int # Reference to source snapshot
timestamp: int # Unix timestamp
obi: float # Order Book Imbalance [-1, 1]
cvd: float # Cumulative Volume Delta
best_bid: float | None # Best bid price
best_ask: float | None # Best ask price
```
#### `MetricCalculator`
A stateless class that exposes the financial-metric calculations as static methods.
```python
class MetricCalculator:
@staticmethod
def calculate_obi(snapshot: BookSnapshot) -> float
@staticmethod
def calculate_volume_delta(trades: List[Trade]) -> float
@staticmethod
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float
@staticmethod
def get_best_bid_ask(snapshot: BookSnapshot) -> tuple[float | None, float | None]
```
### Functions
#### Order Book Imbalance (OBI) Calculation
```python
def calculate_obi(snapshot: BookSnapshot) -> float:
"""
Calculate Order Book Imbalance using the standard formula.
Formula: OBI = (Vb - Va) / (Vb + Va)
Where:
Vb = Total volume on bid side
Va = Total volume on ask side
Args:
snapshot: BookSnapshot containing bids and asks data
Returns:
float: OBI value between -1 and 1, or 0.0 if no volume
Example:
>>> snapshot = BookSnapshot(bids={50000.0: OrderbookLevel(...)}, ...)
>>> obi = MetricCalculator.calculate_obi(snapshot)
>>> print(f"OBI: {obi:.3f}")
OBI: 0.333
"""
```
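A minimal body matching that docstring might look like the following. This is a sketch of the removed implementation, not its verbatim code:

```python
def calculate_obi(snapshot) -> float:
    # OBI = (Vb - Va) / (Vb + Va); defined as 0.0 when the book carries no volume
    vb = sum(level.size for level in snapshot.bids.values())
    va = sum(level.size for level in snapshot.asks.values())
    total = vb + va
    return (vb - va) / total if total > 0 else 0.0
```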
#### Volume Delta Calculation
```python
def calculate_volume_delta(trades: List[Trade]) -> float:
"""
Calculate Volume Delta for a list of trades.
Volume Delta = Buy Volume - Sell Volume
- Buy trades (side = "buy"): positive contribution
- Sell trades (side = "sell"): negative contribution
Args:
trades: List of Trade objects for specific timestamp
Returns:
float: Net volume delta (positive = buy pressure, negative = sell pressure)
Example:
>>> trades = [
... Trade(side="buy", size=10.0, ...),
... Trade(side="sell", size=3.0, ...)
... ]
>>> vd = MetricCalculator.calculate_volume_delta(trades)
>>> print(f"Volume Delta: {vd}")
Volume Delta: 7.0
"""
```
#### Cumulative Volume Delta (CVD) Calculation
```python
def calculate_cvd(previous_cvd: float, volume_delta: float) -> float:
"""
Calculate Cumulative Volume Delta with incremental support.
Formula: CVD_t = CVD_{t-1} + Volume_Delta_t
Args:
previous_cvd: Previous CVD value (use 0.0 for reset)
volume_delta: Current volume delta to add
Returns:
float: New cumulative volume delta value
Example:
>>> cvd = 0.0 # Starting value
>>> cvd = MetricCalculator.calculate_cvd(cvd, 10.0) # First trade
>>> cvd = MetricCalculator.calculate_cvd(cvd, -5.0) # Second trade
>>> print(f"CVD: {cvd}")
CVD: 5.0
"""
```
## Usage Examples
### Basic OBI Calculation
```python
from models import MetricCalculator, BookSnapshot, OrderbookLevel
# Create sample orderbook snapshot
snapshot = BookSnapshot(
id=1,
timestamp=1640995200,
bids={
50000.0: OrderbookLevel(price=50000.0, size=10.0, liquidation_count=0, order_count=1),
49999.0: OrderbookLevel(price=49999.0, size=5.0, liquidation_count=0, order_count=1),
},
asks={
50001.0: OrderbookLevel(price=50001.0, size=3.0, liquidation_count=0, order_count=1),
50002.0: OrderbookLevel(price=50002.0, size=2.0, liquidation_count=0, order_count=1),
}
)
# Calculate OBI
obi = MetricCalculator.calculate_obi(snapshot)
print(f"OBI: {obi:.3f}") # Output: OBI: 0.500
# Explanation: (15 - 5) / (15 + 5) = 10/20 = 0.5
```
### CVD Calculation with Reset
```python
from models import MetricCalculator, Trade
# Simulate trading session
cvd = 0.0 # Reset CVD at session start
# Process trades for first timestamp
trades_t1 = [
Trade(id=1, trade_id=1.0, price=50000.0, size=8.0, side="buy", timestamp=1000),
Trade(id=2, trade_id=2.0, price=50001.0, size=3.0, side="sell", timestamp=1000),
]
vd_t1 = MetricCalculator.calculate_volume_delta(trades_t1) # 8.0 - 3.0 = 5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t1) # 0.0 + 5.0 = 5.0
# Process trades for second timestamp
trades_t2 = [
Trade(id=3, trade_id=3.0, price=49999.0, size=2.0, side="buy", timestamp=1001),
Trade(id=4, trade_id=4.0, price=50000.0, size=7.0, side="sell", timestamp=1001),
]
vd_t2 = MetricCalculator.calculate_volume_delta(trades_t2) # 2.0 - 7.0 = -5.0
cvd = MetricCalculator.calculate_cvd(cvd, vd_t2) # 5.0 + (-5.0) = 0.0
print(f"Final CVD: {cvd}") # Output: Final CVD: 0.0
```
### Complete Metrics Processing
```python
from models import MetricCalculator, Metric
def process_snapshot_metrics(snapshot, trades, previous_cvd=0.0):
"""Process complete metrics for a single snapshot."""
# Calculate OBI
obi = MetricCalculator.calculate_obi(snapshot)
# Calculate volume delta and CVD
volume_delta = MetricCalculator.calculate_volume_delta(trades)
cvd = MetricCalculator.calculate_cvd(previous_cvd, volume_delta)
# Extract best bid/ask
best_bid, best_ask = MetricCalculator.get_best_bid_ask(snapshot)
# Create metric record
metric = Metric(
snapshot_id=snapshot.id,
timestamp=snapshot.timestamp,
obi=obi,
cvd=cvd,
best_bid=best_bid,
best_ask=best_ask
)
return metric, cvd
# Usage in processing loop
current_cvd = 0.0
for snapshot, trades in snapshot_trade_pairs:
metric, current_cvd = process_snapshot_metrics(snapshot, trades, current_cvd)
# Store metric to database...
```
## Dependencies
### Internal
- `models.BookSnapshot`: Orderbook state data
- `models.Trade`: Individual trade execution data
- `models.OrderbookLevel`: Price level information
### External
- **Python Standard Library**: `typing` for type hints
- **No external packages required**
## Performance Characteristics
### Computational Complexity
- **OBI Calculation**: O(n) where n = number of price levels
- **Volume Delta**: O(m) where m = number of trades
- **CVD Calculation**: O(1) - simple addition
- **Best Bid/Ask**: O(n) for min/max operations
### Memory Usage
- **Static Methods**: No instance state, minimal memory overhead
- **Calculations**: Process data in-place without copying
- **Results**: Lightweight `Metric` objects with slots optimization
### Typical Performance
Approximate benchmark figures:
```text
Snapshot with 50 price levels:  ~0.1 ms per OBI calculation
Timestamp with 20 trades:       ~0.05 ms per volume delta
CVD update:                     ~0.001 ms per calculation
Complete metric processing:     ~0.2 ms per snapshot
```
## Error Handling
### Edge Cases Handled
```python
# Empty orderbook
empty_snapshot = BookSnapshot(id=1, timestamp=0, bids={}, asks={}, trades=[])
obi = MetricCalculator.calculate_obi(empty_snapshot)  # Returns 0.0
# No trades
vd = MetricCalculator.calculate_volume_delta([])  # Returns 0.0
# Zero-volume scenario
zero_vol_snapshot = BookSnapshot(
    id=2, timestamp=0, trades=[],
    bids={50000.0: OrderbookLevel(price=50000.0, size=0.0, liquidation_count=0, order_count=0)},
    asks={50001.0: OrderbookLevel(price=50001.0, size=0.0, liquidation_count=0, order_count=0)},
)
obi = MetricCalculator.calculate_obi(zero_vol_snapshot)  # Returns 0.0
```
### Validation
- **OBI Range**: Results automatically bounded to [-1, 1]
- **Division by Zero**: Handled gracefully with 0.0 return
- **Invalid Data**: Empty collections handled without errors
## Testing
### Test Coverage
- **Unit Tests**: `tests/test_metric_calculator.py`
- **Integration Tests**: Included in storage and strategy tests
- **Edge Cases**: Empty data, zero volume, boundary conditions
### Running Tests
```bash
# Run metric calculator tests specifically
uv run pytest tests/test_metric_calculator.py -v
# Run all tests with metrics
uv run pytest -k "metric" -v
# Performance tests
uv run pytest tests/test_metric_calculator.py::test_calculate_obi_performance
```
## Known Issues
### Current Limitations
- **Precision**: Floating-point arithmetic limitations for very small numbers
- **Scale**: No optimization for extremely large orderbooks (>10k levels)
- **Currency**: No multi-currency support (assumes single denomination)
### Planned Enhancements
- **Decimal Precision**: Consider `decimal.Decimal` for high-precision calculations
- **Vectorization**: NumPy integration for batch calculations
- **Additional Metrics**: Volume Profile, Liquidity metrics, Delta Flow
---
The metrics calculation system provides a robust foundation for financial analysis with clean interfaces, comprehensive error handling, and optimal performance for high-frequency trading data.

# Module: metrics_calculator
## Purpose
The `metrics_calculator` module handles calculation and management of trading metrics including Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD). It provides windowed aggregation with throttled updates for real-time visualization.
## Public Interface
### Classes
- `MetricsCalculator(window_seconds: int = 60, emit_every_n_updates: int = 25)`: Main metrics calculation engine
### Methods
- `update_cvd_from_trade(side: str, size: float) -> None`: Update CVD from individual trade data
- `update_obi_metrics(timestamp: str, total_bids: float, total_asks: float) -> None`: Update OBI metrics from orderbook volumes
- `finalize_metrics() -> None`: Emit final metrics bar at processing end
### Properties
- `cvd_cumulative: float`: Current cumulative volume delta value
### Private Methods
- `_emit_metrics_bar() -> None`: Emit current metrics to visualization layer
## Usage Examples
```python
from metrics_calculator import MetricsCalculator
# Initialize calculator
calc = MetricsCalculator(window_seconds=60, emit_every_n_updates=25)
# Update CVD from trades
calc.update_cvd_from_trade("buy", 1.5) # +1.5 CVD
calc.update_cvd_from_trade("sell", 1.0) # -1.0 CVD, net +0.5
# Update OBI from orderbook
total_bids, total_asks = 150.0, 120.0
calc.update_obi_metrics("1640995200000", total_bids, total_asks)
# Access current CVD
current_cvd = calc.cvd_cumulative # 0.5
# Finalize at end of processing
calc.finalize_metrics()
```
## Metrics Definitions
### Cumulative Volume Delta (CVD)
- **Formula**: CVD = Σ(buy_volume - sell_volume)
- **Interpretation**: Positive = more buying pressure, Negative = more selling pressure
- **Accumulation**: Running total across all processed trades
- **Update Frequency**: Every trade
### Order Book Imbalance (OBI)
- **Formula**: OBI = total_bid_volume - total_ask_volume
- **Interpretation**: Positive = more bid liquidity, Negative = more ask liquidity
- **Aggregation**: OHLC-style bars per time window (open, high, low, close)
- **Update Frequency**: Throttled per orderbook update
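The two definitions can be condensed into a few lines. This is illustrative, not the module's code; the real class also throttles emission and tracks OBI per window:

```python
def signed_volume(side: str, size: float) -> float:
    # Buys add to CVD, sells subtract; unknown sides contribute nothing
    if side == "buy":
        return size
    if side == "sell":
        return -size
    return 0.0


# CVD: running total of signed trade volume
cvd = 0.0
for side, size in [("buy", 1.5), ("sell", 1.0), ("buy", 0.25)]:
    cvd += signed_volume(side, size)
# cvd is now 0.75 (net buying pressure)

# OBI here is the unnormalized difference of resting volumes
obi = 150.0 - 120.0  # total_bids - total_asks = 30.0
```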
## Dependencies
### Internal
- `viz_io.upsert_metric_bar`: Output interface for visualization
### External
- `logging`: Warning messages for unknown trade sides
- `typing`: Type annotations
## Windowed Aggregation
### OBI Windows
- **Window Size**: Configurable via `window_seconds` (default: 60)
- **Window Alignment**: Aligned to epoch time boundaries
- **OHLC Tracking**: Maintains open, high, low, close values per window
- **Rollover**: Automatic window transitions with final bar emission
### Throttling Mechanism
- **Purpose**: Reduce I/O overhead during high-frequency updates
- **Trigger**: Every N updates (configurable via `emit_every_n_updates`)
- **Behavior**: Emits intermediate updates for real-time visualization
- **Final Emission**: Guaranteed on window rollover and finalization
## State Management
### CVD State
- `cvd_cumulative: float`: Running total across all trades
- **Persistence**: Maintained throughout processor lifetime
- **Updates**: Incremental addition/subtraction per trade
### OBI State
- `metrics_window_start: int`: Current window start timestamp
- `metrics_bar: dict`: Current OBI OHLC values
- `_metrics_since_last_emit: int`: Throttling counter
## Output Format
### Metrics Bar Structure
```python
{
'obi_open': float, # First OBI value in window
'obi_high': float, # Maximum OBI in window
'obi_low': float, # Minimum OBI in window
'obi_close': float, # Latest OBI value
}
```
### Visualization Integration
- Emitted via `viz_io.upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value)`
- Compatible with existing OHLC visualization infrastructure
- Real-time updates during active processing
## Performance Characteristics
- **Low Memory**: Maintains only current window state
- **Throttled I/O**: Configurable update frequency prevents excessive writes
- **Efficient Updates**: O(1) operations for trade and OBI updates
- **Window Management**: Automatic transitions without manual intervention
## Configuration
### Constructor Parameters
- `window_seconds: int`: Time window for OBI aggregation (default: 60)
- `emit_every_n_updates: int`: Throttling factor for intermediate updates (default: 25)
### Tuning Guidelines
- **Higher throttling**: Reduces I/O load, delays real-time updates
- **Lower throttling**: More responsive visualization, higher I/O overhead
- **Window size**: Affects granularity of OBI trends (shorter = more detail)
## Testing
```bash
uv run pytest test_metrics_calculator.py -v
```
Test coverage includes:
- CVD accumulation accuracy across multiple trades
- OBI window rollover and OHLC tracking
- Throttling behavior verification
- Edge cases (unknown trade sides, empty windows)
- Integration with visualization output
## Known Limitations
- CVD calculation assumes binary buy/sell classification
- No support for partial fills or complex order types
- OBI calculation treats all liquidity equally (no price weighting)
- Window boundaries aligned to absolute timestamps (no sliding windows)

# Module: ohlc_processor
## Purpose
The `ohlc_processor` module serves as the main coordinator for trade data processing, orchestrating OHLC aggregation, orderbook management, and metrics calculation. It has been refactored into a modular architecture that composes specialized helper modules.
## Public Interface
### Classes
- `OHLCProcessor(window_seconds: int = 60, depth_levels_per_side: int = 50)`: Main orchestrator class that coordinates trade processing using composition
### Methods
- `process_trades(trades: list[tuple]) -> None`: Aggregate trades into OHLC bars and update CVD metrics
- `update_orderbook(ob_update: OrderbookUpdate) -> None`: Apply orderbook updates and calculate OBI metrics
- `finalize() -> None`: Emit final OHLC bar and metrics data
- `cvd_cumulative` (property): Access to cumulative volume delta value
### Composed Modules
- `OrderbookManager`: Handles in-memory orderbook state and depth snapshots
- `MetricsCalculator`: Manages OBI and CVD metric calculations
- `level_parser` functions: Parse and normalize orderbook level data
## Usage Examples
```python
from ohlc_processor import OHLCProcessor
from db_interpreter import DBInterpreter
# Initialize processor with 1-minute windows and 50 depth levels
processor = OHLCProcessor(window_seconds=60, depth_levels_per_side=50)
# Process streaming data
for ob_update, trades in DBInterpreter(db_path).stream():
# Aggregate trades into OHLC bars
processor.process_trades(trades)
# Update orderbook and emit depth snapshots
processor.update_orderbook(ob_update)
# Finalize processing
processor.finalize()
```
### Advanced Configuration
```python
# Custom window size and depth levels
processor = OHLCProcessor(
window_seconds=30, # 30-second bars
depth_levels_per_side=25 # Top 25 levels per side
)
```
## Dependencies
### Internal Modules
- `orderbook_manager.OrderbookManager`: In-memory orderbook state management
- `metrics_calculator.MetricsCalculator`: OBI and CVD metrics calculation
- `level_parser`: Orderbook level parsing utilities
- `viz_io`: JSON output for visualization
- `db_interpreter.OrderbookUpdate`: Input data structures
### External
- `typing`: Type annotations
- `logging`: Debug and operational logging
## Modular Architecture
The processor now follows a clean composition pattern:
1. **Main Coordinator** (`OHLCProcessor`):
- Orchestrates trade and orderbook processing
- Maintains OHLC bar state and window management
- Delegates specialized tasks to composed modules
2. **Orderbook Management** (`OrderbookManager`):
- Maintains in-memory price→size mappings
- Applies partial updates and handles deletions
- Provides sorted top-N level extraction
3. **Metrics Calculation** (`MetricsCalculator`):
- Tracks CVD from trade flow (buy/sell volume delta)
- Calculates OBI from orderbook volume imbalance
- Manages windowed metrics aggregation with throttling
4. **Level Parsing** (`level_parser` module):
- Normalizes JSON and Python literal level representations
- Handles zero-size levels for orderbook deletions
- Provides robust error handling for malformed data
## Performance Characteristics
- **Throttled Updates**: Prevents excessive I/O during high-frequency periods
- **Memory Efficient**: Maintains only current window and top-N depth levels
- **Incremental Processing**: Applies only changed orderbook levels
- **Atomic Operations**: Thread-safe updates to shared data structures
## Testing
Run module tests:
```bash
uv run pytest test_ohlc_processor.py -v
```
Test coverage includes:
- OHLC calculation accuracy across window boundaries
- Volume accumulation correctness
- High/low price tracking
- Orderbook update application
- Depth snapshot generation
- OBI metric calculation
## Known Issues
- Orderbook level parsing assumes well-formed JSON or Python literals
- Memory usage scales with number of active price levels
- Clock skew between trades and orderbook updates not handled
## Configuration Options
- `window_seconds`: Time window size for OHLC aggregation (default: 60)
- `depth_levels_per_side`: Number of top price levels to maintain (default: 50)
- `UPSERT_THROTTLE_MS`: Minimum interval between upsert operations (internal)
- `DEPTH_EMIT_THROTTLE_MS`: Minimum interval between depth emissions (internal)

# Module: orderbook_manager
## Purpose
The `orderbook_manager` module provides in-memory orderbook state management with partial update capabilities. It maintains separate bid and ask sides and supports efficient top-level extraction for visualization.
## Public Interface
### Classes
- `OrderbookManager(depth_levels_per_side: int = 50)`: Main orderbook state manager
### Methods
- `apply_updates(bids_updates: List[Tuple[float, float]], asks_updates: List[Tuple[float, float]]) -> None`: Apply partial updates to both sides
- `get_total_volume() -> Tuple[float, float]`: Get total bid and ask volumes
- `get_top_levels() -> Tuple[List[List[float]], List[List[float]]]`: Get sorted top levels for both sides
### Private Methods
- `_apply_partial_updates(side_map: Dict[float, float], updates: List[Tuple[float, float]]) -> None`: Apply updates to one side
- `_build_top_levels(side_map: Dict[float, float], limit: int, reverse: bool) -> List[List[float]]`: Extract sorted top levels
## Usage Examples
```python
from orderbook_manager import OrderbookManager
# Initialize manager
manager = OrderbookManager(depth_levels_per_side=25)
# Apply orderbook updates
bids = [(50000.0, 1.5), (49999.0, 2.0)]
asks = [(50001.0, 1.2), (50002.0, 0.8)]
manager.apply_updates(bids, asks)
# Get volume totals for OBI calculation
total_bids, total_asks = manager.get_total_volume()
obi = total_bids - total_asks
# Get top levels for depth visualization
bids_sorted, asks_sorted = manager.get_top_levels()
# Handle deletions (size = 0)
deletions = [(50000.0, 0.0)] # Remove price level
manager.apply_updates(deletions, [])
```
## Dependencies
### External
- `typing`: Type annotations for Dict, List, Tuple
## State Management
### Internal State
- `_book_bids: Dict[float, float]`: Price → size mapping for bid side
- `_book_asks: Dict[float, float]`: Price → size mapping for ask side
- `depth_levels_per_side: int`: Configuration for top-N extraction
### Update Semantics
- **Size = 0**: Remove price level (deletion)
- **Size > 0**: Upsert price level with new size
- **Size < 0**: Ignored (invalid update)
### Sorting Behavior
- **Bids**: Descending by price (highest price first)
- **Asks**: Ascending by price (lowest price first)
- **Top-N**: Limited by `depth_levels_per_side` parameter
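These semantics can be sketched as follows (an illustrative re-implementation of the private helpers, not the module's verbatim code):

```python
def apply_partial_updates(side: dict[float, float],
                          updates: list[tuple[float, float]]) -> None:
    for price, size in updates:
        if size == 0:
            side.pop(price, None)   # size = 0: delete the price level
        elif size > 0:
            side[price] = size      # size > 0: upsert with the new size
        # size < 0: ignored as an invalid update


def top_levels(side: dict[float, float], limit: int,
               reverse: bool) -> list[list[float]]:
    # Bids use reverse=True (highest price first); asks use reverse=False
    return [[p, s] for p, s in sorted(side.items(), reverse=reverse)[:limit]]
```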
## Performance Characteristics
- **Memory Efficient**: Only stores non-zero price levels
- **Fast Updates**: O(1) upsert/delete operations using dict
- **Efficient Sorting**: Only sorts when extracting top levels
- **Bounded Output**: Limits result size for visualization performance
## Use Cases
### OBI Calculation
```python
total_bids, total_asks = manager.get_total_volume()
order_book_imbalance = total_bids - total_asks
```
### Depth Visualization
```python
bids, asks = manager.get_top_levels()
depth_payload = {"bids": bids, "asks": asks}
```
### Incremental Updates
```python
# Typical orderbook update cycle
updates = parse_orderbook_changes(raw_data)
manager.apply_updates(updates['bids'], updates['asks'])
```
## Testing
```bash
uv run pytest test_orderbook_manager.py -v
```
Test coverage includes:
- Partial update application correctness
- Deletion handling (size = 0)
- Volume calculation accuracy
- Top-level sorting and limiting
- Edge cases (empty books, single levels)
- Performance with large orderbooks
## Configuration
- `depth_levels_per_side`: Controls output size for visualization (default: 50)
- Affects memory usage and sorting performance
- Higher values provide more market depth detail
- Lower values improve processing speed
## Known Limitations
- No built-in validation of price/size values
- Memory usage scales with number of unique price levels
- No historical state tracking (current snapshot only)
- No support for spread calculation or market data statistics

# Module: viz_io
## Purpose
The `viz_io` module provides atomic inter-process communication (IPC) between the data processing pipeline and the visualization frontend. It manages JSON file-based data exchange with atomic writes to prevent race conditions and data corruption.
## Public Interface
### Functions
- `add_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Append new OHLC bar to rolling dataset
- `upsert_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Update existing bar or append new one
- `clear_data()`: Reset OHLC dataset to empty state
- `add_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Append OBI metric bar
- `upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Update existing OBI bar or append new one
- `clear_metrics()`: Reset metrics dataset to empty state
- `set_depth_data(bids, asks)`: Update current orderbook depth snapshot
### Constants
- `DATA_FILE`: Path to OHLC data JSON file
- `DEPTH_FILE`: Path to depth data JSON file
- `METRICS_FILE`: Path to metrics data JSON file
- `MAX_BARS`: Maximum number of bars to retain (1000)
## Usage Examples
### Basic OHLC Operations
```python
import viz_io
# Add a new OHLC bar
viz_io.add_ohlc_bar(
timestamp=1640995200000, # Unix timestamp in milliseconds
open_price=50000.0,
high_price=50100.0,
low_price=49900.0,
close_price=50050.0,
volume=125.5
)
# Update the current bar (if timestamp matches) or add new one
viz_io.upsert_ohlc_bar(
timestamp=1640995200000,
open_price=50000.0,
high_price=50150.0, # Updated high
low_price=49850.0, # Updated low
close_price=50075.0, # Updated close
volume=130.2 # Updated volume
)
```
### Orderbook Depth Management
```python
# Set current depth snapshot
bids = [[49990.0, 1.5], [49985.0, 2.1], [49980.0, 0.8]]
asks = [[50010.0, 1.2], [50015.0, 1.8], [50020.0, 2.5]]
viz_io.set_depth_data(bids, asks)
```
### Metrics Operations
```python
# Add Order Book Imbalance metrics
viz_io.add_metric_bar(
timestamp=1640995200000,
obi_open=0.15,
obi_high=0.22,
obi_low=0.08,
obi_close=0.18
)
```
## Dependencies
### Internal
- None (standalone utility module)
### External (Python standard library)
- `json`: JSON serialization/deserialization
- `pathlib`: File path handling
- `typing`: Type annotations
- `tempfile`: Atomic write operations
## Data Formats
### OHLC Data (`ohlc_data.json`)
```json
[
[1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
[1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]
```
Format: `[timestamp, open, high, low, close, volume]`
### Depth Data (`depth_data.json`)
```json
{
"bids": [[49990.0, 1.5], [49985.0, 2.1]],
"asks": [[50010.0, 1.2], [50015.0, 1.8]]
}
```
Format: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`
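A typical consumer-side transform of this format turns the raw levels into cumulative depth for plotting. The sketch below assumes the `[[price, size], ...]` layout above; `cumulative_depth` is an illustrative helper, not part of `viz_io`:

```python
def cumulative_depth(levels, reverse=False):
    """Turn [[price, size], ...] into [[price, cumulative_size], ...].

    Bids accumulate from the best (highest) price downward, asks from
    the best (lowest) price upward.
    """
    total = 0.0
    out = []
    for price, size in sorted(levels, key=lambda lv: lv[0], reverse=reverse):
        total += size
        out.append([price, total])
    return out

bids = cumulative_depth([[49990.0, 1.5], [49985.0, 2.1]], reverse=True)
asks = cumulative_depth([[50010.0, 1.2], [50015.0, 1.8]])
```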
### Metrics Data (`metrics_data.json`)
```json
[
[1640995200000, 0.15, 0.22, 0.08, 0.18],
[1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```
Format: `[timestamp, obi_open, obi_high, obi_low, obi_close]`
## Atomic Write Operations
All write operations use atomic file replacement to prevent partial reads:
1. Write the data to a temporary file
2. Flush and fsync the file to disk
3. Atomically rename the temporary file over the target
This ensures the visualization frontend always reads complete, valid JSON data.
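The steps above can be sketched as follows. `atomic_write_json` is an illustrative name, not the module's actual function, and creating the temp file next to the target is an assumption needed for `os.replace` to stay atomic:

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(target: Path, payload) -> None:
    """Write `payload` as JSON via a temp file, then atomically rename."""
    # The temp file lives in the target's directory so that os.replace
    # never crosses a filesystem boundary (rename is only atomic then).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # force bytes to disk before the rename
        os.replace(tmp_name, target)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file on failure
        raise
```

A reader that opens the target mid-write sees either the complete old file or the complete new file, never a mix.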
## Performance Characteristics
- **Bounded Memory**: OHLC and metrics datasets limited to 1000 bars max
- **Atomic Operations**: No partial reads possible during writes
- **Rolling Window**: Automatic trimming of old data maintains constant memory usage
- **Linear Upserts**: Timestamp-based upserts use a linear scan of the bar list (O(n), acceptable at the 1000-bar cap)
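The upsert-plus-trim behavior described above can be sketched like this; `upsert_bar` is a hypothetical helper, while the real module operates on its internal dataset:

```python
MAX_BARS = 1000  # mirrors the module constant

def upsert_bar(bars, new_bar, max_bars=MAX_BARS):
    """Update the bar whose timestamp (index 0) matches, else append;
    then trim to the newest `max_bars` entries."""
    for i, bar in enumerate(bars):
        if bar[0] == new_bar[0]:
            bars[i] = new_bar      # timestamp match: update in place
            break
    else:
        bars.append(new_bar)       # no match: new bar
    return bars[-max_bars:]        # rolling window keeps memory bounded
```

Because new bars almost always arrive in time order, the scan could start from the end of the list, but a forward scan is simple and cheap at this size.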
## Testing
Run module tests:
```bash
uv run pytest test_viz_io.py -v
```
Test coverage includes:
- Atomic write operations
- Data format validation
- Rolling window behavior
- Upsert logic correctness
- File corruption prevention
- Concurrent read/write scenarios
## Known Issues
- File I/O may block briefly during atomic writes
- JSON parsing errors not propagated to callers
- Limited to 1000 bars maximum (configurable via MAX_BARS)
- No compression for large datasets
## Thread Safety
All operations are safe for single-writer, multiple-reader use, including across processes:
- Writer: Data processing pipeline (single thread)
- Readers: Visualization frontend (polling)
- Atomic file operations prevent corruption during concurrent access
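On the reader side, a defensive polling step might look like the sketch below. The file names match the constants above, but `read_snapshot` is illustrative rather than the frontend's actual code; it falls back to a default when a file is missing or briefly unreadable:

```python
import json
from pathlib import Path

def read_snapshot(path: Path, default):
    """Read one of the JSON data files, falling back to `default` when the
    file does not exist yet or a transient decode error occurs."""
    try:
        return json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Keep the last known state instead of crashing the frontend.
        return default

# Poll each file with a sensible empty default.
ohlc = read_snapshot(Path("ohlc_data.json"), default=[])
depth = read_snapshot(Path("depth_data.json"), default={"bids": [], "asks": []})
```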