# System Architecture

## Overview

The Orderflow Backtest System is designed as a modular, high-performance data processing pipeline for cryptocurrency trading analysis. The architecture emphasizes separation of concerns, efficient memory usage, and scalable processing of large datasets.

## High-Level Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Data Sources   │    │    Processing    │    │  Presentation   │
│                 │    │                  │    │                 │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────┐ │
│ │SQLite Files │─┼────┼→│ Storage      │─┼────┼→│ Visualizer  │ │
│ │- orderbook  │ │    │ │- Orchestrator│ │    │ │- OHLC Charts│ │
│ │- trades     │ │    │ │- Calculator  │ │    │ │- OBI/CVD    │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────┘ │
│                 │    │         │        │    │        ▲        │
└─────────────────┘    │ ┌─────────────┐  │    │ ┌─────────────┐ │
                       │ │ Strategy    │──┼────┼→│ Reports     │ │
                       │ │- Analysis   │  │    │ │- Metrics    │ │
                       │ │- Alerts     │  │    │ │- Summaries  │ │
                       │ └─────────────┘  │    │ └─────────────┘ │
                       └──────────────────┘    └─────────────────┘
```

## Component Architecture

### Data Layer

#### Models (`models.py`)

**Purpose**: Core data structures and calculation logic

```python
# Core data models
OrderbookLevel    # Single price level (price, size, order_count, liquidation_count)
Trade             # Individual trade execution (price, size, side, timestamp)
BookSnapshot      # Complete orderbook state at a timestamp
Book              # Container for a snapshot sequence
Metric            # Calculated OBI/CVD values

# Calculation engine
MetricCalculator  # Static methods for OBI/CVD computation
```

**Relationships**:
- `Book` contains multiple `BookSnapshot` instances
- `BookSnapshot` contains dictionaries of `OrderbookLevel` and lists of `Trade`
- `Metric` stores calculated values for each `BookSnapshot`
- `MetricCalculator` operates on snapshots to produce metrics
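The models above can be sketched as plain dataclasses. This is a minimal illustration under the field names listed above; any defaults and container types beyond those are assumptions, not the actual `models.py`:

```python
from dataclasses import dataclass, field


@dataclass
class OrderbookLevel:
    """A single price level in the book."""
    price: float
    size: float
    order_count: int = 0
    liquidation_count: int = 0


@dataclass
class Trade:
    """An individual trade execution."""
    price: float
    size: float
    side: str        # "buy" or "sell"
    timestamp: str


@dataclass
class BookSnapshot:
    """Complete orderbook state at one timestamp."""
    timestamp: str
    bids: dict[float, OrderbookLevel] = field(default_factory=dict)
    asks: dict[float, OrderbookLevel] = field(default_factory=dict)
    trades: list[Trade] = field(default_factory=list)
```

Keying `bids` and `asks` by price makes level lookups O(1) while keeping each snapshot self-contained, which matters once snapshots are streamed and discarded one at a time.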
#### Repositories (`repositories/`)

**Purpose**: Database access and persistence layer

```python
# Repository
SQLiteOrderflowRepository:
- connect()                    # Optimized SQLite connection
- load_trades_by_timestamp()   # Efficient trade loading
- iterate_book_rows()          # Memory-efficient snapshot streaming
- count_rows()                 # Performance monitoring
- create_metrics_table()       # Schema creation
- insert_metrics_batch()       # High-performance batch inserts
- load_metrics_by_timerange()  # Time-range queries
- table_exists()               # Schema validation
```

**Design Patterns**:
- **Repository Pattern**: Clean separation between data access and business logic
- **Batch Processing**: 1000 records per database operation
- **Connection Management**: The caller manages the connection lifecycle
- **Performance Optimization**: SQLite PRAGMAs tuned for high-speed operation

### Processing Layer

#### Storage (`storage.py`)

**Purpose**: Orchestrates data loading, processing, and metrics calculation

```python
class Storage:
- build_booktick_from_db()         # Main processing pipeline
- _create_snapshots_and_metrics()  # Per-snapshot processing
- _snapshot_from_row()             # Individual snapshot creation
```

**Processing Pipeline**:
1. **Initialize**: Create the metrics repository and table if needed
2. **Load Trades**: Group trades by timestamp for efficient access
3. **Stream Processing**: Process snapshots one by one to minimize memory use
4. **Calculate Metrics**: Compute OBI and CVD for each snapshot
5. **Batch Persistence**: Store metrics in batches of 1000
6. **Memory Management**: Discard full snapshots after metric extraction
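The streaming steps of this pipeline can be sketched as a single loop. This is illustrative, not the actual `Storage` implementation: the `snapshot_from_row()` stub and the tuple layout of a metrics row are assumptions, while `iterate_book_rows()` and `insert_metrics_batch()` follow the repository listing above:

```python
BATCH_SIZE = 1000  # records per database write, matching the configuration


def process_snapshots(repo, calculator, batch_size=BATCH_SIZE):
    """Sketch of the streaming pipeline: compute OBI/CVD per snapshot,
    persist metrics in batches, and let each snapshot be garbage-collected."""
    batch, cvd = [], 0.0
    for row in repo.iterate_book_rows():        # memory-efficient streaming
        snapshot = repo.snapshot_from_row(row)  # hypothetical helper name
        obi = calculator.calculate_obi(snapshot)
        cvd = calculator.calculate_cvd(cvd, snapshot.trades)
        batch.append((row["id"], snapshot.timestamp, obi, cvd))
        if len(batch) >= batch_size:
            repo.insert_metrics_batch(batch)    # one write per full batch
            batch = []                          # drop references -> low memory
    if batch:                                   # flush the final partial batch
        repo.insert_metrics_batch(batch)
```

Dropping the batch list after each write is what keeps peak memory flat: at any moment only one snapshot and at most one batch of metric rows are alive.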
#### Strategy Framework (`strategies.py`)

**Purpose**: Trading analysis and signal generation

```python
class DefaultStrategy:
- set_db_path()          # Configure database access
- compute_OBI()          # Real-time OBI calculation (fallback)
- load_stored_metrics()  # Retrieve persisted metrics
- get_metrics_summary()  # Statistical analysis
- on_booktick()          # Main analysis entry point
```

**Analysis Capabilities**:
- **Stored Metrics**: Primary analysis path using persisted data
- **Real-time Fallback**: Live calculation for compatibility
- **Statistical Summaries**: Min/max/average OBI, CVD changes
- **Alert System**: Configurable thresholds for significant imbalances

### Presentation Layer

#### Visualization (`visualizer.py`)

**Purpose**: Multi-chart rendering and display

```python
class Visualizer:
- set_db_path()           # Configure metrics access
- update_from_book()      # Main rendering pipeline
- _load_stored_metrics()  # Retrieve metrics for the chart range
- _draw()                 # Multi-subplot rendering
- show()                  # Display interactive charts
```

**Chart Layout**:
```
┌─────────────────────────────────────┐
│          OHLC Candlesticks          │ ← Price action
├─────────────────────────────────────┤
│             Volume Bars             │ ← Trading volume
├─────────────────────────────────────┤
│            OBI Line Chart           │ ← Order book imbalance
├─────────────────────────────────────┤
│            CVD Line Chart           │ ← Cumulative volume delta
└─────────────────────────────────────┘
```

**Features**:
- **Shared Time Axis**: Synchronized X-axis across all subplots
- **Auto-scaling**: Y-axis optimized for each metric type
- **Performance**: Efficient rendering of large datasets
- **Interactive**: Qt5Agg backend for zooming and panning

## Data Flow

### Processing Flow

```
1. SQLite DB      → Repository       → Raw Data
2. Raw Data       → Storage          → BookSnapshot
3. BookSnapshot   → MetricCalculator → OBI/CVD
4. Metrics        → Repository       → Database Storage
5. Stored Metrics → Strategy         → Analysis
6. Stored Metrics → Visualizer       → Charts
```
### Memory Management Flow

```
Traditional: DB → All Snapshots in Memory → Analysis                      (high memory)
Optimized:   DB → Process Snapshot → Calculate Metrics → Store → Discard  (low memory)
```

## Database Schema

### Input Schema (Required)

```sql
-- Orderbook snapshots
CREATE TABLE book (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    bids TEXT,        -- JSON: [[price, size, liq_count, order_count], ...]
    asks TEXT,        -- JSON: [[price, size, liq_count, order_count], ...]
    timestamp TEXT
);

-- Trade executions
CREATE TABLE trades (
    id INTEGER PRIMARY KEY,
    instrument TEXT,
    trade_id TEXT,
    price REAL,
    size REAL,
    side TEXT,        -- "buy" or "sell"
    timestamp TEXT
);
```

### Output Schema (Auto-created)

```sql
-- Calculated metrics
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    snapshot_id INTEGER,
    timestamp TEXT,
    obi REAL,         -- Order Book Imbalance, in [-1, 1]
    cvd REAL,         -- Cumulative Volume Delta
    best_bid REAL,
    best_ask REAL,
    FOREIGN KEY (snapshot_id) REFERENCES book(id)
);

-- Performance indexes
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
CREATE INDEX idx_metrics_snapshot_id ON metrics(snapshot_id);
```

## Performance Characteristics

### Memory Optimization
- **Before**: All snapshots held in memory (~1 GB for 600K snapshots)
- **After**: Only metrics data held (~300 MB for the same dataset)
- **Reduction**: >70% decrease in memory usage

### Processing Performance
- **Batch Size**: 1000 records per database operation
- **Processing Speed**: ~1000 snapshots/second on modern hardware
- **Database Overhead**: <20% storage increase for the metrics table
- **Query Performance**: Sub-second retrieval for typical time ranges

### Scalability Limits
- **Single File**: 1M+ snapshots per database file
- **Time Range**: Months to years of historical data
- **Memory Peak**: <2 GB for year-long datasets
- **Disk Space**: Original size plus ~20% for metrics

## Integration Points

### External Interfaces

```python
# Main application entry point
main.py:
- CLI argument parsing
- Database file discovery
- Component orchestration
- Progress monitoring

# Plugin interfaces
Strategy.on_booktick(book: Book)   # Strategy integration point
Visualizer.update_from_book(book)  # Visualization integration
```

### Internal Interfaces

```python
# Repository interfaces
Repository.connect() → Connection
Repository.load_data() → TypedData
Repository.store_data(data) → None

# Calculator interfaces
MetricCalculator.calculate_obi(snapshot) → float
MetricCalculator.calculate_cvd(prev_cvd, trades) → float
```

## Security Considerations

### Data Protection
- **SQL Injection**: All queries use parameterized statements
- **File Access**: Validates database file paths and permissions
- **Error Handling**: No sensitive data in error messages
- **Input Validation**: Sanitizes all external inputs

### Access Control
- **Database**: Respects file-system permissions
- **Memory**: No sensitive data persists beyond processing
- **Logging**: Configurable log levels without data exposure

## Configuration Management

### Performance Tuning

```python
# Storage configuration
BATCH_SIZE = 1000           # Records per database operation
LOG_FREQUENCY = 20          # Progress reports per processing run

# SQLite optimization
PRAGMA journal_mode = OFF   # Maximum write performance
PRAGMA synchronous = OFF    # Disable synchronous writes
PRAGMA cache_size = 100000  # Large memory cache
```

### Visualization Settings

```python
# Chart configuration
WINDOW_SECONDS = 60     # OHLC aggregation window
MAX_BARS = 500          # Maximum bars displayed
FIGURE_SIZE = (12, 10)  # Chart dimensions
```

## Error Handling Strategy

### Graceful Degradation
- **Database Errors**: Continue with reduced functionality
- **Calculation Errors**: Skip problematic snapshots, with logging
- **Visualization Errors**: Display available data, note the issues
- **Memory Pressure**: Adjust batch sizes automatically

### Recovery Mechanisms
- **Partial Processing**: Resume from the last successful batch
- **Data Validation**: Verify metric calculations before storage
- **Rollback Support**: Transaction boundaries for data consistency
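Several of the points above — parameterized statements, the PRAGMA settings, batch inserts, and transaction boundaries — meet in the metrics write path. A minimal sketch with standard `sqlite3`; the function names are illustrative, not the repository's actual API:

```python
import sqlite3


def connect_optimized(db_path):
    """Open a connection with the tuning PRAGMAs from the configuration above."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = OFF")
    conn.execute("PRAGMA cache_size = 100000")
    return conn


def insert_metrics_batch(conn, rows):
    """Parameterized batch insert: ? placeholders rule out SQL injection,
    and the `with conn:` block gives one transaction per batch, so a failed
    batch rolls back cleanly (rollback support)."""
    with conn:
        conn.executemany(
            "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd,"
            " best_bid, best_ask) VALUES (?, ?, ?, ?, ?, ?)",
            rows,
        )
```

Note the trade-off these PRAGMAs make: `journal_mode = OFF` and `synchronous = OFF` sacrifice crash-safety for write speed, which is acceptable here because the metrics table is derived data that can always be recomputed from `book` and `trades`.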
---

This architecture provides a robust, scalable foundation for high-frequency trading data analysis while maintaining clean separation of concerns and efficient resource utilization.