# Orderflow Backtest System
A high-performance orderbook reconstruction and metrics analysis system for cryptocurrency trading data. Calculates Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD) metrics with per-snapshot granularity.
## Features
- **Orderbook Reconstruction**: Rebuild complete orderbooks from SQLite database files
- **OBI Metrics**: Calculate Order Book Imbalance `(Vb - Va) / (Vb + Va)` per snapshot, where `Vb` and `Va` are the bid-side and ask-side volumes
- **CVD Metrics**: Track Cumulative Volume Delta incrementally, with support for resetting the running total
- **Memory Optimization**: >70% memory reduction vs. keeping full snapshot history, achieved by persisting metrics to storage
- **Real-time Visualization**: OHLC candlesticks with OBI/CVD curves beneath volume graphs
- **Batch Processing**: High-performance processing of large datasets (months to years of data)
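
The OBI formula and the incremental CVD with reset listed above can be sketched in a few lines. This is a minimal illustration, not the project's `MetricCalculator`: the names `order_book_imbalance` and `CumulativeVolumeDelta` are hypothetical, levels are simplified to `(price, size)` pairs, volumes are summed over every level supplied, and an empty book is assumed to yield `0.0`.

```python
from typing import Iterable, Tuple

Level = Tuple[float, float]  # (price, size); hypothetical simplified level


def order_book_imbalance(bids: Iterable[Level], asks: Iterable[Level]) -> float:
    """Return (Vb - Va) / (Vb + Va) using the summed sizes of each side."""
    vb = sum(size for _, size in bids)
    va = sum(size for _, size in asks)
    total = vb + va
    # Assumption: an empty book is reported as a neutral 0.0 rather than an error.
    return (vb - va) / total if total else 0.0


class CumulativeVolumeDelta:
    """Running CVD: buys add their size, sells subtract it."""

    def __init__(self) -> None:
        self.value = 0.0

    def update(self, side: str, size: float) -> float:
        """Apply one trade and return the new running total."""
        if side == "buy":
            self.value += size
        elif side == "sell":
            self.value -= size
        else:
            raise ValueError(f"unknown trade side: {side!r}")
        return self.value

    def reset(self) -> None:
        """Restart the running total, e.g. at a session boundary."""
        self.value = 0.0


obi = order_book_imbalance(bids=[(100.0, 3.0), (99.5, 1.0)], asks=[(100.5, 1.0)])
cvd = CumulativeVolumeDelta()
cvd.update("buy", 2.0)
cvd.update("sell", 0.5)
```

Because the CVD object holds only a single float, it can be carried across snapshots without keeping any trade history in memory.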
## Quick Start
### Prerequisites
- Python 3.12+
- UV package manager
- SQLite database files with orderbook and trades data
### Installation
```bash
# Install dependencies
uv sync
# Run tests to verify installation
uv run pytest
```
### Basic Usage
```bash
# Process BTC-USDT data for July 2025 (start date 2025-07-01, end date 2025-08-01)
uv run python main.py BTC-USDT 2025-07-01 2025-08-01
```
## Architecture
### Core Components
- **`models.py`**: Data models (`OrderbookLevel`, `Trade`, `BookSnapshot`, `Book`, `Metric`, `MetricCalculator`)
- **`storage.py`**: Orchestrates orderbook reconstruction and metrics calculation
- **`strategies.py`**: Trading strategy framework with metrics analysis capabilities
- **`visualizer.py`**: Multi-subplot visualization (OHLC, Volume, OBI, CVD)
- **`main.py`**: CLI application entry point
### Data Layer
- **`repositories/sqlite_repository.py`**: Read-only SQLite data access
- **`repositories/sqlite_metrics_repository.py`**: Write-enabled metrics storage and retrieval
- **`parsers/orderbook_parser.py`**: Orderbook text parsing with price caching
### Testing
- **`tests/`**: Comprehensive unit and integration tests
- **Test count**: 27 tests across 6 test files
- **Run tests**: `uv run pytest`
## Data Flow
1. **Data Loading**: SQLite databases → Repository → Raw orderbook/trades data
2. **Processing**: Storage → MetricCalculator → OBI/CVD calculation per snapshot
3. **Persistence**: Calculated metrics stored in database for future analysis
4. **Analysis**: Strategy loads stored metrics for trading signal generation
5. **Visualization**: Charts display OHLC, volume, OBI, and CVD with shared time axis
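
Steps 1 to 3 can be pictured as a single streaming pass that keeps only running state and emits one record per snapshot. The sketch below is an illustration under stated assumptions, not the code in `storage.py`: `Snapshot`, `TradeRow`, and `compute_metrics` are hypothetical names, levels are reduced to `(price, size)` pairs sorted best-first, and a trade is assumed to count toward the first snapshot whose timestamp is at or after the trade's.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Snapshot:  # hypothetical stand-in for a row of `book`
    id: int
    timestamp: int
    bids: List[Tuple[float, float]]  # (price, size), best first
    asks: List[Tuple[float, float]]


@dataclass
class TradeRow:  # hypothetical stand-in for a row of `trades`
    timestamp: int
    side: str  # "buy" or "sell"
    size: float


def compute_metrics(
    snapshots: Iterable[Snapshot], trades: Iterable[TradeRow]
) -> Iterator[dict]:
    """Yield one metrics record per snapshot, holding only running state."""
    ordered_trades = sorted(trades, key=lambda t: t.timestamp)
    cvd, i = 0.0, 0
    for snap in sorted(snapshots, key=lambda s: s.timestamp):
        # Assumption: trades stamped at or before the snapshot count toward it.
        while i < len(ordered_trades) and ordered_trades[i].timestamp <= snap.timestamp:
            trade = ordered_trades[i]
            cvd += trade.size if trade.side == "buy" else -trade.size
            i += 1
        vb = sum(size for _, size in snap.bids)
        va = sum(size for _, size in snap.asks)
        yield {
            "snapshot_id": snap.id,
            "timestamp": snap.timestamp,
            "obi": (vb - va) / (vb + va) if vb + va else 0.0,
            "cvd": cvd,
            "best_bid": snap.bids[0][0] if snap.bids else None,
            "best_ask": snap.asks[0][0] if snap.asks else None,
        }


snapshots = [
    Snapshot(1, 1000, [(100.0, 2.0)], [(100.5, 2.0)]),
    Snapshot(2, 2000, [(100.0, 3.0)], [(100.5, 1.0)]),
]
trades = [TradeRow(900, "buy", 1.0), TradeRow(1500, "sell", 0.25)]
rows = list(compute_metrics(snapshots, trades))
```

Each yielded dictionary mirrors the columns of the `metrics` table, so a record can be written out and the snapshot discarded, which is the idea behind the memory savings described in this README.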
## Database Schema
### Input Tables (Required)
```sql
-- Orderbook snapshots
CREATE TABLE book (
id INTEGER PRIMARY KEY,
bids TEXT NOT NULL, -- JSON array of [price, size, liquidation_count, order_count]
asks TEXT NOT NULL, -- JSON array of [price, size, liquidation_count, order_count]
timestamp INTEGER NOT NULL -- Unix timestamp
);
-- Trade executions
CREATE TABLE trades (
id INTEGER PRIMARY KEY,
trade_id REAL NOT NULL,
price REAL NOT NULL,
size REAL NOT NULL,
side TEXT NOT NULL, -- "buy" or "sell"
timestamp INTEGER NOT NULL -- Unix timestamp
);
```
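
To make the `bids`/`asks` layout concrete, the following sketch creates the `book` table in an in-memory database, inserts one invented row, and decodes each side into `(price, size)` pairs. It assumes the text columns hold strict JSON as the schema comments state; `load_levels` is a hypothetical helper and not the project's `parsers/orderbook_parser.py`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " bids TEXT NOT NULL,"
    " asks TEXT NOT NULL,"
    " timestamp INTEGER NOT NULL)"
)
# Invented sample row; each level is [price, size, liquidation_count, order_count].
conn.execute(
    "INSERT INTO book (bids, asks, timestamp) VALUES (?, ?, ?)",
    (
        json.dumps([[100.0, 3.0, 0, 4], [99.5, 1.0, 0, 2]]),
        json.dumps([[100.5, 1.0, 0, 1]]),
        1751328000,
    ),
)


def load_levels(text: str) -> list[tuple[float, float]]:
    """Decode one side of the book into (price, size) pairs."""
    # float() also accepts numeric strings, in case values are stored quoted.
    return [(float(level[0]), float(level[1])) for level in json.loads(text)]


bids_text, asks_text, ts = conn.execute(
    "SELECT bids, asks, timestamp FROM book ORDER BY timestamp"
).fetchone()
bids = load_levels(bids_text)
asks = load_levels(asks_text)
```

Only the first two fields of each level are needed for OBI; the liquidation and order counts are ignored here.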
### Output Table (Auto-created)
```sql
-- Calculated metrics
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL,
timestamp INTEGER NOT NULL,
obi REAL NOT NULL, -- Order Book Imbalance [-1, 1]
cvd REAL NOT NULL, -- Cumulative Volume Delta
best_bid REAL, -- Best bid price
best_ask REAL, -- Best ask price
FOREIGN KEY (snapshot_id) REFERENCES book(id)
);
```
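
Once metrics are stored, an analysis step can read a time window back without touching the raw snapshots. The sketch below is one possible way to do that with the standard `sqlite3` module; `load_metrics`, the sample rows, and the `0.3` threshold are hypothetical, and the `FOREIGN KEY` clause is left out so the example runs without a `book` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access by column name
conn.execute(
    """
    CREATE TABLE metrics (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        snapshot_id INTEGER NOT NULL,
        timestamp INTEGER NOT NULL,
        obi REAL NOT NULL,
        cvd REAL NOT NULL,
        best_bid REAL,
        best_ask REAL
    )
    """
)
# Invented sample metrics for three snapshots.
conn.executemany(
    "INSERT INTO metrics (snapshot_id, timestamp, obi, cvd, best_bid, best_ask)"
    " VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1000, 0.10, 1.0, 100.0, 100.5),
        (2, 2000, 0.50, 0.75, 100.0, 100.5),
        (3, 3000, -0.20, -0.5, 99.5, 100.0),
    ],
)


def load_metrics(conn: sqlite3.Connection, start_ts: int, end_ts: int) -> list:
    """Return metrics with start_ts <= timestamp < end_ts, oldest first."""
    return conn.execute(
        "SELECT snapshot_id, timestamp, obi, cvd, best_bid, best_ask"
        " FROM metrics WHERE timestamp >= ? AND timestamp < ?"
        " ORDER BY timestamp",
        (start_ts, end_ts),
    ).fetchall()


window = load_metrics(conn, 1000, 3000)
# Hypothetical signal: flag snapshots where bids outweigh asks by a chosen threshold.
bullish_ids = [row["snapshot_id"] for row in window if row["obi"] > 0.3]
```

The half-open window (`>=` start, `<` end) is a design choice of this sketch so that adjacent windows never return the same row twice.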
## Performance
- **Memory Usage**: >70% reduction vs. keeping full snapshot history
- **Processing Speed**: Batch processing with optimized SQLite queries
- **Scalability**: Handles months to years of high-frequency data
- **Storage Efficiency**: Metrics table <20% overhead vs. source data
## Development
### Setup
```bash
# Add pytest as a development dependency
uv add --dev pytest
# Run the full test suite
uv run pytest
# Run specific test modules
uv run pytest tests/test_storage_metrics.py -v
```
### Project Structure
```
orderflow_backtest/
├── docs/ # Documentation
├── models.py # Core data structures
├── storage.py # Data processing orchestrator
├── strategies.py # Trading strategy framework
├── visualizer.py # Chart rendering
├── main.py # CLI application
├── repositories/ # Data access layer
├── parsers/ # Data parsing utilities
└── tests/ # Test suite
```
For detailed documentation, see [./docs/README.md](./docs/README.md).