PRD: VectorBT Migration & CCXT Integration

1. Introduction

The goal of this project is to migrate the current backtesting infrastructure to VectorBT for high-performance backtesting and CCXT for historical data acquisition. The system will support rapid prototyping of "many simple strategies," parameter optimization (Grid Search), and stability testing (Walk-Forward Analysis).

2. Goals

Replace Custom Backtester: Retire the existing loop-based backtesting logic in favor of vectorized operations using vectorbt.
Automate Data Collection: Implement a ccxt-based downloader that automatically fetches and caches OHLCV data from OKX, with other exchanges to follow.
Enable Optimization: Provide built-in Grid Search support for finding optimal strategy parameters.
Validation: Implement Walk-Forward Analysis (WFA) to validate strategy robustness and detect overfitting.
Standardized Reporting: Generate consistent outputs: Console summaries, CSV logs, and VectorBT interactive plots.

3. User Stories

Data Acquisition: "As a user, I want to run a command download_data --pair BTC/USDT --exchange okx and have the system fetch historical 1-minute candles and save them to data/ccxt/okx/BTC-USDT/1m.csv."
Strategy Dev: "As a researcher, I want to define a new strategy by simply writing a class/function that defines entry/exit signals, without worrying about the backtesting loop."
Optimization: "As a researcher, I want to say 'Optimize RSI period between 10 and 20' and get a heatmap of results."
Validation: "As a researcher, I want to verify if my 'best' parameters work on unseen data using Walk-Forward Analysis."
Analysis: "As a user, I want to see an equity curve and key metrics (Sharpe, Drawdown) immediately after a test run."
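
A minimal sketch of how the Data Acquisition command could parse its arguments and derive the output path quoted in that story. The `--timeframe` flag and the helper names `build_parser` and `output_path` are hypothetical; the story itself only names `--pair` and `--exchange`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """CLI surface for the download_data command described in the user story."""
    parser = argparse.ArgumentParser(prog="download_data")
    parser.add_argument("--pair", required=True, help="Market symbol, e.g. BTC/USDT")
    parser.add_argument("--exchange", default="okx", help="ccxt exchange id")
    # Hypothetical flag: the story only mentions 1-minute candles.
    parser.add_argument("--timeframe", default="1m", help="Candle size, e.g. 1m")
    return parser


def output_path(exchange: str, pair: str, timeframe: str, root: str = "data/ccxt") -> Path:
    """Build the CSV path; the '/' in the pair is swapped for '-' so it is a valid directory name."""
    return Path(root) / exchange / pair.replace("/", "-") / f"{timeframe}.csv"


# Parse the exact arguments from the user story instead of reading sys.argv.
args = build_parser().parse_args(["--pair", "BTC/USDT", "--exchange", "okx"])
target = output_path(args.exchange, args.pair, args.timeframe)
print(target.as_posix())  # data/ccxt/okx/BTC-USDT/1m.csv
```

In main.py the same parser would read `sys.argv` by calling `parse_args()` with no arguments.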

4. Functional Requirements

4.1 Data Module (`data_manager`)

Exchange Interface: Use ccxt to connect to exchanges (initially OKX).
Fetching Logic: Fetch OHLCV data in chunks to handle rate limits and long histories.
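Storage: Save data to standardized paths: data/ccxt/{exchange}/{pair}/{timeframe}.csv, with the "/" in the pair replaced by "-" (for example, data/ccxt/okx/BTC-USDT/1m.csv, matching the Data Acquisition user story).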
Loading: Utility to load saved CSVs into a Pandas DataFrame compatible with vectorbt.
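
The chunked fetching and storage steps above could look roughly like the sketch below. It relies only on the shape of ccxt's unified `fetch_ohlcv(symbol, timeframe, since, limit)` call, which returns rows of `[timestamp_ms, open, high, low, close, volume]`. The function names, the `pause_s` knob, the default `limit=100`, and the `FakeExchange` stand-in (used so the example runs offline) are all assumptions for illustration. In real use the `exchange` argument would be something like `ccxt.okx({"enableRateLimit": True})`.

```python
import csv
import tempfile
import time
from pathlib import Path

ONE_MINUTE_MS = 60_000
COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]


def fetch_ohlcv_chunked(exchange, symbol, since_ms, until_ms,
                        timeframe="1m", limit=100, pause_s=0.0):
    """Page through fetch_ohlcv until until_ms is reached or the exchange runs out of data."""
    rows = []
    cursor = since_ms
    while cursor < until_ms:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since=cursor, limit=limit)
        if not batch:
            break
        # Keep only candles inside [cursor, until_ms) so pages never overlap.
        rows.extend(c for c in batch if cursor <= c[0] < until_ms)
        next_cursor = batch[-1][0] + 1
        if next_cursor <= cursor:
            break  # No forward progress; stop instead of looping forever.
        cursor = next_cursor
        time.sleep(pause_s)  # Extra pause on top of any rate limiting the client does.
    return rows


def save_csv(rows, path):
    """Write candles with a header row, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(COLUMNS)
        writer.writerows(rows)
    return path


class FakeExchange:
    """Offline stand-in that mimics the shape of ccxt's fetch_ohlcv output."""

    def __init__(self, start_ms, n_candles):
        self.candles = [[start_ms + i * ONE_MINUTE_MS, 1.0, 2.0, 0.5, 1.5, 10.0]
                        for i in range(n_candles)]
        self.calls = 0

    def fetch_ohlcv(self, symbol, timeframe="1m", since=None, limit=None, params=None):
        self.calls += 1
        matching = [c for c in self.candles if since is None or c[0] >= since]
        return matching[:limit]


fake = FakeExchange(start_ms=0, n_candles=250)
rows = fetch_ohlcv_chunked(fake, "BTC/USDT", 0, 250 * ONE_MINUTE_MS, limit=100)

with tempfile.TemporaryDirectory() as tmp:
    out = save_csv(rows, Path(tmp) / "sample.csv")
    with out.open(newline="") as fh:
        saved_lines = sum(1 for _ in fh)

print(len(rows), saved_lines)
```

Loading the saved file for vectorbt would then be a matter of reading the CSV into a DataFrame and converting the `timestamp` column (milliseconds) into a datetime index.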

4.2 Strategy Interface (`strategies/`)

Base Protocol: Define a standard structure for strategies. A strategy should return/define:
- Indicator calculations (Vectorized).
- Entry signals (Boolean Series).
- Exit signals (Boolean Series).
Parameterization: Strategies must accept dynamic parameters to support Grid Search.
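
One possible shape for the base protocol, sketched with a moving-average crossover as the example strategy. The `Strategy` protocol, the `generate_signals` method name, and the `MACross` fields are hypothetical. The only requirement taken from the text is that a strategy exposes parameters and returns Boolean entry and exit Series.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple

import pandas as pd


class Strategy(Protocol):
    """Anything that turns a close-price Series into (entries, exits)."""

    def generate_signals(self, close: pd.Series) -> Tuple[pd.Series, pd.Series]:
        ...


@dataclass(frozen=True)
class MACross:
    # Parameters are plain fields so a grid search can construct many variants.
    fast: int = 10
    slow: int = 30

    def generate_signals(self, close: pd.Series) -> Tuple[pd.Series, pd.Series]:
        # Vectorized indicator calculation; no per-bar Python loop.
        fast_ma = close.rolling(self.fast).mean()
        slow_ma = close.rolling(self.slow).mean()
        # Comparisons against NaN are False, so warm-up bars never signal.
        above = fast_ma > slow_ma
        prev_above = above.shift(1, fill_value=False)
        entries = above & ~prev_above   # fast MA crosses above slow MA
        exits = ~above & prev_above     # fast MA crosses back below
        return entries, exits


close = pd.Series([1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4], dtype=float)
entries, exits = MACross(fast=2, slow=3).generate_signals(close)
print(list(entries[entries].index), list(exits[exits].index))
```

The two Series can be passed directly as the entry and exit arguments of a signal-based simulator.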

4.3 Backtest Engine (`engine.py`)

Simulation: Use vectorbt.Portfolio.from_signals (or similar) for fast simulation.
Cost Model: Support configurable fees (maker/taker) and slippage estimates.
Grid Search: Utilize vectorbt's parameter broadcasting to run many variations simultaneously.
Walk-Forward Analysis:
- Implement a splitting mechanism (e.g., the rolling_split accessor in open-source vectorbt, or Splitter in VectorBT PRO) to divide data into In-Sample (Train) and Out-of-Sample (Test) sets.
- Execute optimization on Train, validate on Test.
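
The train/test flow above can be illustrated without any library, as a sketch of the control flow only. It does not use vectorbt's broadcasting or splitting utilities, and every name in it (`walk_forward_windows`, `param_grid`, `walk_forward`, the toy `momentum_return` score) is hypothetical.

```python
import math
from itertools import product


def walk_forward_windows(n_bars, train_len, test_len, step=None):
    """Rolling (train, test) index ranges; each test range directly follows its train range."""
    step = step or test_len
    windows = []
    start = 0
    while start + train_len + test_len <= n_bars:
        split = start + train_len
        windows.append((range(start, split), range(split, split + test_len)))
        start += step
    return windows


def param_grid(**choices):
    """Cartesian product of parameter choices as a list of dicts."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*choices.values())]


def momentum_return(prices, lookback):
    """Toy score: sum of next-bar returns while price is above its value lookback bars ago."""
    total = 0.0
    for i in range(lookback, len(prices) - 1):
        if prices[i] > prices[i - lookback]:
            total += prices[i + 1] / prices[i] - 1.0
    return total


def walk_forward(prices, grid, score, train_len, test_len):
    """Pick the best parameters on each train window, then score them on the unseen test window."""
    results = []
    for train, test in walk_forward_windows(len(prices), train_len, test_len):
        train_px = prices[train.start:train.stop]
        test_px = prices[test.start:test.stop]
        best = max(grid, key=lambda p: score(train_px, **p))
        results.append({
            "params": best,
            "train_score": score(train_px, **best),
            "test_score": score(test_px, **best),
        })
    return results


# Deterministic synthetic prices: a sine wave on a slow upward drift.
prices = [100 + 10 * math.sin(i / 5) + 0.1 * i for i in range(120)]
grid = param_grid(lookback=range(2, 11))
results = walk_forward(prices, grid, momentum_return, train_len=60, test_len=20)
for row in results:
    print(row["params"], round(row["train_score"], 4), round(row["test_score"], 4))
```

A large gap between `train_score` and `test_score` across windows is the signal that the chosen parameters do not generalize.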

4.4 Reporting (`reporting.py`)

Console: Print key metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Trade Count.
Files: Save detailed trade logs and metrics summaries to backtest_logs/.
Visuals: Generate and save/show vectorbt plots (Equity curve, Drawdowns).
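
For reference, the console metrics can be computed from an equity curve and a list of per-trade PnLs as in the sketch below. This is a library-free illustration under stated assumptions, not vectorbt's own statistics: the Sharpe Ratio assumes a zero risk-free rate and annualizes by 525,600 one-minute bars per year, and the function name `summarize` is hypothetical.

```python
import math
from statistics import mean, stdev


def summarize(equity, trade_pnls, periods_per_year=525_600):
    """Compute headline metrics from an equity curve and per-trade PnLs."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    total_return = equity[-1] / equity[0] - 1.0

    # Max drawdown: largest fractional drop from any running peak.
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_dd = max(max_dd, 1.0 - value / peak)

    # Assumption: zero risk-free rate, annualized by bars per year.
    vol = stdev(returns) if len(returns) > 1 else 0.0
    sharpe = mean(returns) / vol * math.sqrt(periods_per_year) if vol > 0 else float("nan")

    wins = sum(1 for pnl in trade_pnls if pnl > 0)
    win_rate = wins / len(trade_pnls) if trade_pnls else float("nan")
    return {
        "Total Return": total_return,
        "Sharpe Ratio": sharpe,
        "Max Drawdown": max_dd,
        "Win Rate": win_rate,
        "Trade Count": len(trade_pnls),
    }


metrics = summarize(equity=[100.0, 110.0, 99.0, 120.0], trade_pnls=[10.0, -11.0, 21.0])
for name, value in metrics.items():
    if isinstance(value, int):
        print(f"{name:<14}{value:>10d}")
    else:
        print(f"{name:<14}{value:>10.4f}")
```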

5. Non-Goals

Real-time live trading execution (this is strictly for research/backtesting).
Complex Machine Learning models (initially focusing on indicator-based logic).
High-frequency tick-level backtesting (1-minute granularity is the target).

6. Technical Architecture Proposal

project_root/
├── data/
│   └── ccxt/               # New data storage structure
├── strategies/             # Strategy definitions
│   ├── __init__.py
│   ├── base.py             # Abstract Base Class
│   └── ma_cross.py         # Example strategy
├── engine/
│   ├── data_loader.py      # CCXT wrapper
│   ├── backtester.py       # VBT runner
│   └── optimizer.py        # Grid Search & WFA logic
├── main.py                 # CLI entry point
└── pyproject.toml

7. Success Metrics

Can download 1 year of 1m BTC/USDT data from OKX in < 2 minutes.
Can run a 100-parameter grid search on 1 year of 1m data in < 10 seconds.
Walk-forward analysis produces a clear "Robustness Score" or visual comparison of Train vs Test performance.
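
A back-of-envelope check of what the first two targets imply. The per-request candle limits below are illustrative values only, not OKX's documented limits, and `download_budget` is a hypothetical helper.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 one-minute candles
DOWNLOAD_TARGET_S = 120            # "< 2 minutes"
GRID_SIZE = 100                    # parameter combinations
GRID_TARGET_S = 10                 # "< 10 seconds"


def download_budget(candles_per_request):
    """Requests needed for one year of 1m data, and the request rate the target implies."""
    requests = math.ceil(MINUTES_PER_YEAR / candles_per_request)
    return requests, requests / DOWNLOAD_TARGET_S


# Illustrative page sizes; the real limit depends on the exchange endpoint.
for limit in (100, 300, 1000):
    requests, per_second = download_budget(limit)
    print(f"limit={limit}: {requests} requests, {per_second:.1f} requests/s sustained")

# The grid search target in simulated bars per second.
simulated_bars = GRID_SIZE * MINUTES_PER_YEAR
print(f"grid search: {simulated_bars:,} bars, {simulated_bars / GRID_TARGET_S:,.0f} bars/s")
```

If the exchange's allowed request rate is below the sustained rate computed for its page size, the download target would need to be relaxed or met through caching and incremental updates.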

8. Open Questions

Do we need to handle funding rates for perpetual futures in the PnL calculation immediately? (Assumed no for V1: model spot and simple futures on price action only, without funding payments.)
