PRD: VectorBT Migration & CCXT Integration
1. Introduction
The goal of this project is to refactor the current backtesting infrastructure to a professional-grade stack using VectorBT for high-performance backtesting and CCXT for robust historical data acquisition. The system will support rapid prototyping of "many simple strategies," parameter optimization (Grid Search), and stability testing (Walk-Forward Analysis).
2. Goals
- Replace Custom Backtester: Retire the existing loop-based backtesting logic in favor of vectorized operations using vectorbt.
- Automate Data Collection: Implement a ccxt-based downloader to fetch and cache OHLCV data from OKX (and other exchanges) automatically.
- Enable Optimization: Built-in support for Grid Search to find optimal strategy parameters.
- Validation: Implement Walk-Forward Analysis (WFA) to validate strategy robustness and prevent overfitting.
- Standardized Reporting: Generate consistent outputs: Console summaries, CSV logs, and VectorBT interactive plots.
3. User Stories
- Data Acquisition: "As a user, I want to run a command download_data --pair BTC/USDT --exchange okx and have the system fetch historical 1-minute candles and save them to data/ccxt/okx/BTC-USDT/1m.csv."
- Strategy Dev: "As a researcher, I want to define a new strategy by simply writing a class/function that defines entry/exit signals, without worrying about the backtesting loop."
- Optimization: "As a researcher, I want to say 'Optimize RSI period between 10 and 20' and get a heatmap of results."
- Validation: "As a researcher, I want to verify if my 'best' parameters work on unseen data using Walk-Forward Analysis."
- Analysis: "As a user, I want to see an equity curve and key metrics (Sharpe, Drawdown) immediately after a test run."
4. Functional Requirements
4.1 Data Module (data_manager)
- Exchange Interface: Use ccxt to connect to exchanges (initially OKX).
- Fetching Logic: Fetch OHLCV data in chunks to handle rate limits and long histories.
- Storage: Save data to standardized paths: data/ccxt/{exchange}/{pair}_{timeframe}.csv.
- Loading: Utility to load saved CSVs into a Pandas DataFrame compatible with vectorbt.
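The fetching and storage requirements above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the project's implementation: fetch_ohlcv_chunked, cache_path, save_csv, and TIMEFRAME_MS are hypothetical names, and the only ccxt call assumed is the unified fetch_ohlcv(symbol, timeframe, since, limit) method. Replacing "/" with "-" in the pair is also an assumption, taken from the BTC-USDT spelling in the user story.

```python
import csv
import time
from pathlib import Path

# Milliseconds per candle for the timeframes this sketch supports (assumption).
TIMEFRAME_MS = {"1m": 60_000, "5m": 300_000, "1h": 3_600_000}


def cache_path(exchange_id, pair, timeframe, root="data/ccxt"):
    """Build data/ccxt/{exchange}/{pair}_{timeframe}.csv; '/' in the pair becomes '-'."""
    return Path(root) / exchange_id / f"{pair.replace('/', '-')}_{timeframe}.csv"


def fetch_ohlcv_chunked(exchange, symbol, timeframe, since_ms, until_ms,
                        limit=100, pause_s=0.0):
    """Page through exchange.fetch_ohlcv until until_ms (exclusive) is reached."""
    step = TIMEFRAME_MS[timeframe]
    rows, cursor = [], since_ms
    while cursor < until_ms:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since=cursor, limit=limit)
        # Keep only candles inside the requested range.
        batch = [row for row in batch if cursor <= row[0] < until_ms]
        if not batch:
            break  # nothing left in range
        rows.extend(batch)
        cursor = batch[-1][0] + step  # resume right after the last candle received
        time.sleep(pause_s)           # optional manual throttle
    return rows


def save_csv(rows, path):
    """Write [timestamp, open, high, low, close, volume] rows with a header line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["timestamp", "open", "high", "low", "close", "volume"])
        writer.writerows(rows)
```

With a real exchange, the exchange argument would be something like ccxt.okx({"enableRateLimit": True}), in which case ccxt throttles requests itself and pause_s can stay at 0. The loop stops at the first empty page, so a long gap in the exchange's history would end the download early; a production version would need an explicit policy for that case.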
4.2 Strategy Interface (strategies/)
- Base Protocol: Define a standard structure for strategies. A strategy should return/define:
  - Indicator calculations (Vectorized).
  - Entry signals (Boolean Series).
  - Exit signals (Boolean Series).
- Parameterization: Strategies must accept dynamic parameters to support Grid Search.
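One possible shape for this interface is sketched below. The names Strategy, MACross, signals, and sma are hypothetical, and plain Python lists stand in for the pandas Boolean Series so that the sketch runs without dependencies; a real implementation would compute the indicators with vectorized pandas or vectorbt operations instead of loops.

```python
from abc import ABC, abstractmethod


def sma(values, window):
    """Simple moving average; None until the window is full."""
    out, total = [], 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]
        out.append(total / window if i >= window - 1 else None)
    return out


class Strategy(ABC):
    """Hypothetical base class: parameters in, aligned entry/exit signals out."""

    def __init__(self, **params):
        self.params = params  # dynamic parameters, so a grid search can vary them

    @abstractmethod
    def signals(self, close):
        """Return (entries, exits): two boolean sequences as long as close."""


class MACross(Strategy):
    """Enter when the fast average crosses above the slow one; exit on the reverse."""

    def signals(self, close):
        fast = sma(close, self.params["fast"])
        slow = sma(close, self.params["slow"])
        entries = [False] * len(close)
        exits = [False] * len(close)
        for i in range(1, len(close)):
            if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
                continue  # averages not warmed up yet
            entries[i] = fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]
            exits[i] = fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]
        return entries, exits
```

Because the parameters arrive as keyword arguments, an optimizer can construct MACross(fast=f, slow=s) for each combination without knowing anything else about the strategy.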
4.3 Backtest Engine (engine.py)
- Simulation: Use vectorbt.Portfolio.from_signals (or similar) for fast simulation.
- Cost Model: Support configurable fees (maker/taker) and slippage estimates.
- Grid Search: Utilize vectorbt's parameter broadcasting to run many variations simultaneously.
- Walk-Forward Analysis:
  - Implement a splitting mechanism (e.g., vectorbt.Splitter) to divide data into In-Sample (Train) and Out-of-Sample (Test) sets.
  - Execute optimization on Train, validate on Test.
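The control flow of Grid Search combined with Walk-Forward Analysis can be illustrated with a small dependency-free sketch. All names here (param_grid, walk_forward_windows, walk_forward, and the score callback) are hypothetical, and the explicit loop over the grid is only for illustration: with vectorbt, the parameter combinations would be evaluated together through broadcasting rather than one at a time.

```python
from itertools import product


def param_grid(**ranges):
    """Expand keyword ranges into a list of parameter dicts (Cartesian product)."""
    names = list(ranges)
    return [dict(zip(names, combo)) for combo in product(*ranges.values())]


def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_slice, test_slice) pairs; each window starts test_size bars later."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = slice(start, start + train_size)
        test = slice(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def walk_forward(close, grid, score, train_size, test_size):
    """Pick the best params on each train window, then score them on the test window.

    score(prices, params) is a caller-supplied function; higher is better.
    """
    results = []
    for train, test in walk_forward_windows(len(close), train_size, test_size):
        best = max(grid, key=lambda params: score(close[train], params))
        results.append({
            "params": best,
            "train_score": score(close[train], best),
            "test_score": score(close[test], best),
        })
    return results
```

For the user story "Optimize RSI period between 10 and 20", the grid would be param_grid(rsi_period=range(10, 21)), which yields 11 combinations. Comparing train_score with test_score per window is one way to see whether the "best" parameters hold up on unseen data.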
4.4 Reporting (reporting.py)
- Console: Print key metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Count of Trades.
- Files: Save detailed trade logs and metrics summaries to backtest_logs/.
- Visuals: Generate and save/show vectorbt plots (Equity curve, Drawdowns).
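For reference, the console metrics listed above could be defined as in the sketch below. This assumes an equity curve given as a list of portfolio values and a list of per-trade PnLs; the function names are hypothetical, the Sharpe Ratio assumes a zero risk-free rate, and the default annualization factor assumes 1-minute bars on a market that trades around the clock. In practice vectorbt reports its own versions of these statistics, which may use different conventions.

```python
import math
from statistics import mean, stdev


def total_return(equity):
    """Final value relative to the starting value, as a fraction."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, 1.0 - value / peak)
    return worst


def sharpe_ratio(equity, periods_per_year):
    """Annualized Sharpe Ratio with a zero risk-free rate (assumption)."""
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    sd = stdev(rets)
    return mean(rets) / sd * math.sqrt(periods_per_year) if sd > 0 else float("nan")


def win_rate(trade_pnls):
    """Share of trades with positive PnL; NaN when there are no trades."""
    if not trade_pnls:
        return float("nan")
    return sum(pnl > 0 for pnl in trade_pnls) / len(trade_pnls)


def summarize(equity, trade_pnls, periods_per_year=365 * 24 * 60):
    """Collect the console metrics into one dict, keyed by display name."""
    return {
        "Total Return": total_return(equity),
        "Sharpe Ratio": sharpe_ratio(equity, periods_per_year),
        "Max Drawdown": max_drawdown(equity),
        "Win Rate": win_rate(trade_pnls),
        "Count of Trades": len(trade_pnls),
    }
```

Returning a dict keeps the same numbers usable for the console summary and for the CSV metrics file.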
5. Non-Goals
- Real-time live trading execution (this is strictly for research/backtesting).
- Complex Machine Learning models (initially focusing on indicator-based logic).
- High-frequency tick-level backtesting (1-minute granularity is the target).
6. Technical Architecture Proposal
project_root/
├── data/
│ └── ccxt/ # New data storage structure
├── strategies/ # Strategy definitions
│ ├── __init__.py
│ ├── base.py # Abstract Base Class
│ └── ma_cross.py # Example strategy
├── engine/
│ ├── data_loader.py # CCXT wrapper
│ ├── backtester.py # VBT runner
│ └── optimizer.py # Grid Search & WFA logic
├── main.py # CLI entry point
└── pyproject.toml
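As an illustration of the CLI entry point, main.py could expose the download_data command from the user stories through argparse. This is only a sketch: build_parser is a hypothetical name, and the --timeframe flag and its 1m default are assumptions not stated in the requirements.

```python
import argparse


def build_parser():
    """Build a parser with a download_data subcommand (sketch, not the final CLI)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    download = sub.add_parser("download_data", help="Fetch and cache OHLCV candles")
    download.add_argument("--pair", required=True, help="Market symbol, e.g. BTC/USDT")
    download.add_argument("--exchange", default="okx", help="ccxt exchange id")
    # Assumed flag: the user story only mentions 1-minute candles.
    download.add_argument("--timeframe", default="1m", help="Candle size")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Further subcommands (for example one for backtests and one for optimization) could be added to the same subparser group as the engine modules take shape.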
7. Success Metrics
- Can download 1 year of 1m BTC/USDT data from OKX in < 2 minutes.
- Can run a 100-parameter grid search on 1 year of 1m data in < 10 seconds.
- Walk-forward analysis produces a clear "Robustness Score" or visual comparison of Train vs Test performance.
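A quick back-of-the-envelope check helps judge whether the first two targets are realistic. The sketch below only does arithmetic; request_budget is a hypothetical helper, and the page size of 100 candles per request is a placeholder assumption that should be checked against the exchange's API documentation.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 one-minute candles in a 365-day year


def request_budget(candles, per_request, deadline_s):
    """Return (number of requests, required requests per second) for a download."""
    requests = math.ceil(candles / per_request)
    return requests, requests / deadline_s


# Download target: one year of 1m candles in under 2 minutes.
requests, rate = request_budget(MINUTES_PER_YEAR, per_request=100, deadline_s=120)
print(requests, rate)  # 5256 requests, 43.8 requests per second

# Grid search target: 100 parameter sets over the same year of bars in 10 seconds.
bar_evaluations = 100 * MINUTES_PER_YEAR
print(bar_evaluations, bar_evaluations / 10)  # 52,560,000 total, 5,256,000 per second
```

Under these assumptions the download target is bounded by the exchange's page size and rate limit rather than by local processing, so it may need to be revisited once the real limits are known.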
8. Open Questions
- Do we need to handle funding rates for perp futures in the PnL calculation immediately? (Assumed NO for V1, stick to spot/simple futures price action).