Files
lowkey_backtest/tasks/prd-vectorbt-migration.md

77 lines
4.7 KiB
Markdown

# PRD: VectorBT Migration & CCXT Integration
## 1. Introduction
The goal of this project is to refactor the current backtesting infrastructure to a professional-grade stack using **VectorBT** for high-performance backtesting and **CCXT** for robust historical data acquisition. The system will support rapid prototyping of "many simple strategies," parameter optimization (Grid Search), and stability testing (Walk-Forward Analysis).
## 2. Goals
- **Replace Custom Backtester:** Retire the existing loop-based backtesting logic in favor of vectorized operations using `vectorbt`.
- **Automate Data Collection:** Implement a `ccxt` based downloader to fetch and cache OHLCV data from OKX (and other exchanges) automatically.
- **Enable Optimization:** Built-in support for Grid Search to find optimal strategy parameters.
- **Validation:** Implement Walk-Forward Analysis (WFA) to validate strategy robustness and prevent overfitting.
- **Standardized Reporting:** Generate consistent outputs: Console summaries, CSV logs, and VectorBT interactive plots.
## 3. User Stories
- **Data Acquisition:** "As a user, I want to run a command `download_data --pair BTC/USDT --exchange okx` and have the system fetch historical 1-minute candles and save them to `data/ccxt/okx/BTC-USDT/1m.csv`."
- **Strategy Dev:** "As a researcher, I want to define a new strategy by simply writing a class/function that defines entry/exit signals, without worrying about the backtesting loop."
- **Optimization:** "As a researcher, I want to say 'Optimize RSI period between 10 and 20' and get a heatmap of results."
- **Validation:** "As a researcher, I want to verify if my 'best' parameters work on unseen data using Walk-Forward Analysis."
- **Analysis:** "As a user, I want to see an equity curve and key metrics (Sharpe, Drawdown) immediately after a test run."
## 4. Functional Requirements
### 4.1 Data Module (`data_manager`)
- **Exchange Interface:** Use `ccxt` to connect to exchanges (initially OKX).
- **Fetching Logic:** Fetch OHLCV data in chunks to handle rate limits and long histories.
- **Storage:** Save data to standardized paths: `data/ccxt/{exchange}/{pair}_{timeframe}.csv`.
- **Loading:** Utility to load saved CSVs into a Pandas DataFrame compatible with `vectorbt`.
### 4.2 Strategy Interface (`strategies/`)
- **Base Protocol:** Define a standard structure for strategies. A strategy should return/define:
- Indicator calculations (Vectorized).
- Entry signals (Boolean Series).
- Exit signals (Boolean Series).
- **Parameterization:** Strategies must accept dynamic parameters to support Grid Search.
### 4.3 Backtest Engine (`engine.py`)
- **Simulation:** Use `vectorbt.Portfolio.from_signals` (or similar) for fast simulation.
- **Cost Model:** Support configurable fees (maker/taker) and slippage estimates.
- **Grid Search:** Utilize `vectorbt`'s parameter broadcasting to run many variations simultaneously.
- **Walk-Forward Analysis:**
- Implement a splitting mechanism (e.g., `vectorbt.Splitter`) to divide data into In-Sample (Train) and Out-of-Sample (Test) sets.
- Execute optimization on Train, validate on Test.
### 4.4 Reporting (`reporting.py`)
- **Console:** Print key metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Count of Trades.
- **Files:** Save detailed trade logs and metrics summaries to `backtest_logs/`.
- **Visuals:** Generate and save/show `vectorbt` plots (Equity curve, Drawdowns).
## 5. Non-Goals
- Real-time live trading execution (this is strictly for research/backtesting).
- Complex Machine Learning models (initially focusing on indicator-based logic).
- High-frequency tick-level backtesting (1-minute granularity is the target).
## 6. Technical Architecture Proposal
```text
project_root/
├── data/
│ └── ccxt/ # New data storage structure
├── strategies/ # Strategy definitions
│ ├── __init__.py
│ ├── base.py # Abstract Base Class
│ └── ma_cross.py # Example strategy
├── engine/
│ ├── data_loader.py # CCXT wrapper
│ ├── backtester.py # VBT runner
│ └── optimizer.py # Grid Search & WFA logic
├── main.py # CLI entry point
└── pyproject.toml
```
## 7. Success Metrics
- Can download 1 year of 1m BTC/USDT data from OKX in < 2 minutes.
- Can run a 100-parameter grid search on 1 year of 1m data in < 10 seconds.
- Walk-forward analysis produces a clear "Robustness Score" or visual comparison of Train vs Test performance.
## 8. Open Questions
- Do we need to handle funding rates for perp futures in the PnL calculation immediately? (Assumed NO for V1, stick to spot/simple futures price action).