# PRD: VectorBT Migration & CCXT Integration

## 1. Introduction

The goal of this project is to migrate the current backtesting infrastructure to a professional-grade stack: **VectorBT** for high-performance backtesting and **CCXT** for robust historical data acquisition. The system will support rapid prototyping of "many simple strategies," parameter optimization (Grid Search), and stability testing (Walk-Forward Analysis).

## 2. Goals

- **Replace Custom Backtester:** Retire the existing loop-based backtesting logic in favor of vectorized operations using `vectorbt`.
- **Automate Data Collection:** Implement a `ccxt`-based downloader to fetch and cache OHLCV data from OKX (and other exchanges) automatically.
- **Enable Optimization:** Provide built-in support for Grid Search to find optimal strategy parameters.
- **Validation:** Implement Walk-Forward Analysis (WFA) to validate strategy robustness and guard against overfitting.
- **Standardized Reporting:** Generate consistent outputs: console summaries, CSV logs, and VectorBT interactive plots.

## 3. User Stories

- **Data Acquisition:** "As a user, I want to run a command `download_data --pair BTC/USDT --exchange okx` and have the system fetch historical 1-minute candles and save them to `data/ccxt/okx/BTC-USDT_1m.csv`."
- **Strategy Dev:** "As a researcher, I want to define a new strategy by simply writing a class/function that defines entry/exit signals, without worrying about the backtesting loop."
- **Optimization:** "As a researcher, I want to say 'Optimize RSI period between 10 and 20' and get a heatmap of results."
- **Validation:** "As a researcher, I want to verify whether my 'best' parameters work on unseen data using Walk-Forward Analysis."
- **Analysis:** "As a user, I want to see an equity curve and key metrics (Sharpe, Drawdown) immediately after a test run."

## 4. Functional Requirements

### 4.1 Data Module (`data_manager`)

- **Exchange Interface:** Use `ccxt` to connect to exchanges (initially OKX).
- **Fetching Logic:** Fetch OHLCV data in chunks to handle rate limits and long histories.
- **Storage:** Save data to standardized paths: `data/ccxt/{exchange}/{pair}_{timeframe}.csv` (with the `/` in the pair replaced by `-`, e.g. `BTC-USDT`).
- **Loading:** Provide a utility to load saved CSVs into a Pandas DataFrame compatible with `vectorbt`.

### 4.2 Strategy Interface (`strategies/`)

- **Base Protocol:** Define a standard structure for strategies. A strategy should return/define:
  - Indicator calculations (vectorized).
  - Entry signals (Boolean Series).
  - Exit signals (Boolean Series).
- **Parameterization:** Strategies must accept dynamic parameters to support Grid Search.

### 4.3 Backtest Engine (`engine.py`)

- **Simulation:** Use `vectorbt.Portfolio.from_signals` (or similar) for fast simulation.
- **Cost Model:** Support configurable fees (maker/taker) and slippage estimates.
- **Grid Search:** Utilize `vectorbt`'s parameter broadcasting to run many variations simultaneously.
- **Walk-Forward Analysis:**
  - Implement a splitting mechanism (e.g., `rolling_split` in open-source `vectorbt`, or `Splitter` in VectorBT PRO) to divide data into In-Sample (Train) and Out-of-Sample (Test) sets.
  - Execute optimization on Train, validate on Test.

### 4.4 Reporting (`reporting.py`)

- **Console:** Print key metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Trade Count.
- **Files:** Save detailed trade logs and metrics summaries to `backtest_logs/`.
- **Visuals:** Generate and save/show `vectorbt` plots (Equity curve, Drawdowns).

## 5. Non-Goals

- Real-time live trading execution (this is strictly for research/backtesting).
- Complex Machine Learning models (initially focusing on indicator-based logic).
- High-frequency tick-level backtesting (1-minute granularity is the target).

## 6. Technical Architecture Proposal

```text
project_root/
├── data/
│   └── ccxt/              # New data storage structure
├── strategies/            # Strategy definitions
│   ├── __init__.py
│   ├── base.py            # Abstract Base Class
│   └── ma_cross.py        # Example strategy
├── engine/
│   ├── data_loader.py     # CCXT wrapper
│   ├── backtester.py      # VBT runner
│   └── optimizer.py       # Grid Search & WFA logic
├── main.py                # CLI entry point
└── pyproject.toml
```

## 7. Success Metrics

- Can download 1 year of 1m BTC/USDT data from OKX in < 2 minutes.
- Can run a 100-parameter grid search on 1 year of 1m data in < 10 seconds.
- Walk-forward analysis produces a clear "Robustness Score" or visual comparison of Train vs Test performance.

## 8. Open Questions

- Do we need to handle funding rates for perp futures in the PnL calculation immediately? (Assumed no for V1; stick to spot/simple futures price action.)
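
As a rough illustration of the chunked fetching required in section 4.1, the sketch below pages through history with repeated `fetch_ohlcv` calls, which is the unified method `ccxt` exchange objects expose. The names `fetch_ohlcv_chunked`, `TIMEFRAME_MS`, and `pause_s` are hypothetical, and the default `limit=100` is an assumption; the real per-request cap depends on the exchange and endpoint.

```python
import time
from typing import Any, List

# Milliseconds per candle for the timeframes this sketch knows about (assumed set).
TIMEFRAME_MS = {"1m": 60_000, "5m": 300_000, "1h": 3_600_000, "1d": 86_400_000}


def fetch_ohlcv_chunked(
    exchange: Any,
    symbol: str,
    timeframe: str,
    since_ms: int,
    until_ms: int,
    limit: int = 100,
    pause_s: float = 0.0,
) -> List[list]:
    """Collect candles with since_ms <= timestamp < until_ms, one page at a time.

    `exchange` is any object with a ccxt-style
    fetch_ohlcv(symbol, timeframe, since=..., limit=...) method that returns
    rows of [timestamp_ms, open, high, low, close, volume].
    """
    step = TIMEFRAME_MS[timeframe]
    rows: List[list] = []
    cursor = since_ms
    while cursor < until_ms:
        page = exchange.fetch_ohlcv(symbol, timeframe, since=cursor, limit=limit)
        # Keep only candles inside the requested range that advance the cursor.
        fresh = sorted(
            (c for c in page if cursor <= c[0] < until_ms),
            key=lambda c: c[0],
        )
        if not fresh:
            break  # no further data in range; stop instead of looping forever
        rows.extend(fresh)
        # Resume one candle after the newest row collected so far.
        cursor = fresh[-1][0] + step
        if pause_s:
            time.sleep(pause_s)  # optional manual throttle between requests
    return rows
```

With a real exchange this might be called as `fetch_ohlcv_chunked(ccxt.okx({"enableRateLimit": True}), "BTC/USDT", "1m", since_ms, until_ms)`. The loop stops at the first empty page, so a long gap in exchange history would end the download early; a production downloader would likely need to skip over such gaps and retry on transient errors.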