PRD: VectorBT Migration & CCXT Integration

1. Introduction

The goal of this project is to refactor the current backtesting infrastructure to a professional-grade stack using VectorBT for high-performance backtesting and CCXT for robust historical data acquisition. The system will support rapid prototyping of "many simple strategies," parameter optimization (Grid Search), and stability testing (Walk-Forward Analysis).

2. Goals

  • Replace Custom Backtester: Retire the existing loop-based backtesting logic in favor of vectorized operations using vectorbt.
  • Automate Data Collection: Implement a ccxt-based downloader to fetch and cache OHLCV data from OKX (and other exchanges) automatically.
  • Enable Optimization: Provide built-in Grid Search support to find optimal strategy parameters.
  • Validation: Implement Walk-Forward Analysis (WFA) to validate strategy robustness and reduce the risk of overfitting.
  • Standardized Reporting: Generate consistent outputs: Console summaries, CSV logs, and VectorBT interactive plots.

3. User Stories

  • Data Acquisition: "As a user, I want to run a command download_data --pair BTC/USDT --exchange okx and have the system fetch historical 1-minute candles and save them to data/ccxt/okx/BTC-USDT/1m.csv."
  • Strategy Dev: "As a researcher, I want to define a new strategy by simply writing a class/function that defines entry/exit signals, without worrying about the backtesting loop."
  • Optimization: "As a researcher, I want to say 'Optimize RSI period between 10 and 20' and get a heatmap of results."
  • Validation: "As a researcher, I want to verify if my 'best' parameters work on unseen data using Walk-Forward Analysis."
  • Analysis: "As a user, I want to see an equity curve and key metrics (Sharpe, Drawdown) immediately after a test run."
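
The data-acquisition story could map onto a small argparse entry point, sketched below. Only --pair, --exchange, and the target path come from the story; the --timeframe flag, its default, and the helper name output_path are assumptions made for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """CLI for the download_data command described in the user story."""
    parser = argparse.ArgumentParser(prog="download_data")
    parser.add_argument("--pair", required=True, help="Market symbol, e.g. BTC/USDT")
    parser.add_argument("--exchange", default="okx", help="ccxt exchange id")
    # Assumed flag: the story only mentions 1-minute candles.
    parser.add_argument("--timeframe", default="1m", help="Candle interval")
    return parser


def output_path(exchange: str, pair: str, timeframe: str, root: str = "data/ccxt") -> Path:
    """Map a request to the CSV path from the story, e.g. data/ccxt/okx/BTC-USDT/1m.csv."""
    safe_pair = pair.replace("/", "-")  # "/" would otherwise create an extra directory level
    return Path(root) / exchange / safe_pair / f"{timeframe}.csv"


args = build_parser().parse_args(["--pair", "BTC/USDT", "--exchange", "okx"])
print(output_path(args.exchange, args.pair, args.timeframe).as_posix())
```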

4. Functional Requirements

4.1 Data Module (data_manager)

  • Exchange Interface: Use ccxt to connect to exchanges (initially OKX).
  • Fetching Logic: Fetch OHLCV data in chunks to handle rate limits and long histories.
  • Storage: Save data to standardized paths: data/ccxt/{exchange}/{pair}/{timeframe}.csv, with "/" in the pair replaced by "-" (for example, data/ccxt/okx/BTC-USDT/1m.csv).
  • Loading: Utility to load saved CSVs into a Pandas DataFrame compatible with vectorbt.
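
A minimal sketch of the chunked fetching and storage steps above. It assumes an exchange object that follows ccxt's fetch_ohlcv(symbol, timeframe, since, limit) convention and returns rows of [timestamp_ms, open, high, low, close, volume] in ascending order. The function names and the default limit of 100 are illustrative, not part of this PRD.

```python
import csv
from pathlib import Path
from typing import Any, List


def fetch_ohlcv_chunked(exchange: Any, symbol: str, timeframe: str,
                        since_ms: int, until_ms: int, limit: int = 100) -> List[list]:
    """Page through fetch_ohlcv until until_ms is reached or the data runs out."""
    rows: List[list] = []
    cursor = since_ms
    while cursor < until_ms:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since=cursor, limit=limit)
        if not batch:
            break  # the exchange has no more candles
        # Keep only candles inside [cursor, until_ms) so pages never overlap.
        rows.extend(c for c in batch if cursor <= c[0] < until_ms)
        last_ts = batch[-1][0]
        if last_ts < cursor:
            break  # no forward progress; avoid looping forever
        cursor = last_ts + 1  # resume just after the last candle received
    return rows


def save_csv(rows: List[list], path: Path) -> None:
    """Write candles to a CSV file, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["timestamp", "open", "high", "low", "close", "volume"])
        writer.writerows(rows)
```

With ccxt installed, the exchange argument would be something like ccxt.okx({"enableRateLimit": True}), whose built-in throttling spaces out the requests. The saved file can then be read back with pandas and indexed by timestamp for use with vectorbt.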

4.2 Strategy Interface (strategies/)

  • Base Protocol: Define a standard structure for strategies. A strategy should return/define:
    • Indicator calculations (Vectorized).
    • Entry signals (Boolean Series).
    • Exit signals (Boolean Series).
  • Parameterization: Strategies must accept dynamic parameters to support Grid Search.
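
One possible shape for the base protocol, shown with a moving-average crossover. The class names, the signals method, and the fast/slow parameter names are hypothetical. The sketch assumes pandas is available and that a strategy only needs the close price.

```python
from abc import ABC, abstractmethod
from typing import Tuple

import pandas as pd


class Strategy(ABC):
    """Hypothetical base class: parameters in, boolean entry/exit Series out."""

    def __init__(self, **params):
        self.params = params  # dynamic parameters, so a grid search can vary them

    @abstractmethod
    def signals(self, close: pd.Series) -> Tuple[pd.Series, pd.Series]:
        """Return (entries, exits) as boolean Series aligned to close."""


class MaCross(Strategy):
    """Example: enter when the fast SMA crosses above the slow SMA, exit on the reverse."""

    def signals(self, close: pd.Series) -> Tuple[pd.Series, pd.Series]:
        fast = close.rolling(self.params["fast"]).mean()
        slow = close.rolling(self.params["slow"]).mean()
        above = fast > slow  # False while either average is still NaN
        prev = above.shift(1, fill_value=False)
        entries = above & ~prev  # crossed up on this bar
        exits = ~above & prev    # crossed down on this bar
        return entries, exits


close = pd.Series([5, 4, 3, 4, 6, 8, 7, 5, 3, 2], dtype=float)
entries, exits = MaCross(fast=2, slow=3).signals(close)
print(entries[entries].index.tolist(), exits[exits].index.tolist())
```

The two Series have the shape that vectorbt.Portfolio.from_signals accepts as its entries and exits arguments, so the strategy never touches the simulation loop.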

4.3 Backtest Engine (engine.py)

  • Simulation: Use vectorbt.Portfolio.from_signals (or similar) for fast simulation.
  • Cost Model: Support configurable fees (maker/taker) and slippage estimates.
  • Grid Search: Utilize vectorbt's parameter broadcasting to run many variations simultaneously.
  • Walk-Forward Analysis:
    • Implement a splitting mechanism (e.g., the rolling_split accessor in open-source vectorbt, or Splitter in VectorBT PRO) to divide data into In-Sample (Train) and Out-of-Sample (Test) sets.
    • Execute optimization on Train, validate on Test.
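
The grid search and walk-forward steps above can be sketched in plain Python to show the control flow. This is not the vectorbt implementation: in the engine, the grid would be evaluated through parameter broadcasting rather than a Python loop. The score callable stands in for a backtest run on a slice of bars, and every name here is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def param_grid(space: Dict[str, Sequence]) -> List[dict]:
    """Expand {"name": [values]} into one dict per parameter combination."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def rolling_windows(n_bars: int, train: int, test: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges, stepping forward by one test window."""
    start = 0
    while start + train + test <= n_bars:
        yield range(start, start + train), range(start + train, start + train + test)
        start += test


def walk_forward(n_bars: int, train: int, test: int, space: Dict[str, Sequence],
                 score: Callable[[dict, range], float]) -> List[dict]:
    """Per window: pick the best parameters in-sample, then score them out-of-sample."""
    results = []
    for train_idx, test_idx in rolling_windows(n_bars, train, test):
        best = max(param_grid(space), key=lambda p: score(p, train_idx))
        results.append({
            "params": best,
            "train_score": score(best, train_idx),
            "test_score": score(best, test_idx),  # never used for selection
        })
    return results
```

Comparing train_score with test_score across windows gives a direct view of how much performance degrades on unseen data.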

4.4 Reporting (reporting.py)

  • Console: Print key metrics: Total Return, Sharpe Ratio, Max Drawdown, Win Rate, Trade Count.
  • Files: Save detailed trade logs and metrics summaries to backtest_logs/.
  • Visuals: Generate and save/show vectorbt plots (Equity curve, Drawdowns).
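
To make the console metrics concrete, here is a sketch of how three of them could be derived from an equity curve. vectorbt's Portfolio object has its own stats() summary, so this is for illustration only. The function name, the zero risk-free rate, and the annualization factor of 525,600 one-minute bars per year (24/7 trading) are assumptions.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def summarize(equity: Sequence[float], periods_per_year: int = 525_600) -> Dict[str, float]:
    """Total return, max drawdown, and annualized Sharpe (risk-free rate assumed 0)."""
    returns = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    peak, max_dd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)                   # running high-water mark
        max_dd = max(max_dd, 1.0 - value / peak)  # deepest drop from that mark
    vol = stdev(returns) if len(returns) > 1 else 0.0
    sharpe = mean(returns) / vol * math.sqrt(periods_per_year) if vol > 0 else float("nan")
    return {
        "total_return": equity[-1] / equity[0] - 1.0,
        "max_drawdown": max_dd,
        "sharpe": sharpe,
    }


stats = summarize([100.0, 110.0, 99.0, 120.0])
print(f"Total Return: {stats['total_return']:.2%}  Max Drawdown: {stats['max_drawdown']:.2%}")
```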

5. Non-Goals

  • Real-time live trading execution (this is strictly for research/backtesting).
  • Complex Machine Learning models (initially focusing on indicator-based logic).
  • High-frequency tick-level backtesting (1-minute granularity is the target).

6. Technical Architecture Proposal

project_root/
├── data/
│   └── ccxt/               # New data storage structure
├── strategies/             # Strategy definitions
│   ├── __init__.py
│   ├── base.py             # Abstract Base Class
│   └── ma_cross.py         # Example strategy
├── engine/
│   ├── data_loader.py      # CCXT wrapper
│   ├── backtester.py       # VBT runner
│   └── optimizer.py        # Grid Search & WFA logic
├── main.py                 # CLI entry point
└── pyproject.toml

7. Success Metrics

  • Can download 1 year of 1m BTC/USDT data from OKX in < 2 minutes.
  • Can run a 100-parameter grid search on 1 year of 1m data in < 10 seconds.
  • Walk-forward analysis produces a clear "Robustness Score" or visual comparison of Train vs Test performance.
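
The download target can be sanity-checked with simple arithmetic once the exchange's real limits are known. The sketch below assumes a sequential paginated download. The per-request candle counts and the request rate are placeholders, not verified OKX figures.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 one-minute candles


def download_budget(candles: int, per_request: int, requests_per_second: float) -> tuple:
    """Return (request count, minimum seconds) for a sequential paginated download."""
    n_requests = math.ceil(candles / per_request)
    return n_requests, n_requests / requests_per_second


# Placeholder limits for illustration only.
for per_request in (100, 300):
    n, secs = download_budget(MINUTES_PER_YEAR, per_request, requests_per_second=10)
    print(f"{per_request} candles/request -> {n} requests, at least {secs:.0f} s")
```

If the real limits turn out to be close to these placeholders, the 2-minute target would need parallel requests or a relaxed threshold.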

8. Open Questions

  • Do we need to handle funding rates for perp futures in the PnL calculation immediately? (Assumed NO for V1, stick to spot/simple futures price action).