lowkey_backtest/tasks/prd-multi-pair-divergence-strategy.md
Simon Moisy df37366603 feat: Multi-Pair Divergence Selection Strategy
- Extend regime detection to top 10 cryptocurrencies (45 pairs)
- Dynamic pair selection based on divergence score (|z_score| * probability)
- Universal ML model trained on all pairs
- Correlation-based filtering to avoid redundant positions
- Funding rate integration from OKX for all 10 assets
- ATR-based dynamic stop-loss and take-profit
- Walk-forward training with 70/30 split

Performance: +35.69% return (vs +28.66% baseline), 63.6% win rate
2026-01-15 20:47:23 +08:00


PRD: Multi-Pair Divergence Selection Strategy

1. Introduction / Overview

This document describes the Multi-Pair Divergence Selection Strategy, an extension of the existing BTC/ETH regime reversion system. The strategy expands spread analysis to the top 10 cryptocurrencies by market cap, calculates divergence scores for all tradeable pairs, and dynamically selects the most divergent pair for trading.

The core hypothesis: by scanning multiple pairs simultaneously, we can identify stronger mean-reversion opportunities than focusing on a single pair, improving net PnL while maintaining the proven ML-based regime detection approach.


2. Goals

  1. Extend regime detection to top 10 market cap cryptocurrencies
  2. Dynamically select the most divergent tradeable pair each cycle
  3. Integrate volatility into dynamic SL/TP calculations
  4. Filter correlated pairs to avoid redundant positions
  5. Improve net PnL compared to single-pair BTC/ETH strategy
  6. Backtest-first implementation with walk-forward validation

3. User Stories

US-1: Multi-Pair Analysis

As a trader, I want the system to analyze spread divergence across multiple cryptocurrency pairs so that I can identify the best trading opportunity at any given moment.

US-2: Dynamic Pair Selection

As a trader, I want the system to automatically select and trade the pair with the highest divergence score (combination of Z-score magnitude and ML probability) so that I maximize mean-reversion profit potential.

US-3: Volatility-Adjusted Risk

As a trader, I want stop-loss and take-profit levels to adapt to each pair's volatility so that I avoid being stopped out prematurely on volatile assets while protecting profits on stable ones.

US-4: Correlation Filtering

As a trader, I want the system to avoid selecting pairs that are highly correlated with my current position so that I don't inadvertently double down on the same market exposure.

US-5: Backtest Validation

As a researcher, I want to backtest this multi-pair strategy with walk-forward training so that I can validate improvement over the single-pair baseline without look-ahead bias.


4. Functional Requirements

4.1 Data Management

ID Requirement
FR-1.1 System must support loading OHLCV data for top 10 market cap cryptocurrencies
FR-1.2 Target assets: BTC, ETH, SOL, XRP, BNB, DOGE, ADA, AVAX, LINK, DOT (configurable)
FR-1.3 System must identify all directly tradeable cross-pairs on OKX perpetuals
FR-1.4 System must align timestamps across all pairs for synchronized analysis
FR-1.5 System must handle missing data gracefully (skip pair if insufficient history)

4.2 Pair Generation

ID Requirement
FR-2.1 Generate all unique pairs from asset universe: N*(N-1)/2 pairs (e.g., 45 pairs for 10 assets)
FR-2.2 Filter pairs to only those directly tradeable on OKX (no USDT intermediate)
FR-2.3 Fallback: If cross-pair not available, calculate synthetic spread via USDT pairs
FR-2.4 Store pair metadata: base asset, quote asset, exchange symbol, tradeable flag

4.3 Feature Engineering (Per Pair)

ID Requirement
FR-3.1 Calculate spread ratio: asset_a_close / asset_b_close
FR-3.2 Calculate Z-Score with configurable rolling window (default: 24h)
FR-3.3 Calculate spread technicals: RSI(14), ROC(5), 1h change
FR-3.4 Calculate volume ratio and relative volume
FR-3.5 Calculate volatility ratio: std(returns_a) / std(returns_b) over Z-window
FR-3.6 Calculate realized volatility for each asset (for dynamic SL/TP)
FR-3.7 Merge on-chain data (funding rates, inflows) if available per asset
FR-3.8 Add pair identifier as categorical feature for universal model

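The core spread features above (FR-3.1, FR-3.2, FR-3.5) can be sketched in a few lines of pandas. The function and column names below are illustrative, not the project's actual schema:

```python
import numpy as np
import pandas as pd

def pair_features(close_a: pd.Series, close_b: pd.Series,
                  z_window: int = 24) -> pd.DataFrame:
    """Minimal per-pair feature sketch (FR-3.1, FR-3.2, FR-3.5)."""
    spread = close_a / close_b                       # FR-3.1: spread ratio
    mean = spread.rolling(z_window).mean()
    std = spread.rolling(z_window).std()
    z_score = (spread - mean) / std                  # FR-3.2: rolling Z-score

    ret_a = close_a.pct_change()
    ret_b = close_b.pct_change()
    vol_ratio = (ret_a.rolling(z_window).std()
                 / ret_b.rolling(z_window).std())    # FR-3.5: volatility ratio

    return pd.DataFrame({"spread": spread,
                         "z_score": z_score,
                         "vol_ratio": vol_ratio})
```

In the full system this would be run in batch for all 45 pairs (see 7.2), with the RSI/ROC/volume features of FR-3.3 and FR-3.4 added on top.
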
4.4 Correlation Filtering

ID Requirement
FR-4.1 Calculate rolling correlation matrix between all assets (default: 168h / 7 days)
FR-4.2 Define correlation threshold (default: 0.85)
FR-4.3 If current position exists, exclude pairs where either asset has correlation > threshold with held asset
FR-4.4 Log filtered pairs with reason for exclusion

4.5 Divergence Scoring & Pair Selection

ID Requirement
FR-5.1 Calculate divergence score: abs(z_score) * model_probability
FR-5.2 Only consider pairs where abs(z_score) > z_entry_threshold (default: 1.0)
FR-5.3 Only consider pairs where model_probability > prob_threshold (default: 0.5)
FR-5.4 Apply correlation filter to eligible pairs
FR-5.5 Select pair with highest divergence score
FR-5.6 If no pair qualifies, signal "hold"
FR-5.7 Log all pair scores for analysis/debugging

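The scoring and selection rules (FR-5.1 through FR-5.6) reduce to an argmax over gated candidates. The candidate dict shape here is an assumption, not the project's actual signal schema:

```python
def select_pair(candidates, z_entry=1.0, prob_min=0.5):
    """Rank eligible pairs by divergence score and pick the top one."""
    best = None
    for c in candidates:
        if abs(c["z_score"]) <= z_entry:       # FR-5.2: Z-score gate
            continue
        if c["probability"] <= prob_min:       # FR-5.3: probability gate
            continue
        score = abs(c["z_score"]) * c["probability"]   # FR-5.1
        if best is None or score > best[0]:
            best = (score, c)                  # FR-5.5: keep highest score
    if best is None:
        return {"action": "hold"}              # FR-5.6: nothing qualifies
    score, c = best
    return {"action": "trade", "pair": c["pair"], "divergence_score": score}
```

The correlation filter (FR-5.4) would be applied to `candidates` before this function is called.
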
4.6 ML Model (Universal)

ID Requirement
FR-6.1 Train single Random Forest model on all pairs combined
FR-6.2 Include pair_id as one-hot encoded or label-encoded feature
FR-6.3 Target: binary (1 = profitable reversion within horizon, 0 = no reversion)
FR-6.4 Walk-forward training: 70% train / 30% test split
FR-6.5 Daily retraining schedule (for live, configurable for backtest)
FR-6.6 Model hyperparameters: n_estimators=300, max_depth=5, min_samples_leaf=30, class_weight={0:1, 1:3}
FR-6.7 Save/load model with feature column metadata

4.7 Signal Generation

ID Requirement
FR-7.1 Direction: if z_score > threshold, short the spread (short asset_a); if z_score < -threshold, long the spread (long asset_a)
FR-7.2 Apply funding rate filter per asset (block if extreme funding opposes direction)
FR-7.3 Output signal: {pair, action, side, probability, z_score, divergence_score, reason}

4.8 Position Sizing

ID Requirement
FR-8.1 Base size: 100% of available subaccount balance
FR-8.2 Scale by divergence: size_multiplier = 1.0 + (divergence_score - base_threshold) * scaling_factor
FR-8.3 Cap multiplier between 1.0x and 2.0x
FR-8.4 Respect exchange minimum order size per asset

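FR-8.2 and FR-8.3 amount to a clamped linear scaling; `scaling_factor` below is an illustrative default, not a value fixed by this PRD:

```python
def size_multiplier(divergence_score, base_threshold=1.0, scaling_factor=0.25):
    """FR-8.2: scale size by how far the score exceeds the entry bar;
    FR-8.3: cap the multiplier to [1.0, 2.0]."""
    m = 1.0 + (divergence_score - base_threshold) * scaling_factor
    return max(1.0, min(2.0, m))
```
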
4.9 Dynamic SL/TP (Volatility-Adjusted)

ID Requirement
FR-9.1 Calculate asset realized volatility from hourly returns: std(returns) * sqrt(24) (scales hourly vol to daily)
FR-9.2 Base SL: entry_price * (1 - base_sl_pct * vol_multiplier) for longs
FR-9.3 Base TP: entry_price * (1 + base_tp_pct * vol_multiplier) for longs
FR-9.4 vol_multiplier = asset_volatility / baseline_volatility (baseline = BTC volatility)
FR-9.5 Cap vol_multiplier between 0.5x and 2.0x to prevent extreme values
FR-9.6 Invert logic for short positions

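The SL/TP rules of FR-9.1 through FR-9.6 combine into one small helper. Defaults mirror the config section (base_sl_pct=0.06, base_tp_pct=0.05, multiplier bounds 0.5/2.0); the function signature is illustrative:

```python
def dynamic_sl_tp(entry_price, asset_vol, baseline_vol, side="long",
                  base_sl_pct=0.06, base_tp_pct=0.05):
    """Volatility-adjusted stop-loss and take-profit levels (FR-9.2..FR-9.6)."""
    # FR-9.4 / FR-9.5: relative vol, clamped to [0.5, 2.0]
    vol_mult = max(0.5, min(2.0, asset_vol / baseline_vol))
    if side == "long":
        sl = entry_price * (1 - base_sl_pct * vol_mult)   # FR-9.2
        tp = entry_price * (1 + base_tp_pct * vol_mult)   # FR-9.3
    else:
        sl = entry_price * (1 + base_sl_pct * vol_mult)   # FR-9.6: inverted
        tp = entry_price * (1 - base_tp_pct * vol_mult)
    return sl, tp
```

With `asset_vol == baseline_vol` (e.g. trading BTC itself), this degenerates to the flat 6%/5% levels.
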
4.10 Exit Conditions

ID Requirement
FR-10.1 Exit when Z-score crosses back through 0 (mean reversion complete)
FR-10.2 Exit when dynamic SL or TP hit
FR-10.3 No minimum holding period (can switch pairs immediately)
FR-10.4 If new pair has higher divergence score, close current and open new

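The exit checks of FR-10.1 and FR-10.2 are a pair of conditions per side; the side naming convention here is an assumption (a "long" spread position profits as the Z-score rises back toward 0 from below):

```python
def should_exit(side, z_score, price, sl, tp):
    """Exit on mean-reversion completion (FR-10.1) or SL/TP touch (FR-10.2)."""
    # FR-10.1: Z-score has crossed back through 0
    if side == "long" and z_score >= 0:
        return True
    if side == "short" and z_score <= 0:
        return True
    # FR-10.2: dynamic SL or TP hit
    if side == "long":
        return price <= sl or price >= tp
    return price >= sl or price <= tp
```

FR-10.4 (switching to a higher-scoring pair) sits above this at the selection layer, since it needs the scores of all other candidates.
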
4.11 Backtest Integration

ID Requirement
FR-11.1 Integrate with existing engine/backtester.py framework
FR-11.2 Support 1h timeframe (matching live trading)
FR-11.3 Walk-forward validation: train on 70%, test on 30%
FR-11.4 Output: trades log, equity curve, performance metrics
FR-11.5 Compare against single-pair BTC/ETH baseline

5. Non-Goals (Out of Scope)

  1. Live trading implementation - Backtest validation first
  2. Multi-position portfolio - Single pair at a time for v1
  3. Cross-exchange arbitrage - OKX only
  4. Alternative ML models - Stick with Random Forest for consistency
  5. Sub-1h timeframes - 1h candles only for initial version
  6. Leveraged positions - 1x leverage for backtest
  7. Portfolio-level VaR/risk budgeting - Full subaccount allocation

6. Design Considerations

6.1 Architecture

strategies/
  multi_pair/
    __init__.py
    pair_scanner.py      # Generates all pairs, filters tradeable
    feature_engine.py    # Calculates features for all pairs
    correlation.py       # Rolling correlation matrix & filtering
    divergence_scorer.py # Ranks pairs by divergence score
    strategy.py          # Main strategy orchestration

6.2 Data Flow

1. Load OHLCV for all 10 assets
2. Generate pair combinations (45 pairs)
3. Filter to tradeable pairs (OKX check)
4. Calculate features for each pair
5. Train/load universal ML model
6. Predict probability for all pairs
7. Calculate divergence scores
8. Apply correlation filter
9. Select top pair
10. Generate signal with dynamic SL/TP
11. Execute in backtest engine

6.3 Configuration

from dataclasses import dataclass, field

@dataclass
class MultiPairConfig:
    # Assets
    assets: list[str] = field(default_factory=lambda: [
        "BTC", "ETH", "SOL", "XRP", "BNB", 
        "DOGE", "ADA", "AVAX", "LINK", "DOT"
    ])
    
    # Thresholds
    z_window: int = 24
    z_entry_threshold: float = 1.0
    prob_threshold: float = 0.5
    correlation_threshold: float = 0.85
    correlation_window: int = 168  # 7 days in hours
    
    # Risk
    base_sl_pct: float = 0.06
    base_tp_pct: float = 0.05
    vol_multiplier_min: float = 0.5
    vol_multiplier_max: float = 2.0
    
    # Model
    train_ratio: float = 0.7
    horizon: int = 102
    profit_target: float = 0.005

7. Technical Considerations

7.1 Dependencies

  • Extend DataManager to load multiple symbols
  • Query OKX API for available perpetual cross-pairs
  • Reuse existing feature engineering from RegimeReversionStrategy

7.2 Performance

  • Pre-calculate all pair features in batch (vectorized)
  • Cache correlation matrix (update every N candles, not every minute)
  • Model inference is fast (single predict call with all pairs as rows)

7.3 Edge Cases

  • Handle pairs with insufficient history (< 200 bars) - exclude
  • Handle assets delisted mid-backtest - skip pair
  • Handle zero-volume periods - use last valid price

8. Success Metrics

Metric                   Baseline (BTC/ETH)     Target
Net PnL                  Current performance    > 10% improvement
Number of Trades         N                      Comparable or higher
Win Rate                 Baseline %             Maintain or improve
Average Trade Duration   Baseline hours         Flexible
Max Drawdown             Baseline %             Not significantly worse

9. Open Questions

  1. OKX Cross-Pairs: Need to verify which cross-pairs are available on OKX perpetuals. May need to fall back to synthetic spreads for most pairs.

  2. On-Chain Data: CryptoQuant data currently covers BTC/ETH. Should we:

    • Run without on-chain features for other assets?
    • Source alternative on-chain data?
    • Use funding rates only (available from OKX)?
  3. Pair ID Encoding: For the universal model, should pair_id be:

    • One-hot encoded (adds 45 features)?
    • Label encoded (single ordinal feature)?
    • Hierarchical (base_asset + quote_asset as separate features)?
  4. Synthetic Spreads: If trading SOL/DOT spread but only USDT pairs available:

    • Calculate spread synthetically: SOL-USDT / DOT-USDT
    • Execute as two legs: Long SOL-USDT, Short DOT-USDT
    • This doubles fees and adds execution complexity. Include in v1?

10. Implementation Phases

Phase 1: Data & Infrastructure (Est. 2-3 days)

  • Extend DataManager for multi-symbol loading
  • Build pair scanner with OKX tradeable filter
  • Implement correlation matrix calculation

Phase 2: Feature Engineering (Est. 2 days)

  • Adapt existing feature calculation for arbitrary pairs
  • Add pair identifier feature
  • Batch feature calculation for all pairs

Phase 3: Model & Scoring (Est. 2 days)

  • Train universal model on all pairs
  • Implement divergence scoring
  • Add correlation filtering to pair selection

Phase 4: Strategy Integration (Est. 2-3 days)

  • Implement dynamic SL/TP with volatility
  • Integrate with backtester
  • Build strategy orchestration class

Phase 5: Validation & Comparison (Est. 2 days)

  • Run walk-forward backtest
  • Compare against BTC/ETH baseline
  • Generate performance report

Total Estimated Effort: 10-12 days


Document Version: 1.0
Created: 2026-01-15
Author: AI Assistant
Status: Draft - Awaiting Review