# PRD: Multi-Pair Divergence Selection Strategy

## 1. Introduction / Overview

This document describes the **Multi-Pair Divergence Selection Strategy**, an extension of the existing BTC/ETH regime reversion system. The strategy expands spread analysis to the **top 10 cryptocurrencies by market cap**, calculates divergence scores for all tradeable pairs, and dynamically selects the **most divergent pair** for trading.

The core hypothesis: by scanning multiple pairs simultaneously, we can identify stronger mean-reversion opportunities than focusing on a single pair, improving net PnL while maintaining the proven ML-based regime detection approach.

---

## 2. Goals

1. **Extend regime detection** to top 10 market cap cryptocurrencies
2. **Dynamically select** the most divergent tradeable pair each cycle
3. **Integrate volatility** into dynamic SL/TP calculations
4. **Filter correlated pairs** to avoid redundant positions
5. **Improve net PnL** compared to single-pair BTC/ETH strategy
6. **Backtest-first** implementation with walk-forward validation

---

## 3. User Stories

### US-1: Multi-Pair Analysis

> As a trader, I want the system to analyze spread divergence across multiple cryptocurrency pairs so that I can identify the best trading opportunity at any given moment.

### US-2: Dynamic Pair Selection

> As a trader, I want the system to automatically select and trade the pair with the highest divergence score (combination of Z-score magnitude and ML probability) so that I maximize mean-reversion profit potential.

### US-3: Volatility-Adjusted Risk

> As a trader, I want stop-loss and take-profit levels to adapt to each pair's volatility so that I avoid being stopped out prematurely on volatile assets while protecting profits on stable ones.

### US-4: Correlation Filtering

> As a trader, I want the system to avoid selecting pairs that are highly correlated with my current position so that I don't inadvertently double-down on the same market exposure.

### US-5: Backtest Validation

> As a researcher, I want to backtest this multi-pair strategy with walk-forward training so that I can validate improvement over the single-pair baseline without look-ahead bias.

---

## 4. Functional Requirements

### 4.1 Data Management

| ID | Requirement |
|----|-------------|
| FR-1.1 | System must support loading OHLCV data for top 10 market cap cryptocurrencies |
| FR-1.2 | Target assets: BTC, ETH, SOL, XRP, BNB, DOGE, ADA, AVAX, LINK, DOT (configurable) |
| FR-1.3 | System must identify all directly tradeable cross-pairs on OKX perpetuals |
| FR-1.4 | System must align timestamps across all pairs for synchronized analysis |
| FR-1.5 | System must handle missing data gracefully (skip pair if insufficient history) |

### 4.2 Pair Generation

| ID | Requirement |
|----|-------------|
| FR-2.1 | Generate all unique pairs from asset universe: `N*(N-1)/2` pairs (e.g., 45 pairs for 10 assets) |
| FR-2.2 | Filter pairs to only those directly tradeable on OKX (no USDT intermediate) |
| FR-2.3 | Fallback: If cross-pair not available, calculate synthetic spread via USDT pairs |
| FR-2.4 | Store pair metadata: base asset, quote asset, exchange symbol, tradeable flag |

### 4.3 Feature Engineering (Per Pair)

| ID | Requirement |
|----|-------------|
| FR-3.1 | Calculate spread ratio: `asset_a_close / asset_b_close` |
| FR-3.2 | Calculate Z-Score with configurable rolling window (default: 24h) |
| FR-3.3 | Calculate spread technicals: RSI(14), ROC(5), 1h change |
| FR-3.4 | Calculate volume ratio and relative volume |
| FR-3.5 | Calculate volatility ratio: `std(returns_a) / std(returns_b)` over Z-window |
| FR-3.6 | Calculate realized volatility for each asset (for dynamic SL/TP) |
| FR-3.7 | Merge on-chain data (funding rates, inflows) if available per asset |
| FR-3.8 | Add pair identifier as categorical feature for universal model |

### 4.4 Correlation Filtering

| ID | Requirement |
|----|-------------|
| FR-4.1 | Calculate rolling correlation matrix between all assets (default: 168h / 7 days) |
| FR-4.2 | Define correlation threshold (default: 0.85) |
| FR-4.3 | If a position is currently open, exclude pairs where either asset has correlation > threshold with a held asset |
| FR-4.4 | Log filtered pairs with reason for exclusion |

### 4.5 Divergence Scoring & Pair Selection

| ID | Requirement |
|----|-------------|
| FR-5.1 | Calculate divergence score: `abs(z_score) * model_probability` |
| FR-5.2 | Only consider pairs where `abs(z_score) > z_entry_threshold` (default: 1.0) |
| FR-5.3 | Only consider pairs where `model_probability > prob_threshold` (default: 0.5) |
| FR-5.4 | Apply correlation filter to eligible pairs |
| FR-5.5 | Select pair with highest divergence score |
| FR-5.6 | If no pair qualifies, signal "hold" |
| FR-5.7 | Log all pair scores for analysis/debugging |

### 4.6 ML Model (Universal)

| ID | Requirement |
|----|-------------|
| FR-6.1 | Train single Random Forest model on all pairs combined |
| FR-6.2 | Include `pair_id` as one-hot encoded or label-encoded feature |
| FR-6.3 | Target: binary (1 = profitable reversion within horizon, 0 = no reversion) |
| FR-6.4 | Walk-forward training: 70% train / 30% test split |
| FR-6.5 | Daily retraining schedule (for live, configurable for backtest) |
| FR-6.6 | Model hyperparameters: `n_estimators=300, max_depth=5, min_samples_leaf=30, class_weight={0:1, 1:3}` |
| FR-6.7 | Save/load model with feature column metadata |

### 4.7 Signal Generation

| ID | Requirement |
|----|-------------|
| FR-7.1 | Direction: `z_score > threshold` -> short spread (short asset_a); `z_score < -threshold` -> long spread (long asset_a) |
| FR-7.2 | Apply funding rate filter per asset (block if extreme funding opposes direction) |
| FR-7.3 | Output signal: `{pair, action, side, probability, z_score, divergence_score, reason}` |

### 4.8 Position Sizing

| ID | Requirement |
|----|-------------|
| FR-8.1 | Base size: 100% of available subaccount balance |
| FR-8.2 | Scale by divergence: `size_multiplier = 1.0 + (divergence_score - base_threshold) * scaling_factor` |
| FR-8.3 | Cap multiplier between 1.0x and 2.0x |
| FR-8.4 | Respect exchange minimum order size per asset |

### 4.9 Dynamic SL/TP (Volatility-Adjusted)

| ID | Requirement |
|----|-------------|
| FR-9.1 | Calculate asset realized volatility: `std(returns) * sqrt(24)` for daily vol |
| FR-9.2 | Base SL: `entry_price * (1 - base_sl_pct * vol_multiplier)` for longs |
| FR-9.3 | Base TP: `entry_price * (1 + base_tp_pct * vol_multiplier)` for longs |
| FR-9.4 | `vol_multiplier = asset_volatility / baseline_volatility` (baseline = BTC volatility) |
| FR-9.5 | Cap `vol_multiplier` between 0.5x and 2.0x to prevent extreme values |
| FR-9.6 | Invert logic for short positions |

### 4.10 Exit Conditions

| ID | Requirement |
|----|-------------|
| FR-10.1 | Exit when Z-score crosses back through 0 (mean reversion complete) |
| FR-10.2 | Exit when dynamic SL or TP hit |
| FR-10.3 | No minimum holding period (can switch pairs immediately) |
| FR-10.4 | If new pair has higher divergence score, close current and open new |

### 4.11 Backtest Integration

| ID | Requirement |
|----|-------------|
| FR-11.1 | Integrate with existing `engine/backtester.py` framework |
| FR-11.2 | Support 1h timeframe (matching live trading) |
| FR-11.3 | Walk-forward validation: train on 70%, test on 30% |
| FR-11.4 | Output: trades log, equity curve, performance metrics |
| FR-11.5 | Compare against single-pair BTC/ETH baseline |

---

## 5. Non-Goals (Out of Scope)

1. **Live trading implementation** - Backtest validation first
2. **Multi-position portfolio** - Single pair at a time for v1
3. **Cross-exchange arbitrage** - OKX only
4. **Alternative ML models** - Stick with Random Forest for consistency
5. **Sub-1h timeframes** - 1h candles only for initial version
6. **Leveraged positions** - 1x leverage for backtest
7. **Portfolio-level VaR/risk budgeting** - Full subaccount allocation

---

## 6. Design Considerations

### 6.1 Architecture

```
strategies/
    multi_pair/
        __init__.py
        pair_scanner.py        # Generates all pairs, filters tradeable
        feature_engine.py      # Calculates features for all pairs
        correlation.py         # Rolling correlation matrix & filtering
        divergence_scorer.py   # Ranks pairs by divergence score
        strategy.py            # Main strategy orchestration
```

### 6.2 Data Flow

```
1. Load OHLCV for all 10 assets
2. Generate pair combinations (45 pairs)
3. Filter to tradeable pairs (OKX check)
4. Calculate features for each pair
5. Train/load universal ML model
6. Predict probability for all pairs
7. Calculate divergence scores
8. Apply correlation filter
9. Select top pair
10. Generate signal with dynamic SL/TP
11. Execute in backtest engine
```

### 6.3 Configuration

```python
from dataclasses import dataclass, field


@dataclass
class MultiPairConfig:
    # Assets
    assets: list[str] = field(default_factory=lambda: [
        "BTC", "ETH", "SOL", "XRP", "BNB",
        "DOGE", "ADA", "AVAX", "LINK", "DOT"
    ])

    # Thresholds
    z_window: int = 24  # rolling Z-score window in 1h bars (24h)
    z_entry_threshold: float = 1.0
    prob_threshold: float = 0.5
    correlation_threshold: float = 0.85
    correlation_window: int = 168  # 7 days in hours

    # Risk
    base_sl_pct: float = 0.06
    base_tp_pct: float = 0.05
    vol_multiplier_min: float = 0.5
    vol_multiplier_max: float = 2.0

    # Model
    train_ratio: float = 0.7
    horizon: int = 102
    profit_target: float = 0.005
```

---

## 7. Technical Considerations

### 7.1 Dependencies

- Extend `DataManager` to load multiple symbols
- Query OKX API for available perpetual cross-pairs
- Reuse existing feature engineering from `RegimeReversionStrategy`

### 7.2 Performance

- Pre-calculate all pair features in batch (vectorized)
- Cache correlation matrix (update every N candles, not every minute)
- Model inference is fast (single predict call with all pairs as rows)

### 7.3 Edge Cases

- Handle pairs with insufficient history (< 200 bars) - exclude
- Handle assets delisted mid-backtest - skip pair
- Handle zero-volume periods - use last valid price

---

## 8. Success Metrics

| Metric | Baseline (BTC/ETH) | Target |
|--------|-------------------|--------|
| Net PnL | Current performance | > 10% improvement |
| Number of Trades | N | Comparable or higher |
| Win Rate | Baseline % | Maintain or improve |
| Average Trade Duration | Baseline hours | Flexible |
| Max Drawdown | Baseline % | Not significantly worse |

---

## 9. Open Questions

1. **OKX Cross-Pairs**: Need to verify which cross-pairs are available on OKX perpetuals. May need to fall back to synthetic spreads for most pairs.

2. **On-Chain Data**: CryptoQuant data currently covers BTC/ETH. Should we:
   - Run without on-chain features for other assets?
   - Source alternative on-chain data?
   - Use funding rates only (available from OKX)?

3. **Pair ID Encoding**: For the universal model, should `pair_id` be:
   - One-hot encoded (adds 45 features)?
   - Label encoded (single ordinal feature)?
   - Hierarchical (base_asset + quote_asset as separate features)?

4. **Synthetic Spreads**: If trading SOL/DOT spread but only USDT pairs available:
   - Calculate spread synthetically: `SOL-USDT / DOT-USDT`
   - Execute as two legs: Long SOL-USDT, Short DOT-USDT
   - This doubles fees and adds execution complexity. Include in v1?

---

## 10. Implementation Phases

### Phase 1: Data & Infrastructure (Est. 2-3 days)

- Extend DataManager for multi-symbol loading
- Build pair scanner with OKX tradeable filter
- Implement correlation matrix calculation

### Phase 2: Feature Engineering (Est. 2 days)

- Adapt existing feature calculation for arbitrary pairs
- Add pair identifier feature
- Batch feature calculation for all pairs

### Phase 3: Model & Scoring (Est. 2 days)

- Train universal model on all pairs
- Implement divergence scoring
- Add correlation filtering to pair selection

### Phase 4: Strategy Integration (Est. 2-3 days)

- Implement dynamic SL/TP with volatility
- Integrate with backtester
- Build strategy orchestration class

### Phase 5: Validation & Comparison (Est. 2 days)

- Run walk-forward backtest
- Compare against BTC/ETH baseline
- Generate performance report

**Total Estimated Effort: 10-12 days**

---

*Document Version: 1.0*
*Created: 2026-01-15*
*Author: AI Assistant*
*Status: Draft - Awaiting Review*
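
As an illustration of the selection rules in FR-5.1 through FR-5.6, the following is a minimal sketch, not the project's implementation. `PairScore`, `select_pair`, and the `excluded` argument (standing in for whatever the correlation filter of FR-4.3 produces) are hypothetical names, and the sample scores are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PairScore:
    """Hypothetical container for one pair's latest readings."""
    pair: str
    z_score: float
    probability: float

    @property
    def divergence(self) -> float:
        # FR-5.1: divergence score = |z_score| * model probability
        return abs(self.z_score) * self.probability


def select_pair(
    candidates: list[PairScore],
    z_entry_threshold: float = 1.0,
    prob_threshold: float = 0.5,
    excluded: frozenset[str] = frozenset(),
) -> Optional[PairScore]:
    """Return the eligible pair with the highest divergence score, or None to hold."""
    eligible = [
        c for c in candidates
        if abs(c.z_score) > z_entry_threshold   # FR-5.2
        and c.probability > prob_threshold      # FR-5.3
        and c.pair not in excluded              # FR-5.4 (filter computed elsewhere)
    ]
    if not eligible:
        return None                             # FR-5.6: signal "hold"
    return max(eligible, key=lambda c: c.divergence)  # FR-5.5


# Made-up sample readings for three pairs.
scores = [
    PairScore("ETH/BTC", z_score=-2.4, probability=0.62),
    PairScore("SOL/ETH", z_score=3.1, probability=0.45),  # fails prob_threshold
    PairScore("XRP/BTC", z_score=1.8, probability=0.71),
]
best = select_pair(scores)
print(best.pair if best else "hold")  # ETH/BTC
```

Note that SOL/ETH has the largest Z-score but is rejected by the probability gate before scoring, which is the intended interaction between FR-5.3 and FR-5.5. Passing `excluded=frozenset({"ETH/BTC"})` would make the same call return XRP/BTC.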