- Extend regime detection to top 10 cryptocurrencies (45 pairs)
- Dynamic pair selection based on divergence score (|z_score| * probability)
- Universal ML model trained on all pairs
- Correlation-based filtering to avoid redundant positions
- Funding rate integration from OKX for all 10 assets
- ATR-based dynamic stop-loss and take-profit
- Walk-forward training with 70/30 split

Performance: +35.69% return (vs +28.66% baseline), 63.6% win rate
PRD: Multi-Pair Divergence Selection Strategy
1. Introduction / Overview
This document describes the Multi-Pair Divergence Selection Strategy, an extension of the existing BTC/ETH regime reversion system. The strategy expands spread analysis to the top 10 cryptocurrencies by market cap, calculates divergence scores for all tradeable pairs, and dynamically selects the most divergent pair for trading.
The core hypothesis: by scanning multiple pairs simultaneously, we can identify stronger mean-reversion opportunities than focusing on a single pair, improving net PnL while maintaining the proven ML-based regime detection approach.
2. Goals
- Extend regime detection to top 10 market cap cryptocurrencies
- Dynamically select the most divergent tradeable pair each cycle
- Integrate volatility into dynamic SL/TP calculations
- Filter correlated pairs to avoid redundant positions
- Improve net PnL compared to single-pair BTC/ETH strategy
- Backtest-first implementation with walk-forward validation
3. User Stories
US-1: Multi-Pair Analysis
As a trader, I want the system to analyze spread divergence across multiple cryptocurrency pairs so that I can identify the best trading opportunity at any given moment.
US-2: Dynamic Pair Selection
As a trader, I want the system to automatically select and trade the pair with the highest divergence score (combination of Z-score magnitude and ML probability) so that I maximize mean-reversion profit potential.
US-3: Volatility-Adjusted Risk
As a trader, I want stop-loss and take-profit levels to adapt to each pair's volatility so that I avoid being stopped out prematurely on volatile assets while protecting profits on stable ones.
US-4: Correlation Filtering
As a trader, I want the system to avoid selecting pairs that are highly correlated with my current position so that I don't inadvertently double-down on the same market exposure.
US-5: Backtest Validation
As a researcher, I want to backtest this multi-pair strategy with walk-forward training so that I can validate improvement over the single-pair baseline without look-ahead bias.
4. Functional Requirements
4.1 Data Management
| ID | Requirement |
|---|---|
| FR-1.1 | System must support loading OHLCV data for top 10 market cap cryptocurrencies |
| FR-1.2 | Target assets: BTC, ETH, SOL, XRP, BNB, DOGE, ADA, AVAX, LINK, DOT (configurable) |
| FR-1.3 | System must identify all directly tradeable cross-pairs on OKX perpetuals |
| FR-1.4 | System must align timestamps across all pairs for synchronized analysis |
| FR-1.5 | System must handle missing data gracefully (skip pair if insufficient history) |
4.2 Pair Generation
| ID | Requirement |
|---|---|
| FR-2.1 | Generate all unique pairs from asset universe: N*(N-1)/2 pairs (e.g., 45 pairs for 10 assets) |
| FR-2.2 | Filter pairs to only those directly tradeable on OKX (no USDT intermediate) |
| FR-2.3 | Fallback: If cross-pair not available, calculate synthetic spread via USDT pairs |
| FR-2.4 | Store pair metadata: base asset, quote asset, exchange symbol, tradeable flag |
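
The pair generation step in FR-2.1 through FR-2.4 can be sketched with the standard library alone. This is a minimal illustration, not the project's actual scanner: the `Pair` class, the `generate_pairs` helper, and the `BASE-QUOTE-SWAP` symbol format are hypothetical names chosen for the example, and the set of directly tradeable symbols is a placeholder that would come from the exchange in practice.

```python
from dataclasses import dataclass
from itertools import combinations

ASSETS = ["BTC", "ETH", "SOL", "XRP", "BNB", "DOGE", "ADA", "AVAX", "LINK", "DOT"]


@dataclass(frozen=True)
class Pair:
    base: str        # asset_a, numerator of the spread ratio
    quote: str       # asset_b, denominator of the spread ratio
    symbol: str      # hypothetical exchange symbol, not verified against OKX
    tradeable: bool  # True if listed directly, False if it needs a synthetic spread


def generate_pairs(assets, direct_symbols=frozenset()):
    """Build all N*(N-1)/2 unique pairs and flag the directly tradeable ones."""
    pairs = []
    for base, quote in combinations(assets, 2):
        symbol = f"{base}-{quote}-SWAP"  # assumed naming convention
        pairs.append(Pair(base, quote, symbol, symbol in direct_symbols))
    return pairs


# Placeholder: pretend exactly one cross-pair is listed directly.
pairs = generate_pairs(ASSETS, direct_symbols={"BTC-ETH-SWAP"})
print(len(pairs))  # 45
```

Pairs with `tradeable=False` are the candidates for the synthetic-spread fallback in FR-2.3.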
4.3 Feature Engineering (Per Pair)
| ID | Requirement |
|---|---|
| FR-3.1 | Calculate spread ratio: asset_a_close / asset_b_close |
| FR-3.2 | Calculate Z-Score with configurable rolling window (default: 24h) |
| FR-3.3 | Calculate spread technicals: RSI(14), ROC(5), 1h change |
| FR-3.4 | Calculate volume ratio and relative volume |
| FR-3.5 | Calculate volatility ratio: std(returns_a) / std(returns_b) over Z-window |
| FR-3.6 | Calculate realized volatility for each asset (for dynamic SL/TP) |
| FR-3.7 | Merge on-chain data (funding rates, inflows) if available per asset |
| FR-3.8 | Add pair identifier as categorical feature for universal model |
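
As a rough illustration of FR-3.1 and FR-3.2, the spread ratio and its rolling Z-score could be computed as below. The sketch uses plain Python lists for clarity; a real implementation would more likely be vectorized. Two details are assumptions, since the requirements do not fix them: the window includes the current bar, and the population standard deviation is used.

```python
from statistics import mean, pstdev


def spread_ratio(close_a, close_b):
    """FR-3.1: element-wise asset_a_close / asset_b_close."""
    return [a / b for a, b in zip(close_a, close_b)]


def rolling_z_score(values, window=24):
    """FR-3.2: z-score of each value against its trailing window.

    Assumption: the window includes the current bar. Returns None until
    `window` observations exist, or when the window has zero variance.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = values[i + 1 - window : i + 1]
        sd = pstdev(chunk)
        out.append((values[i] - mean(chunk)) / sd if sd > 0 else None)
    return out


spread = spread_ratio([2.0, 4.0, 6.0], [2.0, 2.0, 2.0])  # [1.0, 2.0, 3.0]
z = rolling_z_score(spread, window=3)
```

With hourly candles, the default `window=24` corresponds to the 24h default in FR-3.2.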
4.4 Correlation Filtering
| ID | Requirement |
|---|---|
| FR-4.1 | Calculate rolling correlation matrix between all assets (default: 168h / 7 days) |
| FR-4.2 | Define correlation threshold (default: 0.85) |
| FR-4.3 | If current position exists, exclude pairs where either asset has correlation > threshold with held asset |
| FR-4.4 | Log filtered pairs with reason for exclusion |
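
One possible reading of the correlation filter in FR-4.3 and FR-4.4 is sketched below. The function names are hypothetical, and one behavior is an assumption worth reviewing: the held asset is not compared with itself, so pairs that contain the held asset are not excluded on that basis alone.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length return series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat series as uncorrelated
    return cov / sqrt(var_x * var_y)


def filter_correlated(pairs, held_asset, returns, threshold=0.85, window=168):
    """Split pairs into (kept, excluded-with-reason) relative to the held asset."""
    kept, excluded = [], []
    held = returns[held_asset][-window:]
    for a, b in pairs:
        reason = None
        for asset in (a, b):
            if asset == held_asset:
                continue  # assumption: the held asset is not filtered against itself
            corr = pearson(returns[asset][-window:], held)
            if corr > threshold:
                reason = f"{asset} corr {corr:.2f} with {held_asset} > {threshold}"
                break
        if reason:
            excluded.append(((a, b), reason))  # FR-4.4: keep the reason for logging
        else:
            kept.append((a, b))
    return kept, excluded


returns = {
    "BTC": [0.01, -0.02, 0.03, -0.01],
    "ETH": [0.02, -0.04, 0.06, -0.02],   # moves in lockstep with BTC
    "DOGE": [0.05, 0.01, -0.03, 0.02],   # negatively correlated with BTC here
}
kept, excluded = filter_correlated(
    [("ETH", "DOGE"), ("DOGE", "BTC")], held_asset="BTC", returns=returns
)
```

Here `("ETH", "DOGE")` is excluded because ETH tracks the held BTC position, while `("DOGE", "BTC")` is kept.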
4.5 Divergence Scoring & Pair Selection
| ID | Requirement |
|---|---|
| FR-5.1 | Calculate divergence score: abs(z_score) * model_probability |
| FR-5.2 | Only consider pairs where abs(z_score) > z_entry_threshold (default: 1.0) |
| FR-5.3 | Only consider pairs where model_probability > prob_threshold (default: 0.5) |
| FR-5.4 | Apply correlation filter to eligible pairs |
| FR-5.5 | Select pair with highest divergence score |
| FR-5.6 | If no pair qualifies, signal "hold" |
| FR-5.7 | Log all pair scores for analysis/debugging |
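
The scoring and selection rules in FR-5.1 through FR-5.6 reduce to a filter followed by an argmax. The sketch below assumes the correlation filter (FR-5.4) has already been applied to the candidates; `select_pair` and the sample numbers are illustrative only.

```python
def select_pair(candidates, z_entry_threshold=1.0, prob_threshold=0.5):
    """Return (best_pair, scores); best_pair is None when the signal is "hold".

    `candidates` maps a pair name to (z_score, model_probability).
    """
    scores = {}
    for pair, (z, prob) in candidates.items():
        if abs(z) > z_entry_threshold and prob > prob_threshold:
            scores[pair] = abs(z) * prob  # FR-5.1
    if not scores:
        return None, scores  # FR-5.6: nothing qualifies
    best = max(scores, key=scores.get)  # FR-5.5
    return best, scores


candidates = {
    "SOL/DOT": (2.4, 0.62),   # qualifies, score 2.4 * 0.62
    "BTC/ETH": (-1.8, 0.71),  # qualifies, score 1.8 * 0.71
    "XRP/ADA": (3.0, 0.45),   # rejected: probability below threshold
    "LINK/BNB": (0.7, 0.90),  # rejected: |z| below threshold
}
best, scores = select_pair(candidates)
print(best)  # SOL/DOT
```

Returning the full `scores` dictionary alongside the winner makes the per-pair logging in FR-5.7 straightforward.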
4.6 ML Model (Universal)
| ID | Requirement |
|---|---|
| FR-6.1 | Train single Random Forest model on all pairs combined |
| FR-6.2 | Include pair_id as one-hot encoded or label-encoded feature |
| FR-6.3 | Target: binary (1 = profitable reversion within horizon, 0 = no reversion) |
| FR-6.4 | Walk-forward training: 70% train / 30% test split |
| FR-6.5 | Daily retraining schedule (for live, configurable for backtest) |
| FR-6.6 | Model hyperparameters: n_estimators=300, max_depth=5, min_samples_leaf=30, class_weight={0:1, 1:3} |
| FR-6.7 | Save/load model with feature column metadata |
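
Because the universal model pools rows from every pair, the 70/30 split in FR-6.4 is safest when made on time rather than on row index. The following sketch shows one way to do that; the row layout `(timestamp, pair_id, features, label)` and the helper name `split_by_time` are assumptions for illustration.

```python
def split_by_time(rows, train_ratio=0.7):
    """Split (timestamp, pair_id, features, label) rows at a single cutoff time.

    Cutting on time rather than on row index keeps every test row strictly
    after all training rows, whichever pair it belongs to.
    """
    timestamps = sorted({row[0] for row in rows})
    cut_index = round(len(timestamps) * train_ratio)
    if cut_index <= 0 or cut_index >= len(timestamps):
        raise ValueError("not enough distinct timestamps to split")
    cutoff = timestamps[cut_index]
    train = [row for row in rows if row[0] < cutoff]
    test = [row for row in rows if row[0] >= cutoff]
    return train, test, cutoff


# Ten hourly bars for two pairs, with placeholder features and labels.
rows = [(t, pair, None, 0) for t in range(10) for pair in ("BTC/ETH", "SOL/DOT")]
train, test, cutoff = split_by_time(rows)
```

With ten distinct timestamps the cutoff falls at index 7, giving 14 training rows and 6 test rows across the two pairs.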
4.7 Signal Generation
| ID | Requirement |
|---|---|
| FR-7.1 | Direction: if z_score > threshold -> short spread (short asset_a); if z_score < -threshold -> long spread (long asset_a) |
| FR-7.2 | Apply funding rate filter per asset (block if extreme funding opposes direction) |
| FR-7.3 | Output signal: {pair, action, side, probability, z_score, divergence_score, reason} |
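
A hedged sketch of the signal logic follows. The direction rule comes from FR-7.1 and the output keys from FR-7.3. The funding filter is one interpretation of FR-7.2: a trade is blocked when the position would be the side paying an extreme funding rate. The `funding_limit` value and the function name are hypothetical.

```python
def generate_signal(pair, z_score, probability, funding_rate_a,
                    z_threshold=1.0, funding_limit=0.0005):
    """Map a z-score to a signal dict shaped like FR-7.3.

    funding_limit is a hypothetical cap on the per-interval funding rate.
    """
    signal = {
        "pair": pair, "action": "hold", "side": None,
        "probability": probability, "z_score": z_score,
        "divergence_score": abs(z_score) * probability, "reason": "",
    }
    if z_score > z_threshold:
        side = "short"  # spread is stretched upward: short asset_a
    elif z_score < -z_threshold:
        side = "long"   # spread is stretched downward: long asset_a
    else:
        signal["reason"] = "z-score inside entry band"
        return signal

    # On perpetuals, longs pay when funding is positive and shorts pay when
    # it is negative. Assumption: "opposes direction" means we would pay.
    pays_extreme_funding = (
        (side == "long" and funding_rate_a > funding_limit)
        or (side == "short" and funding_rate_a < -funding_limit)
    )
    if pays_extreme_funding:
        signal["reason"] = "blocked by funding rate filter"
        return signal

    signal.update(action="enter", side=side, reason="z-score beyond threshold")
    return signal


entry = generate_signal("SOL/DOT", 2.4, 0.62, funding_rate_a=0.0001)
blocked = generate_signal("SOL/DOT", -2.0, 0.60, funding_rate_a=0.001)
```

In this example `entry` is a short on asset_a, while `blocked` stays on hold because a long would pay funding above the assumed limit.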
4.8 Position Sizing
| ID | Requirement |
|---|---|
| FR-8.1 | Base size: 100% of available subaccount balance |
| FR-8.2 | Scale by divergence: size_multiplier = 1.0 + (divergence_score - base_threshold) * scaling_factor |
| FR-8.3 | Cap multiplier between 1.0x and 2.0x |
| FR-8.4 | Respect exchange minimum order size per asset |
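
The sizing formula in FR-8.2 and the cap in FR-8.3 can be checked with a few lines of arithmetic. The defaults for `base_threshold` and `scaling_factor` below are assumed values for illustration, since the requirements leave them unspecified; 0.5 is used for `base_threshold` because it is the smallest score that passes the default entry thresholds (1.0 * 0.5).

```python
def size_multiplier(divergence_score, base_threshold=0.5, scaling_factor=0.5,
                    min_mult=1.0, max_mult=2.0):
    """FR-8.2 and FR-8.3: linear scaling clamped to [min_mult, max_mult]."""
    raw = 1.0 + (divergence_score - base_threshold) * scaling_factor
    return max(min_mult, min(max_mult, raw))


print(size_multiplier(1.488))  # 1 + (1.488 - 0.5) * 0.5, about 1.494
print(size_multiplier(3.0))    # raw value 2.25 is capped at 2.0
```

Under these assumed parameters, any divergence score of 2.5 or more hits the 2.0x cap.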
4.9 Dynamic SL/TP (Volatility-Adjusted)
| ID | Requirement |
|---|---|
| FR-9.1 | Calculate asset realized volatility: std(hourly returns) * sqrt(24) for daily vol |
| FR-9.2 | Base SL: entry_price * (1 - base_sl_pct * vol_multiplier) for longs |
| FR-9.3 | Base TP: entry_price * (1 + base_tp_pct * vol_multiplier) for longs |
| FR-9.4 | vol_multiplier = asset_volatility / baseline_volatility (baseline = BTC volatility) |
| FR-9.5 | Cap vol_multiplier between 0.5x and 2.0x to prevent extreme values |
| FR-9.6 | Invert logic for short positions |
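
The volatility-adjusted levels in FR-9.1 through FR-9.6 can be sketched as follows. The default percentages mirror the configuration values in this document (6% SL, 5% TP); the function names are illustrative, and "invert logic for short positions" is read here as mirroring the levels around the entry price.

```python
from math import sqrt
from statistics import stdev


def daily_volatility(hourly_returns):
    """FR-9.1: scale the std of hourly returns to a daily figure."""
    return stdev(hourly_returns) * sqrt(24)


def dynamic_sl_tp(entry_price, side, asset_vol, baseline_vol,
                  base_sl_pct=0.06, base_tp_pct=0.05,
                  vol_min=0.5, vol_max=2.0):
    """Return (stop_loss, take_profit) for a long or short position."""
    # FR-9.4 and FR-9.5: ratio to baseline (BTC) volatility, clamped.
    vol_multiplier = max(vol_min, min(vol_max, asset_vol / baseline_vol))
    sl_dist = base_sl_pct * vol_multiplier
    tp_dist = base_tp_pct * vol_multiplier
    if side == "long":
        return entry_price * (1 - sl_dist), entry_price * (1 + tp_dist)
    if side == "short":
        # FR-9.6: mirrored levels, stop above entry and target below.
        return entry_price * (1 + sl_dist), entry_price * (1 - tp_dist)
    raise ValueError(f"unknown side: {side!r}")


# An asset 1.5x as volatile as the baseline gets 9% SL and 7.5% TP distances.
long_sl, long_tp = dynamic_sl_tp(100.0, "long", asset_vol=0.06, baseline_vol=0.04)
short_sl, short_tp = dynamic_sl_tp(100.0, "short", asset_vol=0.06, baseline_vol=0.04)
```

For an entry at 100, the long levels come out near 91 and 107.5, and the short levels near 109 and 92.5.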
4.10 Exit Conditions
| ID | Requirement |
|---|---|
| FR-10.1 | Exit when Z-score crosses back through 0 (mean reversion complete) |
| FR-10.2 | Exit when dynamic SL or TP hit |
| FR-10.3 | No minimum holding period (can switch pairs immediately) |
| FR-10.4 | If new pair has higher divergence score, close current and open new |
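
The exit rules above could be combined into a single per-bar check, sketched below. The order of evaluation (SL/TP first, then mean reversion, then rotation) is an assumption, as is detecting the zero cross by comparing the current Z-score's sign with the sign at entry. The function name and return strings are hypothetical.

```python
def exit_reason(side, entry_z, z, price, stop_loss, take_profit,
                current_score=0.0, best_other_score=0.0):
    """Return why the position should close, or None to keep holding."""
    # FR-10.2: dynamic stop-loss / take-profit on the traded asset's price.
    if side == "long":
        if price <= stop_loss:
            return "stop_loss"
        if price >= take_profit:
            return "take_profit"
    elif side == "short":
        if price >= stop_loss:
            return "stop_loss"
        if price <= take_profit:
            return "take_profit"
    else:
        raise ValueError(f"unknown side: {side!r}")
    # FR-10.1: the z-score has reached or crossed zero since entry.
    if entry_z * z <= 0:
        return "mean_reversion"
    # FR-10.4: another pair now has a higher divergence score.
    if best_other_score > current_score:
        return "rotate"
    return None


print(exit_reason("short", 2.4, -0.1, 100.0, 109.0, 92.5))  # mean_reversion
```

Because FR-10.3 sets no minimum holding period, a check like this can run on every bar, including the bar right after entry.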
4.11 Backtest Integration
| ID | Requirement |
|---|---|
| FR-11.1 | Integrate with existing engine/backtester.py framework |
| FR-11.2 | Support 1h timeframe (matching live trading) |
| FR-11.3 | Walk-forward validation: train on 70%, test on 30% |
| FR-11.4 | Output: trades log, equity curve, performance metrics |
| FR-11.5 | Compare against single-pair BTC/ETH baseline |
5. Non-Goals (Out of Scope)
- Live trading implementation - Backtest validation first
- Multi-position portfolio - Single pair at a time for v1
- Cross-exchange arbitrage - OKX only
- Alternative ML models - Stick with Random Forest for consistency
- Sub-1h timeframes - 1h candles only for initial version
- Leveraged positions - 1x leverage for backtest
- Portfolio-level VaR/risk budgeting - Full subaccount allocation
6. Design Considerations
6.1 Architecture
strategies/
    multi_pair/
        __init__.py
        pair_scanner.py        # Generates all pairs, filters tradeable
        feature_engine.py      # Calculates features for all pairs
        correlation.py         # Rolling correlation matrix & filtering
        divergence_scorer.py   # Ranks pairs by divergence score
        strategy.py            # Main strategy orchestration
6.2 Data Flow
1. Load OHLCV for all 10 assets
2. Generate pair combinations (45 pairs)
3. Filter to tradeable pairs (OKX check)
4. Calculate features for each pair
5. Train/load universal ML model
6. Predict probability for all pairs
7. Calculate divergence scores
8. Apply correlation filter
9. Select top pair
10. Generate signal with dynamic SL/TP
11. Execute in backtest engine
6.3 Configuration
from dataclasses import dataclass, field


@dataclass
class MultiPairConfig:
    # Assets
    assets: list[str] = field(default_factory=lambda: [
        "BTC", "ETH", "SOL", "XRP", "BNB",
        "DOGE", "ADA", "AVAX", "LINK", "DOT"
    ])

    # Thresholds
    z_window: int = 24  # hours (FR-3.2)
    z_entry_threshold: float = 1.0
    prob_threshold: float = 0.5
    correlation_threshold: float = 0.85
    correlation_window: int = 168  # 7 days in hours

    # Risk
    base_sl_pct: float = 0.06
    base_tp_pct: float = 0.05
    vol_multiplier_min: float = 0.5
    vol_multiplier_max: float = 2.0

    # Model
    train_ratio: float = 0.7
    horizon: int = 102
    profit_target: float = 0.005
7. Technical Considerations
7.1 Dependencies
- Extend DataManager to load multiple symbols
- Query OKX API for available perpetual cross-pairs
- Reuse existing feature engineering from RegimeReversionStrategy
7.2 Performance
- Pre-calculate all pair features in batch (vectorized)
- Cache correlation matrix (update every N candles rather than on every evaluation)
- Model inference is fast (single predict call with all pairs as rows)
7.3 Edge Cases
- Handle pairs with insufficient history (< 200 bars) - exclude
- Handle assets delisted mid-backtest - skip pair
- Handle zero-volume periods - use last valid price
8. Success Metrics
| Metric | Baseline (BTC/ETH) | Target |
|---|---|---|
| Net PnL | Current performance | > 10% improvement |
| Number of Trades | N | Comparable or higher |
| Win Rate | Baseline % | Maintain or improve |
| Average Trade Duration | Baseline hours | Flexible |
| Max Drawdown | Baseline % | Not significantly worse |
9. Open Questions
- OKX Cross-Pairs: Need to verify which cross-pairs are available on OKX perpetuals. May need to fall back to synthetic spreads for most pairs.
- On-Chain Data: CryptoQuant data currently covers BTC/ETH. Should we:
  - Run without on-chain features for other assets?
  - Source alternative on-chain data?
  - Use funding rates only (available from OKX)?
- Pair ID Encoding: For the universal model, should pair_id be:
  - One-hot encoded (adds 45 features)?
  - Label encoded (single ordinal feature)?
  - Hierarchical (base_asset + quote_asset as separate features)?
- Synthetic Spreads: If trading the SOL/DOT spread but only USDT pairs are available:
  - Calculate spread synthetically: SOL-USDT / DOT-USDT
  - Execute as two legs: Long SOL-USDT, Short DOT-USDT
  - This doubles fees and adds execution complexity. Include in v1?
10. Implementation Phases
Phase 1: Data & Infrastructure (Est. 2-3 days)
- Extend DataManager for multi-symbol loading
- Build pair scanner with OKX tradeable filter
- Implement correlation matrix calculation
Phase 2: Feature Engineering (Est. 2 days)
- Adapt existing feature calculation for arbitrary pairs
- Add pair identifier feature
- Batch feature calculation for all pairs
Phase 3: Model & Scoring (Est. 2 days)
- Train universal model on all pairs
- Implement divergence scoring
- Add correlation filtering to pair selection
Phase 4: Strategy Integration (Est. 2-3 days)
- Implement dynamic SL/TP with volatility
- Integrate with backtester
- Build strategy orchestration class
Phase 5: Validation & Comparison (Est. 2 days)
- Run walk-forward backtest
- Compare against BTC/ETH baseline
- Generate performance report
Total Estimated Effort: 10-12 days
Document Version: 1.0
Created: 2026-01-15
Author: AI Assistant
Status: Draft - Awaiting Review