lowkey_backtest/tasks/prd-multi-pair-divergence-strategy.md
Simon Moisy df37366603 feat: Multi-Pair Divergence Selection Strategy
- Extend regime detection to top 10 cryptocurrencies (45 pairs)
- Dynamic pair selection based on divergence score (|z_score| * probability)
- Universal ML model trained on all pairs
- Correlation-based filtering to avoid redundant positions
- Funding rate integration from OKX for all 10 assets
- ATR-based dynamic stop-loss and take-profit
- Walk-forward training with 70/30 split

Performance: +35.69% return (vs +28.66% baseline), 63.6% win rate
2026-01-15 20:47:23 +08:00


PRD: Multi-Pair Divergence Selection Strategy

1. Introduction / Overview

This document describes the Multi-Pair Divergence Selection Strategy, an extension of the existing BTC/ETH regime reversion system. The strategy expands spread analysis to the top 10 cryptocurrencies by market cap, calculates divergence scores for all tradeable pairs, and dynamically selects the most divergent pair for trading.

The core hypothesis: by scanning multiple pairs simultaneously, we can identify stronger mean-reversion opportunities than focusing on a single pair, improving net PnL while maintaining the proven ML-based regime detection approach.


2. Goals

  1. Extend regime detection to top 10 market cap cryptocurrencies
  2. Dynamically select the most divergent tradeable pair each cycle
  3. Integrate volatility into dynamic SL/TP calculations
  4. Filter correlated pairs to avoid redundant positions
  5. Improve net PnL compared to single-pair BTC/ETH strategy
  6. Backtest-first implementation with walk-forward validation

3. User Stories

US-1: Multi-Pair Analysis

As a trader, I want the system to analyze spread divergence across multiple cryptocurrency pairs so that I can identify the best trading opportunity at any given moment.

US-2: Dynamic Pair Selection

As a trader, I want the system to automatically select and trade the pair with the highest divergence score (combination of Z-score magnitude and ML probability) so that I maximize mean-reversion profit potential.

US-3: Volatility-Adjusted Risk

As a trader, I want stop-loss and take-profit levels to adapt to each pair's volatility so that I avoid being stopped out prematurely on volatile assets while protecting profits on stable ones.

US-4: Correlation Filtering

As a trader, I want the system to avoid selecting pairs that are highly correlated with my current position so that I don't inadvertently double down on the same market exposure.

US-5: Backtest Validation

As a researcher, I want to backtest this multi-pair strategy with walk-forward training so that I can validate improvement over the single-pair baseline without look-ahead bias.


4. Functional Requirements

4.1 Data Management

ID Requirement
FR-1.1 System must support loading OHLCV data for top 10 market cap cryptocurrencies
FR-1.2 Target assets: BTC, ETH, SOL, XRP, BNB, DOGE, ADA, AVAX, LINK, DOT (configurable)
FR-1.3 System must identify all directly tradeable cross-pairs on OKX perpetuals
FR-1.4 System must align timestamps across all pairs for synchronized analysis
FR-1.5 System must handle missing data gracefully (skip pair if insufficient history)

4.2 Pair Generation

ID Requirement
FR-2.1 Generate all unique pairs from asset universe: N*(N-1)/2 pairs (e.g., 45 pairs for 10 assets)
FR-2.2 Filter pairs to only those directly tradeable on OKX (no USDT intermediate)
FR-2.3 Fallback: If cross-pair not available, calculate synthetic spread via USDT pairs
FR-2.4 Store pair metadata: base asset, quote asset, exchange symbol, tradeable flag

4.3 Feature Engineering (Per Pair)

ID Requirement
FR-3.1 Calculate spread ratio: asset_a_close / asset_b_close
FR-3.2 Calculate Z-Score with configurable rolling window (default: 24h)
FR-3.3 Calculate spread technicals: RSI(14), ROC(5), 1h change
FR-3.4 Calculate volume ratio and relative volume
FR-3.5 Calculate volatility ratio: std(returns_a) / std(returns_b) over Z-window
FR-3.6 Calculate realized volatility for each asset (for dynamic SL/TP)
FR-3.7 Merge on-chain data (funding rates, inflows) if available per asset
FR-3.8 Add pair identifier as categorical feature for universal model

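The core spread features above (FR-3.1, FR-3.2, FR-3.5) can be sketched in a few lines of pandas. The function and column names below are illustrative, not the project's actual schema:

```python
import numpy as np
import pandas as pd

def pair_features(close_a: pd.Series, close_b: pd.Series,
                  z_window: int = 24) -> pd.DataFrame:
    """Minimal per-pair feature sketch (FR-3.1, FR-3.2, FR-3.5)."""
    spread = close_a / close_b                       # FR-3.1: spread ratio
    mean = spread.rolling(z_window).mean()
    std = spread.rolling(z_window).std()
    z_score = (spread - mean) / std                  # FR-3.2: rolling Z-score

    ret_a = close_a.pct_change()
    ret_b = close_b.pct_change()
    vol_ratio = (ret_a.rolling(z_window).std()
                 / ret_b.rolling(z_window).std())    # FR-3.5: volatility ratio

    return pd.DataFrame({"spread": spread,
                         "z_score": z_score,
                         "vol_ratio": vol_ratio})
```

In the full system this would be run in batch for all 45 pairs (see 7.2), with the RSI/ROC/volume features of FR-3.3 and FR-3.4 added on top.
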
4.4 Correlation Filtering

ID Requirement
FR-4.1 Calculate rolling correlation matrix between all assets (default: 168h / 7 days)
FR-4.2 Define correlation threshold (default: 0.85)
FR-4.3 If current position exists, exclude pairs where either asset has correlation > threshold with held asset
FR-4.4 Log filtered pairs with reason for exclusion

4.5 Divergence Scoring & Pair Selection

ID Requirement
FR-5.1 Calculate divergence score: abs(z_score) * model_probability
FR-5.2 Only consider pairs where abs(z_score) > z_entry_threshold (default: 1.0)
FR-5.3 Only consider pairs where model_probability > prob_threshold (default: 0.5)
FR-5.4 Apply correlation filter to eligible pairs
FR-5.5 Select pair with highest divergence score
FR-5.6 If no pair qualifies, signal "hold"
FR-5.7 Log all pair scores for analysis/debugging

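The scoring and selection rules (FR-5.1 through FR-5.6) reduce to an argmax over gated candidates. The candidate dict shape here is an assumption, not the project's actual signal schema:

```python
def select_pair(candidates, z_entry=1.0, prob_min=0.5):
    """Rank eligible pairs by divergence score and pick the top one."""
    best = None
    for c in candidates:
        if abs(c["z_score"]) <= z_entry:       # FR-5.2: Z-score gate
            continue
        if c["probability"] <= prob_min:       # FR-5.3: probability gate
            continue
        score = abs(c["z_score"]) * c["probability"]   # FR-5.1
        if best is None or score > best[0]:
            best = (score, c)                  # FR-5.5: keep highest score
    if best is None:
        return {"action": "hold"}              # FR-5.6: nothing qualifies
    score, c = best
    return {"action": "trade", "pair": c["pair"], "divergence_score": score}
```

The correlation filter (FR-5.4) would be applied to `candidates` before this function is called.
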
4.6 ML Model (Universal)

ID Requirement
FR-6.1 Train single Random Forest model on all pairs combined
FR-6.2 Include pair_id as one-hot encoded or label-encoded feature
FR-6.3 Target: binary (1 = profitable reversion within horizon, 0 = no reversion)
FR-6.4 Walk-forward training: 70% train / 30% test split
FR-6.5 Daily retraining schedule (for live, configurable for backtest)
FR-6.6 Model hyperparameters: n_estimators=300, max_depth=5, min_samples_leaf=30, class_weight={0:1, 1:3}
FR-6.7 Save/load model with feature column metadata

4.7 Signal Generation

ID Requirement
FR-7.1 Direction: if z_score > threshold, short the spread (short asset_a); if z_score < -threshold, long the spread (long asset_a)
FR-7.2 Apply funding rate filter per asset (block if extreme funding opposes direction)
FR-7.3 Output signal: {pair, action, side, probability, z_score, divergence_score, reason}

4.8 Position Sizing

ID Requirement
FR-8.1 Base size: 100% of available subaccount balance
FR-8.2 Scale by divergence: size_multiplier = 1.0 + (divergence_score - base_threshold) * scaling_factor
FR-8.3 Cap multiplier between 1.0x and 2.0x
FR-8.4 Respect exchange minimum order size per asset

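FR-8.2 and FR-8.3 amount to a clamped linear scaling; `scaling_factor` below is an illustrative default, not a value fixed by this PRD:

```python
def size_multiplier(divergence_score, base_threshold=1.0, scaling_factor=0.25):
    """FR-8.2: scale size by how far the score exceeds the entry bar;
    FR-8.3: cap the multiplier to [1.0, 2.0]."""
    m = 1.0 + (divergence_score - base_threshold) * scaling_factor
    return max(1.0, min(2.0, m))
```
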
4.9 Dynamic SL/TP (Volatility-Adjusted)

ID Requirement
FR-9.1 Calculate asset realized volatility from hourly returns: std(returns) * sqrt(24) (scales hourly vol to daily)
FR-9.2 Base SL: entry_price * (1 - base_sl_pct * vol_multiplier) for longs
FR-9.3 Base TP: entry_price * (1 + base_tp_pct * vol_multiplier) for longs
FR-9.4 vol_multiplier = asset_volatility / baseline_volatility (baseline = BTC volatility)
FR-9.5 Cap vol_multiplier between 0.5x and 2.0x to prevent extreme values
FR-9.6 Invert logic for short positions

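The SL/TP rules of FR-9.1 through FR-9.6 combine into one small helper. Defaults mirror the config section (base_sl_pct=0.06, base_tp_pct=0.05, multiplier bounds 0.5/2.0); the function signature is illustrative:

```python
def dynamic_sl_tp(entry_price, asset_vol, baseline_vol, side="long",
                  base_sl_pct=0.06, base_tp_pct=0.05):
    """Volatility-adjusted stop-loss and take-profit levels (FR-9.2..FR-9.6)."""
    # FR-9.4 / FR-9.5: relative vol, clamped to [0.5, 2.0]
    vol_mult = max(0.5, min(2.0, asset_vol / baseline_vol))
    if side == "long":
        sl = entry_price * (1 - base_sl_pct * vol_mult)   # FR-9.2
        tp = entry_price * (1 + base_tp_pct * vol_mult)   # FR-9.3
    else:
        sl = entry_price * (1 + base_sl_pct * vol_mult)   # FR-9.6: inverted
        tp = entry_price * (1 - base_tp_pct * vol_mult)
    return sl, tp
```

With `asset_vol == baseline_vol` (e.g. trading BTC itself), this degenerates to the flat 6%/5% levels.
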
4.10 Exit Conditions

ID Requirement
FR-10.1 Exit when Z-score crosses back through 0 (mean reversion complete)
FR-10.2 Exit when dynamic SL or TP hit
FR-10.3 No minimum holding period (can switch pairs immediately)
FR-10.4 If new pair has higher divergence score, close current and open new

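The exit checks of FR-10.1 and FR-10.2 are a pair of conditions per side; the side naming convention here is an assumption (a "long" spread position profits as the Z-score rises back toward 0 from below):

```python
def should_exit(side, z_score, price, sl, tp):
    """Exit on mean-reversion completion (FR-10.1) or SL/TP touch (FR-10.2)."""
    # FR-10.1: Z-score has crossed back through 0
    if side == "long" and z_score >= 0:
        return True
    if side == "short" and z_score <= 0:
        return True
    # FR-10.2: dynamic SL or TP hit
    if side == "long":
        return price <= sl or price >= tp
    return price >= sl or price <= tp
```

FR-10.4 (switching to a higher-scoring pair) sits above this at the selection layer, since it needs the scores of all other candidates.
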
4.11 Backtest Integration

ID Requirement
FR-11.1 Integrate with existing engine/backtester.py framework
FR-11.2 Support 1h timeframe (matching live trading)
FR-11.3 Walk-forward validation: train on 70%, test on 30%
FR-11.4 Output: trades log, equity curve, performance metrics
FR-11.5 Compare against single-pair BTC/ETH baseline

5. Non-Goals (Out of Scope)

  1. Live trading implementation - Backtest validation first
  2. Multi-position portfolio - Single pair at a time for v1
  3. Cross-exchange arbitrage - OKX only
  4. Alternative ML models - Stick with Random Forest for consistency
  5. Sub-1h timeframes - 1h candles only for initial version
  6. Leveraged positions - 1x leverage for backtest
  7. Portfolio-level VaR/risk budgeting - Full subaccount allocation

6. Design Considerations

6.1 Architecture

strategies/
  multi_pair/
    __init__.py
    pair_scanner.py      # Generates all pairs, filters tradeable
    feature_engine.py    # Calculates features for all pairs
    correlation.py       # Rolling correlation matrix & filtering
    divergence_scorer.py # Ranks pairs by divergence score
    strategy.py          # Main strategy orchestration

6.2 Data Flow

1. Load OHLCV for all 10 assets
2. Generate pair combinations (45 pairs)
3. Filter to tradeable pairs (OKX check)
4. Calculate features for each pair
5. Train/load universal ML model
6. Predict probability for all pairs
7. Calculate divergence scores
8. Apply correlation filter
9. Select top pair
10. Generate signal with dynamic SL/TP
11. Execute in backtest engine

6.3 Configuration

from dataclasses import dataclass, field

@dataclass
class MultiPairConfig:
    # Assets
    assets: list[str] = field(default_factory=lambda: [
        "BTC", "ETH", "SOL", "XRP", "BNB", 
        "DOGE", "ADA", "AVAX", "LINK", "DOT"
    ])
    
    # Thresholds
    z_window: int = 24
    z_entry_threshold: float = 1.0
    prob_threshold: float = 0.5
    correlation_threshold: float = 0.85
    correlation_window: int = 168  # 7 days in hours
    
    # Risk
    base_sl_pct: float = 0.06
    base_tp_pct: float = 0.05
    vol_multiplier_min: float = 0.5
    vol_multiplier_max: float = 2.0
    
    # Model
    train_ratio: float = 0.7
    horizon: int = 102
    profit_target: float = 0.005

7. Technical Considerations

7.1 Dependencies

  • Extend DataManager to load multiple symbols
  • Query OKX API for available perpetual cross-pairs
  • Reuse existing feature engineering from RegimeReversionStrategy

7.2 Performance

  • Pre-calculate all pair features in batch (vectorized)
  • Cache correlation matrix (update every N candles, not every minute)
  • Model inference is fast (single predict call with all pairs as rows)

7.3 Edge Cases

  • Handle pairs with insufficient history (< 200 bars) - exclude
  • Handle assets delisted mid-backtest - skip pair
  • Handle zero-volume periods - use last valid price

8. Success Metrics

Metric                   Baseline (BTC/ETH)     Target
Net PnL                  Current performance    > 10% improvement
Number of Trades         N                      Comparable or higher
Win Rate                 Baseline %             Maintain or improve
Average Trade Duration   Baseline hours         Flexible
Max Drawdown             Baseline %             Not significantly worse

9. Open Questions

  1. OKX Cross-Pairs: Need to verify which cross-pairs are available on OKX perpetuals. May need to fall back to synthetic spreads for most pairs.

  2. On-Chain Data: CryptoQuant data currently covers BTC/ETH. Should we:

    • Run without on-chain features for other assets?
    • Source alternative on-chain data?
    • Use funding rates only (available from OKX)?
  3. Pair ID Encoding: For the universal model, should pair_id be:

    • One-hot encoded (adds 45 features)?
    • Label encoded (single ordinal feature)?
    • Hierarchical (base_asset + quote_asset as separate features)?
  4. Synthetic Spreads: If trading SOL/DOT spread but only USDT pairs available:

    • Calculate spread synthetically: SOL-USDT / DOT-USDT
    • Execute as two legs: Long SOL-USDT, Short DOT-USDT
    • This doubles fees and adds execution complexity. Include in v1?

10. Implementation Phases

Phase 1: Data & Infrastructure (Est. 2-3 days)

  • Extend DataManager for multi-symbol loading
  • Build pair scanner with OKX tradeable filter
  • Implement correlation matrix calculation

Phase 2: Feature Engineering (Est. 2 days)

  • Adapt existing feature calculation for arbitrary pairs
  • Add pair identifier feature
  • Batch feature calculation for all pairs

Phase 3: Model & Scoring (Est. 2 days)

  • Train universal model on all pairs
  • Implement divergence scoring
  • Add correlation filtering to pair selection

Phase 4: Strategy Integration (Est. 2-3 days)

  • Implement dynamic SL/TP with volatility
  • Integrate with backtester
  • Build strategy orchestration class

Phase 5: Validation & Comparison (Est. 2 days)

  • Run walk-forward backtest
  • Compare against BTC/ETH baseline
  • Generate performance report

Total Estimated Effort: 10-12 days


Document Version: 1.0
Created: 2026-01-15
Author: AI Assistant
Status: Draft - Awaiting Review