- Extend regime detection to top 10 cryptocurrencies (45 pairs)
- Dynamic pair selection based on divergence score (`|z_score| * probability`)
- Universal ML model trained on all pairs
- Correlation-based filtering to avoid redundant positions
- Funding rate integration from OKX for all 10 assets
- ATR-based dynamic stop-loss and take-profit
- Walk-forward training with 70/30 split

Performance: +35.69% return (vs +28.66% baseline), 63.6% win rate

# PRD: Multi-Pair Divergence Selection Strategy

## 1. Introduction / Overview

This document describes the **Multi-Pair Divergence Selection Strategy**, an extension of the existing BTC/ETH regime reversion system. The strategy expands spread analysis to the **top 10 cryptocurrencies by market cap**, calculates divergence scores for all tradeable pairs, and dynamically selects the **most divergent pair** for trading.

The core hypothesis: by scanning multiple pairs simultaneously, we can identify stronger mean-reversion opportunities than focusing on a single pair, improving net PnL while maintaining the proven ML-based regime detection approach.

---
## 2. Goals

1. **Extend regime detection** to top 10 market cap cryptocurrencies
2. **Dynamically select** the most divergent tradeable pair each cycle
3. **Integrate volatility** into dynamic SL/TP calculations
4. **Filter correlated pairs** to avoid redundant positions
5. **Improve net PnL** compared to single-pair BTC/ETH strategy
6. **Backtest-first** implementation with walk-forward validation

---
## 3. User Stories

### US-1: Multi-Pair Analysis
> As a trader, I want the system to analyze spread divergence across multiple cryptocurrency pairs so that I can identify the best trading opportunity at any given moment.

### US-2: Dynamic Pair Selection
> As a trader, I want the system to automatically select and trade the pair with the highest divergence score (combination of Z-score magnitude and ML probability) so that I maximize mean-reversion profit potential.

### US-3: Volatility-Adjusted Risk
> As a trader, I want stop-loss and take-profit levels to adapt to each pair's volatility so that I avoid being stopped out prematurely on volatile assets while protecting profits on stable ones.

### US-4: Correlation Filtering
> As a trader, I want the system to avoid selecting pairs that are highly correlated with my current position so that I don't inadvertently double-down on the same market exposure.

### US-5: Backtest Validation
> As a researcher, I want to backtest this multi-pair strategy with walk-forward training so that I can validate improvement over the single-pair baseline without look-ahead bias.

---
## 4. Functional Requirements

### 4.1 Data Management

| ID | Requirement |
|----|-------------|
| FR-1.1 | System must support loading OHLCV data for top 10 market cap cryptocurrencies |
| FR-1.2 | Target assets: BTC, ETH, SOL, XRP, BNB, DOGE, ADA, AVAX, LINK, DOT (configurable) |
| FR-1.3 | System must identify all directly tradeable cross-pairs on OKX perpetuals |
| FR-1.4 | System must align timestamps across all pairs for synchronized analysis |
| FR-1.5 | System must handle missing data gracefully (skip pair if insufficient history) |

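
One possible reading of FR-1.4 is an inner join on timestamps, sketched below. The function name `align_closes`, the `{asset: {timestamp: close}}` layout, and the choice of an inner join rather than forward-filling gaps are assumptions for illustration; the existing `DataManager` may represent data differently.

```python
def align_closes(closes_by_asset):
    """Keep only the timestamps that every asset has a close for.

    closes_by_asset: {asset: {timestamp: close}} (hypothetical layout).
    Returns the sorted shared timestamps and one aligned list per asset.
    """
    if not closes_by_asset:
        return [], {}
    # Intersect the timestamp sets of all assets (inner join).
    common = sorted(set.intersection(*(set(s) for s in closes_by_asset.values())))
    aligned = {asset: [s[t] for t in common] for asset, s in closes_by_asset.items()}
    return common, aligned


timestamps, aligned = align_closes({
    "BTC": {1: 100.0, 2: 101.0, 3: 102.0},
    "ETH": {2: 10.0, 3: 11.0, 4: 12.0},
})
```

An inner join drops bars rather than inventing prices, which keeps spread ratios honest at the cost of a shorter history for recently listed assets.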
### 4.2 Pair Generation

| ID | Requirement |
|----|-------------|
| FR-2.1 | Generate all unique pairs from asset universe: `N*(N-1)/2` pairs (e.g., 45 pairs for 10 assets) |
| FR-2.2 | Filter pairs to only those directly tradeable on OKX (no USDT intermediate) |
| FR-2.3 | Fallback: If cross-pair not available, calculate synthetic spread via USDT pairs |
| FR-2.4 | Store pair metadata: base asset, quote asset, exchange symbol, tradeable flag |

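
The pair universe in FR-2.1 and the metadata in FR-2.4 could be built roughly as follows. This is a sketch only: `generate_pairs`, the `BASE-QUOTE` symbol format, and the `tradeable` set of `(base, quote)` tuples are hypothetical, and the real list of OKX cross-pairs still has to be verified.

```python
from itertools import combinations

ASSETS = ["BTC", "ETH", "SOL", "XRP", "BNB", "DOGE", "ADA", "AVAX", "LINK", "DOT"]


def generate_pairs(assets, tradeable):
    """Return metadata for every unordered asset pair (FR-2.1, FR-2.4).

    tradeable: set of (base, quote) tuples listed on the exchange
    (hypothetical input; either orientation counts as a direct listing).
    """
    pairs = []
    for base, quote in combinations(assets, 2):
        direct = (base, quote) in tradeable or (quote, base) in tradeable
        pairs.append({
            "base": base,
            "quote": quote,
            "symbol": f"{base}-{quote}",  # illustrative symbol format
            "tradeable": direct,          # False -> synthetic spread (FR-2.3)
        })
    return pairs


pairs = generate_pairs(ASSETS, tradeable={("ETH", "BTC")})
```

With 10 assets this yields 10 * 9 / 2 = 45 entries, matching FR-2.1.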
### 4.3 Feature Engineering (Per Pair)

| ID | Requirement |
|----|-------------|
| FR-3.1 | Calculate spread ratio: `asset_a_close / asset_b_close` |
| FR-3.2 | Calculate Z-Score with configurable rolling window (default: 24h) |
| FR-3.3 | Calculate spread technicals: RSI(14), ROC(5), 1h change |
| FR-3.4 | Calculate volume ratio and relative volume |
| FR-3.5 | Calculate volatility ratio: `std(returns_a) / std(returns_b)` over Z-window |
| FR-3.6 | Calculate realized volatility for each asset (for dynamic SL/TP) |
| FR-3.7 | Merge on-chain data (funding rates, inflows) if available per asset |
| FR-3.8 | Add pair identifier as categorical feature for universal model |

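
FR-3.1 and FR-3.2 are the core of the feature set, so a minimal pure-Python sketch may help. It assumes the Z-score uses the sample standard deviation over a trailing window that includes the current bar; the existing `RegimeReversionStrategy` may define the window or the deviation differently, and the function names are illustrative.

```python
from statistics import mean, stdev


def spread_ratio(close_a, close_b):
    """FR-3.1: element-wise ratio of the two aligned close series."""
    return [a / b for a, b in zip(close_a, close_b)]


def rolling_zscore(values, window=24):
    """FR-3.2: Z-score of each value against its trailing window.

    Bars without a full window, or with zero dispersion, yield None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        win = values[i + 1 - window: i + 1]
        sd = stdev(win)  # sample standard deviation (assumption)
        out.append((values[i] - mean(win)) / sd if sd > 0 else None)
    return out


z = rolling_zscore(spread_ratio([2.0, 4.0, 6.0], [2.0, 2.0, 2.0]), window=3)
```

In a vectorized implementation the same calculation would normally be a rolling mean and rolling standard deviation over the spread column.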
### 4.4 Correlation Filtering

| ID | Requirement |
|----|-------------|
| FR-4.1 | Calculate rolling correlation matrix between all assets (default: 168h / 7 days) |
| FR-4.2 | Define correlation threshold (default: 0.85) |
| FR-4.3 | If current position exists, exclude pairs where either asset has correlation > threshold with held asset |
| FR-4.4 | Log filtered pairs with reason for exclusion |

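
The filter in FR-4.3 and FR-4.4 can be sketched as below. It assumes Pearson correlation on aligned hourly returns and treats an asset as perfectly correlated with itself, so any candidate sharing a leg with the open position is also excluded; both choices, and all names, are assumptions rather than settled design.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def filter_correlated(pairs, held_assets, returns, threshold=0.85, window=168):
    """Split candidate pairs into kept and excluded-with-reason (FR-4.3, FR-4.4)."""
    kept, excluded = [], []
    for pair in pairs:
        reason = None
        for asset in pair:
            for held in held_assets:
                rho = 1.0 if asset == held else pearson(
                    returns[asset][-window:], returns[held][-window:])
                if rho > threshold:
                    reason = f"{asset} vs held {held}: corr {rho:.2f} > {threshold}"
        if reason is None:
            kept.append(pair)
        else:
            excluded.append((pair, reason))
    return kept, excluded


returns = {
    "BTC": [0.01, -0.02, 0.03, -0.01],
    "ETH": [0.02, -0.04, 0.06, -0.02],  # moves in lockstep with BTC
    "DOGE": [0.02, 0.01, -0.03, 0.02],
    "XRP": [0.01, 0.01, -0.01, -0.01],
}
kept, excluded = filter_correlated(
    [("ETH", "DOGE"), ("DOGE", "XRP")], held_assets=["BTC"], returns=returns)
```

The `excluded` list carries the reason string that FR-4.4 asks to be logged.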
### 4.5 Divergence Scoring & Pair Selection

| ID | Requirement |
|----|-------------|
| FR-5.1 | Calculate divergence score: `abs(z_score) * model_probability` |
| FR-5.2 | Only consider pairs where `abs(z_score) > z_entry_threshold` (default: 1.0) |
| FR-5.3 | Only consider pairs where `model_probability > prob_threshold` (default: 0.5) |
| FR-5.4 | Apply correlation filter to eligible pairs |
| FR-5.5 | Select pair with highest divergence score |
| FR-5.6 | If no pair qualifies, signal "hold" |
| FR-5.7 | Log all pair scores for analysis/debugging |

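
The scoring and selection rules above reduce to a few lines. The sketch below assumes candidates arrive as `(pair, z_score, probability)` tuples that have already passed the correlation filter (FR-5.4); the function name and the tuple layout are illustrative.

```python
def select_pair(candidates, z_entry_threshold=1.0, prob_threshold=0.5):
    """Return (pair, divergence_score) for the best candidate, or None to hold.

    candidates: iterable of (pair, z_score, probability) tuples.
    """
    scored = []
    for pair, z_score, probability in candidates:
        # FR-5.2 and FR-5.3: both gates must pass before scoring.
        if abs(z_score) > z_entry_threshold and probability > prob_threshold:
            scored.append((pair, abs(z_score) * probability))  # FR-5.1
    if not scored:
        return None  # FR-5.6: nothing qualifies, so hold
    return max(scored, key=lambda item: item[1])  # FR-5.5


best = select_pair([
    ("BTC/ETH", 2.0, 0.60),   # score 1.2
    ("SOL/DOT", -3.0, 0.55),  # score ~1.65, the winner
    ("XRP/ADA", 4.0, 0.40),   # rejected: probability below threshold
])
```

Note that a large Z-score alone never wins: `XRP/ADA` has the widest divergence here but fails the probability gate.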
### 4.6 ML Model (Universal)

| ID | Requirement |
|----|-------------|
| FR-6.1 | Train single Random Forest model on all pairs combined |
| FR-6.2 | Include `pair_id` as one-hot encoded or label-encoded feature |
| FR-6.3 | Target: binary (1 = profitable reversion within horizon, 0 = no reversion) |
| FR-6.4 | Walk-forward training: 70% train / 30% test split |
| FR-6.5 | Daily retraining schedule (for live, configurable for backtest) |
| FR-6.6 | Model hyperparameters: `n_estimators=300, max_depth=5, min_samples_leaf=30, class_weight={0:1, 1:3}` |
| FR-6.7 | Save/load model with feature column metadata |

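
Because the universal model pools rows from many pairs, the 70/30 split in FR-6.4 is safest when cut on timestamps rather than on row index, so that no test row predates a training row. The sketch below shows one way to do that; the `ts` key and the function name are assumptions, and it covers only a single chronological split, not a rolling retrain schedule.

```python
def split_by_time(rows, train_ratio=0.7):
    """Chronological train/test split over pooled multi-pair rows.

    rows: list of dicts that each carry a sortable "ts" key (hypothetical).
    The earliest `train_ratio` share of distinct timestamps goes to train.
    """
    timestamps = sorted({row["ts"] for row in rows})
    cut = int(round(len(timestamps) * train_ratio))
    train_ts = set(timestamps[:cut])
    train = [row for row in rows if row["ts"] in train_ts]
    test = [row for row in rows if row["ts"] not in train_ts]
    return train, test


rows = [{"ts": t, "pair": p} for t in range(10) for p in ("BTC/ETH", "SOL/DOT")]
train, test = split_by_time(rows)
```

Every pair contributes to both sides of the cut, and the two sides never overlap in time, which is the property that guards against look-ahead bias.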
### 4.7 Signal Generation

| ID | Requirement |
|----|-------------|
| FR-7.1 | Direction: if `z_score > threshold`, short the spread (short asset_a); if `z_score < -threshold`, long the spread (long asset_a) |
| FR-7.2 | Apply funding rate filter per asset (block if extreme funding opposes direction) |
| FR-7.3 | Output signal: `{pair, action, side, probability, z_score, divergence_score, reason}` |

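
A sketch of how FR-7.1 to FR-7.3 might fit together follows. The `funding_limit` cutoff for "extreme" funding, the `"enter"`/`"hold"` action values, and the use of asset_a's funding rate alone are assumptions; the source fixes only the direction rule and the output fields.

```python
def generate_signal(pair, z_score, probability, funding_rate,
                    z_threshold=1.0, funding_limit=0.001):
    """Build the FR-7.3 signal dict for one pair.

    funding_rate: current perpetual funding rate of asset_a.
    funding_limit: hypothetical cutoff for "extreme" funding.
    """
    signal = {
        "pair": pair, "action": "hold", "side": None,
        "probability": probability, "z_score": z_score,
        "divergence_score": abs(z_score) * probability, "reason": "",
    }
    if z_score > z_threshold:
        side = "short"  # spread is stretched upward: short asset_a
    elif z_score < -z_threshold:
        side = "long"   # spread is stretched downward: long asset_a
    else:
        signal["reason"] = "z_score inside entry band"
        return signal
    # Positive funding means longs pay shorts, so it works against a long;
    # negative funding works against a short.
    opposed = (side == "long" and funding_rate > funding_limit) or \
              (side == "short" and funding_rate < -funding_limit)
    if opposed:
        signal["reason"] = "extreme funding opposes direction"
        return signal
    signal.update(action="enter", side=side, reason="z_score beyond threshold")
    return signal


entry = generate_signal("SOL/DOT", z_score=2.0, probability=0.6, funding_rate=0.0)
blocked = generate_signal("SOL/DOT", z_score=-2.0, probability=0.6, funding_rate=0.002)
```

Keeping `reason` populated on every path makes the hold decisions as easy to audit as the entries.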
### 4.8 Position Sizing

| ID | Requirement |
|----|-------------|
| FR-8.1 | Base size: 100% of available subaccount balance |
| FR-8.2 | Scale by divergence: `size_multiplier = 1.0 + (divergence_score - base_threshold) * scaling_factor` |
| FR-8.3 | Cap multiplier between 1.0x and 2.0x |
| FR-8.4 | Respect exchange minimum order size per asset |

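
FR-8.2 and FR-8.3 combine into a clamped linear rule. The defaults for `base_threshold` and `scaling_factor` below are placeholders: the source does not fix them, and 0.5 is used only because it equals the smallest qualifying score (`z_entry_threshold * prob_threshold`).

```python
def size_multiplier(divergence_score, base_threshold=0.5, scaling_factor=0.5):
    """FR-8.2 scaling clamped to the FR-8.3 range of 1.0x to 2.0x.

    base_threshold and scaling_factor are hypothetical defaults.
    """
    raw = 1.0 + (divergence_score - base_threshold) * scaling_factor
    return min(2.0, max(1.0, raw))


multipliers = [size_multiplier(score) for score in (0.1, 0.5, 1.5, 10.0)]
```

With these placeholder values a score of 1.5 gives 1.0 + (1.5 - 0.5) * 0.5 = 1.5x, and anything at or above 2.5 saturates at 2.0x.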
### 4.9 Dynamic SL/TP (Volatility-Adjusted)

| ID | Requirement |
|----|-------------|
| FR-9.1 | Calculate asset realized volatility from hourly returns, scaled to daily: `std(returns) * sqrt(24)` |
| FR-9.2 | Base SL: `entry_price * (1 - base_sl_pct * vol_multiplier)` for longs |
| FR-9.3 | Base TP: `entry_price * (1 + base_tp_pct * vol_multiplier)` for longs |
| FR-9.4 | `vol_multiplier = asset_volatility / baseline_volatility` (baseline = BTC volatility) |
| FR-9.5 | Cap `vol_multiplier` between 0.5x and 2.0x to prevent extreme values |
| FR-9.6 | Invert logic for short positions |

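
The formulas in FR-9.1 to FR-9.6 translate directly into code. The sketch below uses the defaults from the configuration in section 6.3; the function names are illustrative, and it assumes hourly returns as input to the volatility estimate.

```python
from math import sqrt
from statistics import stdev


def daily_vol(hourly_returns):
    """FR-9.1: hourly return dispersion scaled to a daily figure."""
    return stdev(hourly_returns) * sqrt(24)


def dynamic_levels(entry_price, side, asset_vol, baseline_vol,
                   base_sl_pct=0.06, base_tp_pct=0.05,
                   vol_min=0.5, vol_max=2.0):
    """Return (stop_loss, take_profit) prices for a long or short entry."""
    # FR-9.4 and FR-9.5: volatility relative to BTC, clamped to [0.5, 2.0].
    vol_multiplier = min(vol_max, max(vol_min, asset_vol / baseline_vol))
    sl_dist = base_sl_pct * vol_multiplier
    tp_dist = base_tp_pct * vol_multiplier
    if side == "long":
        return entry_price * (1 - sl_dist), entry_price * (1 + tp_dist)
    # FR-9.6: for shorts the stop sits above entry and the target below.
    return entry_price * (1 + sl_dist), entry_price * (1 - tp_dist)


long_sl, long_tp = dynamic_levels(100.0, "long", asset_vol=0.04, baseline_vol=0.02)
short_sl, short_tp = dynamic_levels(100.0, "short", asset_vol=0.02, baseline_vol=0.02)
```

An asset twice as volatile as BTC hits the 2.0x cap, so the long stop widens from 6% to 12% below entry and the target from 5% to 10% above it.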
### 4.10 Exit Conditions

| ID | Requirement |
|----|-------------|
| FR-10.1 | Exit when Z-score crosses back through 0 (mean reversion complete) |
| FR-10.2 | Exit when dynamic SL or TP hit |
| FR-10.3 | No minimum holding period (can switch pairs immediately) |
| FR-10.4 | If new pair has higher divergence score, close current and open new |

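
The exit rules can be checked in one pass per bar. The sketch below is illustrative only: the priority order (price stops first, then mean reversion, then pair switching), the reason labels, and the assumption that `price` is the price of the instrument the SL/TP levels were set on are not specified in the requirements.

```python
def exit_reason(side, z_score, price, stop_loss, take_profit,
                current_score, best_other_score):
    """Return a reason string if the position should close, else None."""
    if side == "long":
        if price <= stop_loss:
            return "stop_loss"       # FR-10.2
        if price >= take_profit:
            return "take_profit"     # FR-10.2
        if z_score >= 0:
            return "mean_reverted"   # FR-10.1: entered below 0, now crossed
    else:
        if price >= stop_loss:
            return "stop_loss"
        if price <= take_profit:
            return "take_profit"
        if z_score <= 0:
            return "mean_reverted"   # FR-10.1: entered above 0, now crossed
    if best_other_score > current_score:
        return "switch_pair"         # FR-10.4, no holding period (FR-10.3)
    return None


reason = exit_reason("long", z_score=-0.4, price=101.0, stop_loss=94.0,
                     take_profit=105.0, current_score=1.2, best_other_score=1.6)
```

Because FR-10.3 allows immediate switching, a backtest may want to count how often `switch_pair` fires, since each switch pays a full round trip of fees.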
### 4.11 Backtest Integration

| ID | Requirement |
|----|-------------|
| FR-11.1 | Integrate with existing `engine/backtester.py` framework |
| FR-11.2 | Support 1h timeframe (matching live trading) |
| FR-11.3 | Walk-forward validation: train on 70%, test on 30% |
| FR-11.4 | Output: trades log, equity curve, performance metrics |
| FR-11.5 | Compare against single-pair BTC/ETH baseline |

---
## 5. Non-Goals (Out of Scope)

1. **Live trading implementation** - Backtest validation first
2. **Multi-position portfolio** - Single pair at a time for v1
3. **Cross-exchange arbitrage** - OKX only
4. **Alternative ML models** - Stick with Random Forest for consistency
5. **Sub-1h timeframes** - 1h candles only for initial version
6. **Leveraged positions** - 1x leverage for backtest
7. **Portfolio-level VaR/risk budgeting** - Full subaccount allocation

---
## 6. Design Considerations

### 6.1 Architecture

```
strategies/
    multi_pair/
        __init__.py
        pair_scanner.py        # Generates all pairs, filters tradeable
        feature_engine.py      # Calculates features for all pairs
        correlation.py         # Rolling correlation matrix & filtering
        divergence_scorer.py   # Ranks pairs by divergence score
        strategy.py            # Main strategy orchestration
```

### 6.2 Data Flow

```
1. Load OHLCV for all 10 assets
2. Generate pair combinations (45 pairs)
3. Filter to tradeable pairs (OKX check)
4. Calculate features for each pair
5. Train/load universal ML model
6. Predict probability for all pairs
7. Calculate divergence scores
8. Apply correlation filter
9. Select top pair
10. Generate signal with dynamic SL/TP
11. Execute in backtest engine
```

### 6.3 Configuration

```python
from dataclasses import dataclass, field


@dataclass
class MultiPairConfig:
    # Assets
    assets: list[str] = field(default_factory=lambda: [
        "BTC", "ETH", "SOL", "XRP", "BNB",
        "DOGE", "ADA", "AVAX", "LINK", "DOT"
    ])

    # Thresholds
    z_window: int = 24  # rolling Z-score window in hours (FR-3.2)
    z_entry_threshold: float = 1.0
    prob_threshold: float = 0.5
    correlation_threshold: float = 0.85
    correlation_window: int = 168  # 7 days in hours

    # Risk
    base_sl_pct: float = 0.06
    base_tp_pct: float = 0.05
    vol_multiplier_min: float = 0.5
    vol_multiplier_max: float = 2.0

    # Model
    train_ratio: float = 0.7  # walk-forward train share (FR-6.4)
    horizon: int = 102  # lookahead for the FR-6.3 label
    profit_target: float = 0.005
```

---
## 7. Technical Considerations

### 7.1 Dependencies

- Extend `DataManager` to load multiple symbols
- Query OKX API for available perpetual cross-pairs
- Reuse existing feature engineering from `RegimeReversionStrategy`

### 7.2 Performance

- Pre-calculate all pair features in batch (vectorized)
- Cache correlation matrix (update every N candles, not every minute)
- Model inference is fast (single predict call with all pairs as rows)

### 7.3 Edge Cases

- Handle pairs with insufficient history (< 200 bars) - exclude
- Handle assets delisted mid-backtest - skip pair
- Handle zero-volume periods - use last valid price

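
Two of the edge cases in 7.3 are simple enough to sketch. The helper names are hypothetical, and treating a leading zero-volume bar as valid (because there is no earlier price to carry forward) is an assumption.

```python
MIN_BARS = 200  # minimum history from section 7.3


def has_enough_history(closes, min_bars=MIN_BARS):
    """Pairs whose assets fall short of min_bars are excluded."""
    return len(closes) >= min_bars


def forward_fill_zero_volume(closes, volumes):
    """Replace closes on zero-volume bars with the last valid close."""
    filled, last_valid = [], None
    for close, volume in zip(closes, volumes):
        if volume > 0:
            last_valid = close
        # Before any traded bar exists, keep the reported close as-is.
        filled.append(last_valid if last_valid is not None else close)
    return filled


filled = forward_fill_zero_volume([10.0, 11.0, 12.0, 13.0], [5, 0, 0, 2])
```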

---

## 8. Success Metrics

| Metric | Baseline (BTC/ETH) | Target |
|--------|-------------------|--------|
| Net PnL | Current performance | > 10% improvement |
| Number of Trades | N | Comparable or higher |
| Win Rate | Baseline % | Maintain or improve |
| Average Trade Duration | Baseline hours | Flexible |
| Max Drawdown | Baseline % | Not significantly worse |

---
## 9. Open Questions

1. **OKX Cross-Pairs**: Need to verify which cross-pairs are available on OKX perpetuals. May need to fall back to synthetic spreads for most pairs.

2. **On-Chain Data**: CryptoQuant data currently covers BTC/ETH. Should we:
   - Run without on-chain features for other assets?
   - Source alternative on-chain data?
   - Use funding rates only (available from OKX)?

3. **Pair ID Encoding**: For the universal model, should `pair_id` be:
   - One-hot encoded (adds 45 features)?
   - Label encoded (single ordinal feature)?
   - Hierarchical (base_asset + quote_asset as separate features)?

4. **Synthetic Spreads**: If trading SOL/DOT spread but only USDT pairs available:
   - Calculate spread synthetically: `SOL-USDT / DOT-USDT`
   - Execute as two legs: Long SOL-USDT, Short DOT-USDT
   - This doubles fees and adds execution complexity. Include in v1?

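
To make the cost in question 4 concrete, the sketch below builds the two legs of a synthetic spread and the entry fees they incur. Equal notional per leg, the flat `fee_rate`, and the function name are assumptions for illustration; actual OKX fee tiers and leg sizing are not specified here.

```python
def synthetic_legs(symbol_a, symbol_b, price_a, price_b,
                   notional_per_leg, spread_side, fee_rate):
    """Describe a synthetic spread trade executed through two USDT legs.

    spread_side "long" buys asset_a and sells asset_b; "short" is the reverse.
    fee_rate is a hypothetical flat fee charged on each leg's notional.
    """
    if spread_side == "long":
        buy_symbol, sell_symbol = symbol_a, symbol_b
    else:
        buy_symbol, sell_symbol = symbol_b, symbol_a
    return {
        "spread": price_a / price_b,
        "legs": [
            {"symbol": buy_symbol, "side": "buy", "notional": notional_per_leg},
            {"symbol": sell_symbol, "side": "sell", "notional": notional_per_leg},
        ],
        # Two legs means two fees on entry (and two more on exit).
        "entry_fees": 2 * notional_per_leg * fee_rate,
    }


trade = synthetic_legs("SOL-USDT", "DOT-USDT", price_a=150.0, price_b=6.0,
                       notional_per_leg=1000.0, spread_side="long", fee_rate=0.0005)
```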

---

## 10. Implementation Phases

### Phase 1: Data & Infrastructure (Est. 2-3 days)
- Extend DataManager for multi-symbol loading
- Build pair scanner with OKX tradeable filter
- Implement correlation matrix calculation

### Phase 2: Feature Engineering (Est. 2 days)
- Adapt existing feature calculation for arbitrary pairs
- Add pair identifier feature
- Batch feature calculation for all pairs

### Phase 3: Model & Scoring (Est. 2 days)
- Train universal model on all pairs
- Implement divergence scoring
- Add correlation filtering to pair selection

### Phase 4: Strategy Integration (Est. 2-3 days)
- Implement dynamic SL/TP with volatility
- Integrate with backtester
- Build strategy orchestration class

### Phase 5: Validation & Comparison (Est. 2 days)
- Run walk-forward backtest
- Compare against BTC/ETH baseline
- Generate performance report

**Total Estimated Effort: 10-12 days**

---

*Document Version: 1.0*
*Created: 2026-01-15*
*Author: AI Assistant*
*Status: Draft - Awaiting Review*