lowkey_backtest/tasks/prd-multi-pair-divergence-strategy.md

# PRD: Multi-Pair Divergence Selection Strategy

## 1. Introduction / Overview

This document describes the **Multi-Pair Divergence Selection Strategy**, an extension of the existing BTC/ETH regime reversion system. The strategy expands spread analysis to the **top 10 cryptocurrencies by market cap**, calculates divergence scores for all tradeable pairs, and dynamically selects the **most divergent pair** for trading.

The core hypothesis: scanning multiple pairs simultaneously should surface stronger mean-reversion opportunities than focusing on a single pair, improving net PnL while keeping the proven ML-based regime detection approach.

---

## 2. Goals

1. **Extend regime detection** to top 10 market cap cryptocurrencies
2. **Dynamically select** the most divergent tradeable pair each cycle
3. **Integrate volatility** into dynamic SL/TP calculations
4. **Filter correlated pairs** to avoid redundant positions
5. **Improve net PnL** compared to single-pair BTC/ETH strategy
6. **Backtest-first** implementation with walk-forward validation

---

## 3. User Stories

### US-1: Multi-Pair Analysis
> As a trader, I want the system to analyze spread divergence across multiple cryptocurrency pairs so that I can identify the best trading opportunity at any given moment.

### US-2: Dynamic Pair Selection
> As a trader, I want the system to automatically select and trade the pair with the highest divergence score (combination of Z-score magnitude and ML probability) so that I maximize mean-reversion profit potential.

### US-3: Volatility-Adjusted Risk
> As a trader, I want stop-loss and take-profit levels to adapt to each pair's volatility so that I avoid being stopped out prematurely on volatile assets while protecting profits on stable ones.

### US-4: Correlation Filtering
> As a trader, I want the system to avoid selecting pairs that are highly correlated with my current position so that I don't inadvertently double-down on the same market exposure.

### US-5: Backtest Validation
> As a researcher, I want to backtest this multi-pair strategy with walk-forward training so that I can validate improvement over the single-pair baseline without look-ahead bias.

---

## 4. Functional Requirements

### 4.1 Data Management

| ID | Requirement |
|----|-------------|
| FR-1.1 | System must support loading OHLCV data for top 10 market cap cryptocurrencies |
| FR-1.2 | Target assets: BTC, ETH, SOL, XRP, BNB, DOGE, ADA, AVAX, LINK, DOT (configurable) |
| FR-1.3 | System must identify all directly tradeable cross-pairs on OKX perpetuals |
| FR-1.4 | System must align timestamps across all pairs for synchronized analysis |
| FR-1.5 | System must handle missing data gracefully (skip pair if insufficient history) |

### 4.2 Pair Generation

| ID | Requirement |
|----|-------------|
| FR-2.1 | Generate all unique pairs from asset universe: N*(N-1)/2 pairs (e.g., 45 pairs for 10 assets) |
| FR-2.2 | Filter pairs to only those directly tradeable on OKX (no USDT intermediate) |
| FR-2.3 | Fallback: If cross-pair not available, calculate synthetic spread via USDT pairs |
| FR-2.4 | Store pair metadata: base asset, quote asset, exchange symbol, tradeable flag |
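
The pair generation step (FR-2.1, FR-2.4) can be sketched as follows. This is a minimal illustration, not the project's implementation: the `generate_pairs` helper, the `BASE-QUOTE` symbol format, and the `tradeable_symbols` set are assumptions made for the example, and the real exchange symbol format would need to be confirmed against OKX.

```python
from itertools import combinations

# Default asset universe from FR-1.2.
ASSETS = ["BTC", "ETH", "SOL", "XRP", "BNB", "DOGE", "ADA", "AVAX", "LINK", "DOT"]


def generate_pairs(assets, tradeable_symbols):
    """Return metadata for every unique unordered asset pair.

    `tradeable_symbols` is a hypothetical set of cross-pair symbols that
    would come from an exchange lookup; it is passed in as plain data here.
    """
    pairs = []
    for base, quote in combinations(assets, 2):
        symbol = f"{base}-{quote}"  # illustrative format, not a confirmed OKX symbol
        pairs.append({
            "base": base,
            "quote": quote,
            "symbol": symbol,
            "tradeable": symbol in tradeable_symbols,
        })
    return pairs


pairs = generate_pairs(ASSETS, tradeable_symbols={"BTC-ETH"})
print(len(pairs))  # N*(N-1)/2 unique pairs for N assets
```

Pairs with `tradeable` set to `False` are the ones that would go through the synthetic-spread fallback in FR-2.3.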

### 4.3 Feature Engineering (Per Pair)

| ID | Requirement |
|----|-------------|
| FR-3.1 | Calculate spread ratio: `asset_a_close / asset_b_close` |
| FR-3.2 | Calculate Z-Score with configurable rolling window (default: 24h) |
| FR-3.3 | Calculate spread technicals: RSI(14), ROC(5), 1h change |
| FR-3.4 | Calculate volume ratio and relative volume |
| FR-3.5 | Calculate volatility ratio: `std(returns_a) / std(returns_b)` over Z-window |
| FR-3.6 | Calculate realized volatility for each asset (for dynamic SL/TP) |
| FR-3.7 | Merge on-chain data (funding rates, inflows) if available per asset |
| FR-3.8 | Add pair identifier as categorical feature for universal model |
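
FR-3.1 and FR-3.2 can be illustrated with a small standard-library sketch. The function names are hypothetical, and two details are assumptions the requirements do not fix: the window includes the current bar, and the population standard deviation is used. A production version would more likely be vectorized.

```python
from statistics import mean, pstdev


def spread_ratio(close_a, close_b):
    """FR-3.1: element-wise ratio of two aligned close-price series."""
    return [a / b for a, b in zip(close_a, close_b)]


def rolling_z_score(values, window=24):
    """FR-3.2: z-score of each value against its trailing window.

    Returns None until a full window is available. With 1h candles the
    default window of 24 bars corresponds to 24h.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        win = values[i + 1 - window : i + 1]  # window ends at the current bar
        sd = pstdev(win)
        # A flat window has no dispersion; report 0.0 rather than divide by zero.
        out.append((values[i] - mean(win)) / sd if sd > 0 else 0.0)
    return out


spread = spread_ratio([100.0, 102.0, 105.0], [50.0, 50.0, 50.0])
z = rolling_z_score(spread, window=3)
```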

### 4.4 Correlation Filtering

| ID | Requirement |
|----|-------------|
| FR-4.1 | Calculate rolling correlation matrix between all assets (default: 168h / 7 days) |
| FR-4.2 | Define correlation threshold (default: 0.85) |
| FR-4.3 | If a position is open, exclude candidate pairs where either asset's correlation with an asset in the held pair exceeds the threshold |
| FR-4.4 | Log filtered pairs with reason for exclusion |
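
A minimal sketch of the filter in FR-4.3 and FR-4.4, using only the standard library. The helper names, the `returns` mapping (asset to aligned hourly return series), and the reason-string format are assumptions for illustration. The comparison is on signed correlation, as the requirement is written; note that a candidate sharing an asset with the held pair is excluded automatically, since an asset correlates perfectly with itself.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def filter_correlated(candidates, held_assets, returns, threshold=0.85, window=168):
    """Split candidate pairs into (kept, excluded-with-reason) lists."""
    kept, excluded = [], []
    for pair in candidates:
        reason = None
        for asset in pair:
            for held in held_assets:
                # Only the most recent `window` bars feed the correlation.
                corr = pearson(returns[asset][-window:], returns[held][-window:])
                if corr > threshold:
                    reason = f"{asset} vs held {held}: corr={corr:.2f}"
                    break
            if reason:
                break
        if reason:
            excluded.append((pair, reason))
        else:
            kept.append(pair)
    return kept, excluded
```

The `excluded` list carries the reason per pair, which is what FR-4.4 asks to be logged.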

### 4.5 Divergence Scoring & Pair Selection

| ID | Requirement |
|----|-------------|
| FR-5.1 | Calculate divergence score: `abs(z_score) * model_probability` |
| FR-5.2 | Only consider pairs where `abs(z_score) > z_entry_threshold` (default: 1.0) |
| FR-5.3 | Only consider pairs where `model_probability > prob_threshold` (default: 0.5) |
| FR-5.4 | Apply correlation filter to eligible pairs |
| FR-5.5 | Select pair with highest divergence score |
| FR-5.6 | If no pair qualifies, signal "hold" |
| FR-5.7 | Log all pair scores for analysis/debugging |
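
The scoring and selection rules (FR-5.1 to FR-5.6) reduce to a filter followed by an argmax. The sketch below assumes each candidate is a dict with `pair`, `z_score`, and `probability` keys and that the correlation filter of FR-5.4 has already been applied to the input; both are illustrative choices, not a specified interface.

```python
def select_pair(candidates, z_entry_threshold=1.0, prob_threshold=0.5):
    """Return the eligible candidate with the highest divergence score.

    Returns None when no pair qualifies, which the caller would treat
    as a "hold" signal (FR-5.6).
    """
    scored = []
    for c in candidates:
        if abs(c["z_score"]) <= z_entry_threshold:
            continue  # FR-5.2: spread not stretched enough
        if c["probability"] <= prob_threshold:
            continue  # FR-5.3: model not confident enough
        score = abs(c["z_score"]) * c["probability"]  # FR-5.1
        scored.append({**c, "divergence_score": score})
    if not scored:
        return None
    return max(scored, key=lambda c: c["divergence_score"])


best = select_pair([
    {"pair": "SOL/DOT", "z_score": 2.0, "probability": 0.60},
    {"pair": "ETH/BTC", "z_score": -3.0, "probability": 0.55},
    {"pair": "ADA/XRP", "z_score": 0.8, "probability": 0.90},
])
```

In this example the `ADA/XRP` candidate is dropped by the Z-score gate despite its high probability, and `ETH/BTC` wins because the score uses the magnitude of the Z-score.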

### 4.6 ML Model (Universal)

| ID | Requirement |
|----|-------------|
| FR-6.1 | Train single Random Forest model on all pairs combined |
| FR-6.2 | Include `pair_id` as one-hot encoded or label-encoded feature |
| FR-6.3 | Target: binary (1 = profitable reversion within horizon, 0 = no reversion) |
| FR-6.4 | Walk-forward training: chronological 70% train / 30% test split (no shuffling, to avoid look-ahead bias) |
| FR-6.5 | Daily retraining schedule (for live, configurable for backtest) |
| FR-6.6 | Model hyperparameters: `n_estimators=300, max_depth=5, min_samples_leaf=30, class_weight={0:1, 1:3}` |
| FR-6.7 | Save/load model with feature column metadata |
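
Because the universal model pools rows from many pairs, the train/test boundary is easiest to reason about as a timestamp cutoff shared by all pairs rather than a row index. The following sketch shows one way to do that; the `split_by_time` name and the `ts` key are assumptions for the example, and it covers only the split, not model fitting.

```python
def split_by_time(rows, train_ratio=0.7):
    """Split pooled rows so every training row precedes every test row in time.

    `rows` is a list of dicts with a sortable 'ts' key; rows from different
    pairs may be interleaved in any order.
    """
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    timestamps = sorted({r["ts"] for r in rows})
    if len(timestamps) < 2:
        raise ValueError("need at least two distinct timestamps to split")
    cut_index = round(len(timestamps) * train_ratio)
    cut_index = min(max(cut_index, 1), len(timestamps) - 1)
    cutoff = timestamps[cut_index]  # first timestamp of the test set
    train = [r for r in rows if r["ts"] < cutoff]
    test = [r for r in rows if r["ts"] >= cutoff]
    return train, test


rows = [{"ts": t, "pair": p} for t in range(10) for p in ("BTC/ETH", "SOL/DOT")]
train, test = split_by_time(rows)
```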

### 4.7 Signal Generation

| ID | Requirement |
|----|-------------|
| FR-7.1 | Direction: if `z_score > threshold`, short the spread (short asset_a); if `z_score < -threshold`, long the spread (long asset_a) |
| FR-7.2 | Apply funding rate filter per asset (block if extreme funding opposes direction) |
| FR-7.3 | Output signal: `{pair, action, side, probability, z_score, divergence_score, reason}` |
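
As a rough illustration of FR-7.1 and FR-7.3, the function below maps a Z-score to a direction and fills the signal fields listed above. The concrete values for `action`, `side`, and `reason` are assumptions, and the funding rate filter of FR-7.2 is deliberately left out.

```python
def build_signal(pair, z_score, probability, threshold=1.0):
    """Build a signal dict with the fields named in FR-7.3."""
    if z_score > threshold:
        # Spread is rich relative to its mean: short asset_a.
        action, side, reason = "enter", "short", "z_score above threshold"
    elif z_score < -threshold:
        # Spread is cheap relative to its mean: long asset_a.
        action, side, reason = "enter", "long", "z_score below -threshold"
    else:
        action, side, reason = "hold", None, "z_score inside threshold band"
    return {
        "pair": pair,
        "action": action,
        "side": side,
        "probability": probability,
        "z_score": z_score,
        "divergence_score": abs(z_score) * probability,
        "reason": reason,
    }


signal = build_signal(("SOL", "DOT"), z_score=2.0, probability=0.6)
```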

### 4.8 Position Sizing

| ID | Requirement |
|----|-------------|
| FR-8.1 | Base size: 100% of available subaccount balance |
| FR-8.2 | Scale by divergence: `size_multiplier = 1.0 + (divergence_score - base_threshold) * scaling_factor` |
| FR-8.3 | Cap multiplier between 1.0x and 2.0x |
| FR-8.4 | Respect exchange minimum order size per asset |
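
FR-8.2 and FR-8.3 amount to a linear scale followed by a clamp. The requirements give no defaults for `base_threshold` or `scaling_factor`, so the values in this sketch are placeholders chosen only to make the arithmetic easy to follow.

```python
def size_multiplier(divergence_score, base_threshold, scaling_factor,
                    min_mult=1.0, max_mult=2.0):
    """FR-8.2 linear scaling, clamped to the FR-8.3 range."""
    raw = 1.0 + (divergence_score - base_threshold) * scaling_factor
    return min(max_mult, max(min_mult, raw))


# Placeholder parameters: base_threshold=0.5, scaling_factor=0.5.
print(size_multiplier(1.5, 0.5, 0.5))  # 1.0 + (1.5 - 0.5) * 0.5 = 1.5
print(size_multiplier(0.2, 0.5, 0.5))  # raw 0.85 is lifted to the 1.0 floor
print(size_multiplier(5.0, 0.5, 0.5))  # raw 3.25 is capped at 2.0
```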

### 4.9 Dynamic SL/TP (Volatility-Adjusted)

| ID | Requirement |
|----|-------------|
| FR-9.1 | Calculate asset realized volatility: `std(returns) * sqrt(24)` for daily vol, where `returns` are 1h returns |
| FR-9.2 | Base SL: `entry_price * (1 - base_sl_pct * vol_multiplier)` for longs |
| FR-9.3 | Base TP: `entry_price * (1 + base_tp_pct * vol_multiplier)` for longs |
| FR-9.4 | `vol_multiplier = asset_volatility / baseline_volatility` (baseline = BTC volatility) |
| FR-9.5 | Cap vol_multiplier between 0.5x and 2.0x to prevent extreme values |
| FR-9.6 | Invert logic for short positions |
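
The volatility adjustment can be sketched as below. The function names are hypothetical, the sample standard deviation is an assumption, and "invert logic for short positions" is read here as placing the stop above entry and the target below it.

```python
from math import sqrt
from statistics import stdev


def daily_vol(hourly_returns):
    """FR-9.1: scale the std of 1h returns to a daily figure."""
    return stdev(hourly_returns) * sqrt(24)


def dynamic_levels(entry_price, side, asset_vol, baseline_vol,
                   base_sl_pct=0.06, base_tp_pct=0.05,
                   vol_min=0.5, vol_max=2.0):
    """Return (stop_loss, take_profit) per FR-9.2 to FR-9.6."""
    # FR-9.4 ratio, clamped per FR-9.5.
    vol_multiplier = min(vol_max, max(vol_min, asset_vol / baseline_vol))
    sl_dist = base_sl_pct * vol_multiplier
    tp_dist = base_tp_pct * vol_multiplier
    if side == "long":
        return entry_price * (1 - sl_dist), entry_price * (1 + tp_dist)
    if side == "short":
        return entry_price * (1 + sl_dist), entry_price * (1 - tp_dist)
    raise ValueError(f"unknown side: {side!r}")


# An asset twice as volatile as the baseline gets levels twice as wide.
sl, tp = dynamic_levels(100.0, "long", asset_vol=0.04, baseline_vol=0.02)
```

With the default percentages, the example yields a stop 12% below entry and a target 10% above it.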

### 4.10 Exit Conditions

| ID | Requirement |
|----|-------------|
| FR-10.1 | Exit when Z-score crosses back through 0 (mean reversion complete) |
| FR-10.2 | Exit when dynamic SL or TP hit |
| FR-10.3 | No minimum holding period (can switch pairs immediately) |
| FR-10.4 | If new pair has higher divergence score, close current and open new |

### 4.11 Backtest Integration

| ID | Requirement |
|----|-------------|
| FR-11.1 | Integrate with existing `engine/backtester.py` framework |
| FR-11.2 | Support 1h timeframe (matching live trading) |
| FR-11.3 | Walk-forward validation: train on 70%, test on 30% |
| FR-11.4 | Output: trades log, equity curve, performance metrics |
| FR-11.5 | Compare against single-pair BTC/ETH baseline |

---

## 5. Non-Goals (Out of Scope)

1. **Live trading implementation** - Backtest validation first
2. **Multi-position portfolio** - Single pair at a time for v1
3. **Cross-exchange arbitrage** - OKX only
4. **Alternative ML models** - Stick with Random Forest for consistency
5. **Sub-1h timeframes** - 1h candles only for initial version
6. **Leveraged positions** - 1x leverage for backtest
7. **Portfolio-level VaR/risk budgeting** - Full subaccount allocation

---

## 6. Design Considerations

### 6.1 Architecture

```
strategies/
  multi_pair/
    __init__.py
    pair_scanner.py      # Generates all pairs, filters tradeable
    feature_engine.py    # Calculates features for all pairs
    correlation.py       # Rolling correlation matrix & filtering
    divergence_scorer.py # Ranks pairs by divergence score
    strategy.py          # Main strategy orchestration
```

### 6.2 Data Flow

```
1. Load OHLCV for all 10 assets
2. Generate pair combinations (45 pairs)
3. Filter to tradeable pairs (OKX check)
4. Calculate features for each pair
5. Train/load universal ML model
6. Predict probability for all pairs
7. Calculate divergence scores
8. Apply correlation filter
9. Select top pair
10. Generate signal with dynamic SL/TP
11. Execute in backtest engine
```

### 6.3 Configuration

```python
from dataclasses import dataclass, field


@dataclass
class MultiPairConfig:
    # Assets
    assets: list[str] = field(default_factory=lambda: [
        "BTC", "ETH", "SOL", "XRP", "BNB",
        "DOGE", "ADA", "AVAX", "LINK", "DOT"
    ])

    # Thresholds
    z_window: int = 24
    z_entry_threshold: float = 1.0
    prob_threshold: float = 0.5
    correlation_threshold: float = 0.85
    correlation_window: int = 168  # 7 days in hours

    # Risk
    base_sl_pct: float = 0.06
    base_tp_pct: float = 0.05
    vol_multiplier_min: float = 0.5
    vol_multiplier_max: float = 2.0

    # Model
    train_ratio: float = 0.7
    horizon: int = 102
    profit_target: float = 0.005
```

---

## 7. Technical Considerations

### 7.1 Dependencies

- Extend `DataManager` to load multiple symbols
- Query OKX API for available perpetual cross-pairs
- Reuse existing feature engineering from `RegimeReversionStrategy`

### 7.2 Performance

- Pre-calculate all pair features in batch (vectorized)
- Cache correlation matrix (update every N candles, not every candle)
- Model inference is fast (single predict call with all pairs as rows)

### 7.3 Edge Cases

- Handle pairs with insufficient history (< 200 bars) - exclude
- Handle assets delisted mid-backtest - skip pair
- Handle zero-volume periods - use last valid price

---

## 8. Success Metrics

| Metric | Baseline (BTC/ETH) | Target |
|--------|-------------------|--------|
| Net PnL | Current performance | > 10% improvement |
| Number of Trades | N | Comparable or higher |
| Win Rate | Baseline % | Maintain or improve |
| Average Trade Duration | Baseline hours | Flexible |
| Max Drawdown | Baseline % | Not significantly worse |

---

## 9. Open Questions

1. **OKX Cross-Pairs**: Need to verify which cross-pairs are available on OKX perpetuals. May need to fall back to synthetic spreads for most pairs.

2. **On-Chain Data**: CryptoQuant data currently covers BTC/ETH. Should we:
   - Run without on-chain features for other assets?
   - Source alternative on-chain data?
   - Use funding rates only (available from OKX)?

3. **Pair ID Encoding**: For the universal model, should pair_id be:
   - One-hot encoded (adds 45 features)?
   - Label encoded (single ordinal feature)?
   - Hierarchical (base_asset + quote_asset as separate features)?

4. **Synthetic Spreads**: If trading SOL/DOT spread but only USDT pairs available:
   - Calculate spread synthetically: `SOL-USDT / DOT-USDT`
   - Execute as two legs: Long SOL-USDT, Short DOT-USDT
   - This doubles fees and adds execution complexity. Include in v1?

---

## 10. Implementation Phases

### Phase 1: Data & Infrastructure (Est. 2-3 days)
- Extend DataManager for multi-symbol loading
- Build pair scanner with OKX tradeable filter
- Implement correlation matrix calculation

### Phase 2: Feature Engineering (Est. 2 days)
- Adapt existing feature calculation for arbitrary pairs
- Add pair identifier feature
- Batch feature calculation for all pairs

### Phase 3: Model & Scoring (Est. 2 days)
- Train universal model on all pairs
- Implement divergence scoring
- Add correlation filtering to pair selection

### Phase 4: Strategy Integration (Est. 2-3 days)
- Implement dynamic SL/TP with volatility
- Integrate with backtester
- Build strategy orchestration class

### Phase 5: Validation & Comparison (Est. 2 days)
- Run walk-forward backtest
- Compare against BTC/ETH baseline
- Generate performance report

**Total Estimated Effort: 10-12 days**

---

*Document Version: 1.0*
*Created: 2026-01-15*
*Author: AI Assistant*
*Status: Draft - Awaiting Review*