# BTC–ETH Regime Modeling

This project builds and tests a **Hidden Markov Model (HMM)** that classifies structural market regimes in Bitcoin and Ethereum from 1-minute OHLCV data recorded on Bitstamp. The data comes from two Kaggle datasets: https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data/ and https://www.kaggle.com/datasets/viniciusqroz/ethereum-historical-data.

It is designed to identify volatility and correlation phases—*risk-on*, *risk-off*, and *neutral*—and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.

---

## Project Overview

### Goals

1. Detect repeating market “regimes” using unsupervised learning.
2. Evaluate how those regimes behave across timeframes and forecast horizons.
3. Use regime identification to **select trading strategies per market state**, rather than predict short-term direction.

### Datasets

Two synchronized 1-minute OHLCV datasets:

* `btcusd_1-min_data.csv`
* `ethusd_1min_ohlc.csv`

Both are sourced from Bitstamp via the Kaggle datasets linked above.

---

## Architecture

### 1. `main.py`

Core experiment runner. It implements:

* **Feature construction**:

  * Multi-scale realized volatility (`rv_*`)
  * Trend ratios (`trend_*`)
  * Rolling BTC–ETH correlations (`corr_*`)
  * Cross-asset beta and divergence
  * Liquidity proxies (`volratio`, `vol_sum`, `vol_diff`)
* **Hidden Markov Model**: Gaussian emissions, diagonal covariance.
* **Randomized time-split validation**: multiple random train/test windows with configurable embargo gap to avoid leakage.
* **Metrics**:

  * Hit rate (directional accuracy)
  * Annualized Sharpe ratio of the regime-implied signal
  * Mean ± std across random splits
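
The randomized time-split validation listed above can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the helper `random_time_splits` and its parameters are hypothetical, and it assumes each split trains on all bars that end at least `gap_bars` before a randomly placed test window. The defaults mirror the `Splits=8 TestBars=500 GapBars=24` header shown under Typical Output.

```python
import random


def random_time_splits(n_bars, n_splits=8, test_bars=500, gap_bars=24,
                       min_train=1000, seed=0):
    """Yield (train_range, test_range) pairs of bar indices.

    Hypothetical helper: each test window starts at a random position,
    and training only uses bars that end `gap_bars` before it (the embargo).
    """
    rng = random.Random(seed)
    first_start = min_train + gap_bars      # earliest allowed test start
    last_start = n_bars - test_bars         # latest allowed test start
    if last_start < first_start:
        raise ValueError("not enough bars for the requested split sizes")
    for _ in range(n_splits):
        test_start = rng.randint(first_start, last_start)
        train = range(0, test_start - gap_bars)
        test = range(test_start, test_start + test_bars)
        yield train, test


for train, test in random_time_splits(n_bars=20_000, n_splits=3):
    print(f"train=[0, {train.stop})  gap={test.start - train.stop}  "
          f"test=[{test.start}, {test.stop})")
```

The embargo matters because rolling features and forward-looking labels computed near the train/test boundary would otherwise share bars between the two sets.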

This script explores model robustness across **different resample rules** (e.g. `30min`, `45min`, `1H`).
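
As a rough sketch of what a resample rule does, the snippet below aggregates 1-minute bars into `45min` bars with pandas and derives two features in the spirit of `rv_*` and `corr_*`. The column names (`Open`, `High`, `Low`, `Close`, `Volume`), the window length, the helper names, and the synthetic data are all assumptions for illustration; the real CSV schemas and the feature definitions in `main.py` may differ.

```python
import numpy as np
import pandas as pd


def resample_ohlcv(df, rule):
    """Aggregate 1-minute OHLCV bars into coarser bars (e.g. rule="45min")."""
    agg = {"Open": "first", "High": "max", "Low": "min",
           "Close": "last", "Volume": "sum"}
    return df.resample(rule).agg(agg).dropna(subset=["Close"])


def basic_features(btc, eth, window=12):
    """Illustrative rv_/corr_ style features from two aligned bar frames."""
    r_btc = np.log(btc["Close"]).diff()
    r_eth = np.log(eth["Close"]).diff()
    out = pd.DataFrame(index=btc.index)
    # Realized volatility: root of the rolling sum of squared log returns.
    out[f"rv_{window}"] = np.sqrt((r_btc ** 2).rolling(window).sum())
    # Rolling correlation between BTC and ETH log returns.
    out[f"corr_{window}"] = r_btc.rolling(window).corr(r_eth)
    return out.dropna()


# Synthetic 1-minute data standing in for the two CSV files.
idx = pd.date_range("2024-01-01", periods=3 * 24 * 60, freq="min")
rng = np.random.default_rng(0)


def fake_ohlcv(scale):
    close = 100 * np.exp(np.cumsum(rng.normal(0, scale, len(idx))))
    return pd.DataFrame({"Open": close, "High": close, "Low": close,
                         "Close": close, "Volume": 1.0}, index=idx)


btc45 = resample_ohlcv(fake_ohlcv(1e-4), "45min")
eth45 = resample_ohlcv(fake_ohlcv(2e-4), "45min")
print(basic_features(btc45, eth45).tail())
```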

---

### 2. `main_conf_metrics.py`

Lightweight evaluator used for the **confidence and coverage sweep**.

* Adds a `--conf` parameter to control how confident the model must be before emitting a trade (pseudo-ETSC gate).
* Prints per-run metrics:

  * `cov`: coverage (fraction of bars with predictions)
  * `hit`: overall hit rate
  * `hit_trades`: accuracy conditional on trading
  * `Sharpe`: annualized risk-adjusted performance
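
One plausible reading of these metrics is sketched below. It assumes the gate compares the largest posterior state probability (for example, a row of what hmmlearn's `GaussianHMM.predict_proba` returns) against `--conf`, that each state maps to a position of +1, 0, or -1, and that untraded bars count as misses in the overall `hit`. The function `gated_metrics`, the state-to-position mapping, and the sample numbers are hypothetical; `main_conf_metrics.py` may define these quantities differently.

```python
import numpy as np


def gated_metrics(post, state_signal, fwd_ret, conf=0.5, bars_per_year=11_680):
    """Coverage, hit rates and annualized Sharpe for a confidence-gated signal.

    post          : (n_bars, n_states) posterior state probabilities
    state_signal  : position per state, e.g. [+1, 0, -1] (hypothetical mapping)
    fwd_ret       : (n_bars,) forward return over the forecast horizon
    bars_per_year : 11_680 corresponds to 45-minute bars in a 24/7 market
    """
    post = np.asarray(post, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    best = post.argmax(axis=1)                  # most likely state per bar
    confident = post.max(axis=1) >= conf        # the confidence gate
    signal = np.where(confident, np.asarray(state_signal)[best], 0)
    traded = signal != 0
    correct = traded & (np.sign(signal) == np.sign(fwd_ret))
    pnl = signal * fwd_ret
    sd = pnl.std(ddof=1)
    nan = float("nan")
    return {
        "cov": float(traded.mean()),
        "hit": float(correct.mean()),           # untraded bars count as misses
        "hit_trades": float(correct[traded].mean()) if traded.any() else nan,
        "sharpe": float(pnl.mean() / sd * np.sqrt(bars_per_year)) if sd > 0 else nan,
    }


post = [[0.90, 0.05, 0.05],
        [0.40, 0.35, 0.25],   # below the gate: no trade
        [0.10, 0.10, 0.80],
        [0.20, 0.70, 0.10]]   # confident, but this state maps to flat
m = gated_metrics(post, state_signal=[1, 0, -1],
                  fwd_ret=[0.010, -0.020, 0.005, 0.003], conf=0.5)
print(m["cov"], m["hit"], m["hit_trades"])   # 0.5 0.25 0.5
```

Under these definitions `hit` equals `cov * hit_trades`, which makes it easy to see whether a higher threshold improves accuracy or merely reduces the number of trades.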

Used by shell scripts to benchmark many timeframes and confidence thresholds.

---

### 3. Shell scripts

#### `run_grid.sh`

Runs a large grid of:

* multiple resample rules (e.g. 20 min – 60 min),
* multiple horizons (e.g. 2–6 bars ahead).
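
For readers who prefer Python over shell, such a grid could be enumerated roughly as below. This is a sketch, not a port of `run_grid.sh`: the 5-minute step between rules is made up, and it assumes `--horizon` is given in minutes (as the `HorizonMin` header under Typical Output suggests), so a horizon in bars is multiplied by the bar length. Only flags that appear under Example Usage are used.

```python
import itertools
import shlex


def grid_commands(rule_minutes, horizon_bars, states=3,
                  btc="../data/btcusd_1-min_data.csv",
                  eth="../data/ethusd_1min_ohlc.csv"):
    """Yield one main.py command line per (rule, horizon) pair."""
    for minutes, bars in itertools.product(rule_minutes, horizon_bars):
        argv = ["python", "main.py",
                "--btc", btc, "--eth", eth,
                "--rules", f"{minutes}min",
                "--states", str(states),
                # Assumption: --horizon is in minutes, so bars are converted.
                "--horizon", str(minutes * bars)]
        yield shlex.join(argv)


# 20 to 60 minute rules in 5-minute steps, 2 to 6 bars ahead.
cmds = list(grid_commands(range(20, 61, 5), range(2, 7)))
print(len(cmds))   # 45
print(cmds[0])
```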

#### `run_focus.sh`

Focuses on the most promising resample ranges (37–41 min and 49–59 min)
and sweeps confidence thresholds from 0.45 to 0.60.
Produces a concise summary line for each combination.

---

## Key Findings

1. **Optimal timeframe:**
   Bars of roughly **35–45 minutes** consistently yield the highest Sharpe ratios (~2.2–2.3).

2. **Forecast horizon:**
   Best performance around **two bars ahead** (~80 min look-ahead for 40 min bars).

3. **Confidence threshold:**
   Thresholds between 0.45 and 0.60 have little effect; the model is already confident on more than 90% of bars.

4. **Interpretation:**
   Regimes reflect volatility and structure, not raw direction.
   Use them to switch *strategy archetypes* (trend vs. mean-reversion) rather than predict sign.
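
The arithmetic behind the look-ahead figure, and behind annualizing a per-bar Sharpe ratio, is short enough to spell out. The sketch assumes a 24/7 market with 365 trading days and the usual square-root-of-time scaling; whether `main.py` annualizes in exactly this way is not stated here.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # crypto markets trade around the clock


def lookahead_minutes(bar_minutes, horizon_bars):
    """Wall-clock look-ahead implied by a horizon expressed in bars."""
    return bar_minutes * horizon_bars


def annualization_factor(bar_minutes):
    """sqrt(bars per year): scales a per-bar Sharpe ratio to an annual one."""
    return math.sqrt(MINUTES_PER_YEAR / bar_minutes)


print(lookahead_minutes(40, 2))             # 80
print(round(annualization_factor(40), 1))   # 114.6
```

Because the factor depends on the bar length, Sharpe ratios from different resample rules are only comparable after each has been scaled with its own factor.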

---

## Example Usage

### Single test

```bash
python main.py \
  --btc ../data/btcusd_1-min_data.csv \
  --eth ../data/ethusd_1min_ohlc.csv \
  --rules "30min,45min,1H" \
  --states 3 \
  --horizon 60
```

### Confidence and coverage sweep

```bash
./run_focus.sh
```

---

## Typical Output

```
# Randomized time-split comparison
States=3  HorizonMin=60  Splits=8  TestBars=500  GapBars=24
 rule  splits         hit             sharpe
 45min       8 0.4642 ± 0.0071 2.0575 ± 0.0413
 39min       8 0.4662 ± 0.0083 2.3124 ± 0.0502
 30min       8 0.4632 ± 0.0090 2.0331 ± 0.0368
```
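
If the summary needs to be compared programmatically, rows in this layout can be parsed with a small regular expression. The sketch below assumes the exact column layout shown above (rule, split count, then two `mean ± std` pairs); `parse_summary` is a hypothetical helper, not part of the repository.

```python
import re

ROW = re.compile(
    r"^\s*(?P<rule>\S+)\s+(?P<splits>\d+)\s+"
    r"(?P<hit>[\d.]+)\s*±\s*(?P<hit_sd>[\d.]+)\s+"
    r"(?P<sharpe>-?[\d.]+)\s*±\s*(?P<sharpe_sd>[\d.]+)\s*$"
)


def parse_summary(text):
    """Return one dict per result row; header and comment lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            d = m.groupdict()
            rows.append({"rule": d["rule"], "splits": int(d["splits"]),
                         **{k: float(d[k]) for k in
                            ("hit", "hit_sd", "sharpe", "sharpe_sd")}})
    return rows


sample = """\
# Randomized time-split comparison
 rule  splits         hit             sharpe
 45min       8 0.4642 ± 0.0071 2.0575 ± 0.0413
 39min       8 0.4662 ± 0.0083 2.3124 ± 0.0502
 30min       8 0.4632 ± 0.0090 2.0331 ± 0.0368
"""
best = max(parse_summary(sample), key=lambda r: r["sharpe"])
print(best["rule"], best["sharpe"])   # 39min 2.3124
```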

---

## Interpretation for Strategy Design

| Regime Type          | Market Traits            | Suggested Strategy        |
| -------------------- | ------------------------ | ------------------------- |
| High-vol / decoupled | large ETH/BTC divergence | Momentum / Breakout       |
| Low-vol / correlated | calm, mean-reverting     | Reversion / Market-Making |
| Neutral              | noisy transitions        | Flat / Reduced exposure   |
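
HMM states come out as unlabeled integers, so applying the table requires some rule for naming them. One simple heuristic, shown here purely as an assumption, is to rank states by their average realized volatility: the calmest state is treated as low-vol / correlated, the most volatile as high-vol / decoupled, and anything in between as neutral. The helper `label_states` and the input numbers are hypothetical, and a fuller version would also check correlation or divergence per state.

```python
STRATEGY = {
    "high-vol / decoupled": "momentum / breakout",
    "low-vol / correlated": "reversion / market-making",
    "neutral": "flat / reduced exposure",
}


def label_states(mean_rv):
    """Rank states by mean realized volatility and name them.

    mean_rv: {state_id: average realized volatility while in that state}
    """
    if len(mean_rv) < 2:
        raise ValueError("need at least two states to rank")
    ordered = sorted(mean_rv, key=mean_rv.get)
    labels = {state: "neutral" for state in ordered}
    labels[ordered[0]] = "low-vol / correlated"
    labels[ordered[-1]] = "high-vol / decoupled"
    return labels


# Made-up per-state volatility averages for a 3-state model.
labels = label_states({0: 0.012, 1: 0.004, 2: 0.007})
for state, name in sorted(labels.items()):
    print(state, name, "->", STRATEGY[name])
```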

---

## Requirements

* **Python ≥ 3.11**
* **Environment manager:** [**uv**](https://github.com/astral-sh/uv) (fast Python package installer and environment manager)

### Setup

Create and activate a local environment using **uv**:

```bash
# from the project root
uv venv
source .venv/bin/activate

# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn
```

---

## Repository Structure

```
.
├── main.py                 # core HMM regime experiment with CV
├── main_conf_metrics.py    # confidence/coverage sweep
├── run_grid.sh             # full grid search over horizons/timeframes
├── run_focus.sh            # focused confidence sweep
└── README.md
```