BTC–ETH Regime Modeling
This project builds and tests a Hidden Markov Model (HMM) that classifies structural market regimes in Bitcoin and Ethereum based on 1-minute OHLCV data from Bitstamp (https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data/ and https://www.kaggle.com/datasets/viniciusqroz/ethereum-historical-data).
It is designed to identify volatility and correlation phases—risk-on, risk-off, and neutral—and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.
Project Overview
Goals
- Detect repeating market “regimes” using unsupervised learning.
- Evaluate how those regimes behave across timeframes and forecast horizons.
- Use regime identification to select trading strategies per market state, rather than predict short-term direction.
Datasets
Two synchronized 1-minute OHLCV datasets:
- `btcusd_1-min_data.csv`
- `ethusd_1min_ohlc.csv`
Both sourced from Bitstamp (Kaggle datasets).
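A minimal sketch of synchronizing the two 1-minute series and resampling them to a coarser bar, assuming both CSVs have already been loaded into DataFrames with a `DatetimeIndex` and lowercase `open`, `high`, `low`, `close`, `volume` columns. The raw Kaggle column names and timestamp formats may differ and would need mapping first. `align_and_resample` is a hypothetical helper name, not taken from main.py.

```python
import pandas as pd

# Standard OHLCV aggregation when moving from 1-minute to coarser bars.
OHLCV_AGG = {"open": "first", "high": "max", "low": "min",
             "close": "last", "volume": "sum"}


def align_and_resample(btc: pd.DataFrame, eth: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Hypothetical helper: keep only minutes present in both series, then resample.

    Assumes both frames have a DatetimeIndex and lowercase OHLCV columns.
    """
    common = btc.index.intersection(eth.index)
    bars = {}
    for name, df in (("btc", btc), ("eth", eth)):
        bars[name] = df.loc[common].resample(rule).agg(OHLCV_AGG)
    # Concatenate side by side and flatten to btc_close, eth_close, ...
    out = pd.concat(bars, axis=1)
    out.columns = [f"{asset}_{col}" for asset, col in out.columns]
    # Bars with no trades in the window have NaN prices; drop them.
    return out.dropna()
```

For example, `align_and_resample(btc, eth, "40min")` would yield one row per 40-minute bar with both assets' OHLCV side by side.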
Architecture
1. main.py
Core experiment runner. Implements:
- Feature construction:
  - Multi-scale realized volatility (`rv_*`)
  - Trend ratios (`trend_*`)
  - Rolling BTC–ETH correlations (`corr_*`)
  - Cross-asset beta and divergence
  - Liquidity proxies (`volratio`, `vol_sum`, `vol_diff`)
- Hidden Markov Model: Gaussian emissions, diagonal covariance.
- Randomized time-split validation: multiple random train/test windows with a configurable embargo gap to avoid leakage.
- Metrics:
  - Hit rate (directional accuracy)
  - Annualized Sharpe ratio of the regime-implied signal
  - Mean ± std across random splits
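The feature families listed above can be sketched roughly as follows. The exact definitions in main.py are not documented in this README, so every formula here is an assumption: realized volatility as the root of summed squared log returns, trend as close over its rolling mean, beta as rolling covariance over variance, divergence as the rolling sum of ETH's beta-adjusted residual return, and log-volume liquidity proxies. Only the prefixes `rv_`, `trend_`, `corr_` and the names `volratio`, `vol_sum`, `vol_diff` come from the list; the input columns (`btc_close`, `eth_close`, `btc_volume`, `eth_volume`), the window lengths, and `build_features` itself are hypothetical.

```python
import numpy as np
import pandas as pd


def build_features(bars: pd.DataFrame, windows=(6, 12, 24)) -> pd.DataFrame:
    """Hypothetical feature builder; windows are in bars and are assumptions."""
    r_btc = np.log(bars["btc_close"]).diff()
    r_eth = np.log(bars["eth_close"]).diff()
    f = pd.DataFrame(index=bars.index)
    for w in windows:
        # Realized volatility: sqrt of the rolling sum of squared log returns.
        f[f"rv_btc_{w}"] = np.sqrt((r_btc ** 2).rolling(w).sum())
        f[f"rv_eth_{w}"] = np.sqrt((r_eth ** 2).rolling(w).sum())
        # Trend ratio: price relative to its rolling mean, minus one.
        f[f"trend_btc_{w}"] = bars["btc_close"] / bars["btc_close"].rolling(w).mean() - 1
        f[f"trend_eth_{w}"] = bars["eth_close"] / bars["eth_close"].rolling(w).mean() - 1
        # Rolling BTC-ETH return correlation.
        f[f"corr_{w}"] = r_btc.rolling(w).corr(r_eth)
    w = max(windows)
    # Beta of ETH returns on BTC returns over the longest window.
    f["beta"] = r_eth.rolling(w).cov(r_btc) / r_btc.rolling(w).var()
    # Divergence: cumulative ETH return not explained by beta * BTC return.
    f["divergence"] = (r_eth - f["beta"] * r_btc).rolling(w).sum()
    # Liquidity proxies from traded volume.
    f["volratio"] = bars["eth_volume"] / bars["btc_volume"]
    f["vol_sum"] = np.log1p(bars["btc_volume"] + bars["eth_volume"])
    f["vol_diff"] = np.log1p(bars["eth_volume"]) - np.log1p(bars["btc_volume"])
    # Zero volume would create infinities; treat them as missing and drop warm-up rows.
    return f.replace([np.inf, -np.inf], np.nan).dropna()
```

Because every feature is a rolling statistic, the first rows are lost to warm-up; the longest chain here (beta, then a rolling sum of residuals) costs about twice the longest window.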
This script explores model robustness across different resample rules (e.g. 30min, 45min, 1H).
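One detail worth making explicit when an HMM drives a trading signal: hmmlearn's `GaussianHMM.predict_proba` returns posteriors from the forward-backward algorithm, so each bar's state probability also depends on later bars in the same window. If strictly causal (filtered) probabilities are wanted, a forward pass over a fitted diagonal-covariance model is short to write. This is a sketch, not necessarily what main.py does; it assumes per-state means and diagonal variances of shape `(n_states, n_features)` plus start and transition probabilities taken from a fitted model (for example its `startprob_` and `transmat_` attributes). The function names are hypothetical.

```python
import numpy as np


def diag_gaussian_loglik(X, means, variances):
    """Log-density of each row of X under each state's diagonal Gaussian, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]  # (T, K, D)
    return -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]).sum(axis=2)


def forward_filter(X, startprob, transmat, means, variances):
    """Filtered P(state_t | x_1..x_t): each row uses only past and current bars."""
    loglik = diag_gaussian_loglik(X, means, variances)
    T, K = loglik.shape
    alpha = np.empty((T, K))
    prior = startprob
    for t in range(T):
        # Subtract the row max before exponentiating for numerical stability;
        # the constant cancels when normalizing.
        lik = np.exp(loglik[t] - loglik[t].max())
        post = prior * lik
        alpha[t] = post / post.sum()
        # One-step-ahead prediction of the next bar's state distribution.
        prior = alpha[t] @ transmat
    return alpha
```

With hmmlearn, the inputs could come from a `GaussianHMM(n_components=3, covariance_type="diag")` fitted on the training window; truncating `X` never changes earlier rows of the output, which is the property that makes it safe for live use.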
2. main_conf_metrics.py
Lightweight evaluator used for the confidence and coverage sweep.
- Adds a `--conf` parameter that sets how confident the model must be before emitting a trade (a pseudo-ETSC gate).
- Prints per-run metrics:
  - `cov`: coverage (fraction of bars with predictions)
  - `hit`: overall hit rate
  - `hit_trades`: accuracy conditional on trading
  - `Sharpe`: annualized risk-adjusted performance
Used by shell scripts to benchmark many timeframes and confidence thresholds.
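The gate and the four printed metrics can be sketched as below. The definitions are assumptions inferred from the descriptions above, not read from main_conf_metrics.py: a bar is traded when its top state probability is at least `conf`; `state_sign` is a hypothetical per-state direction (+1, -1, or 0 for flat); `hit` scores every bar with a non-flat signal, while `hit_trades` scores only gated bars; and the Sharpe ratio is annualized with the square root of bars per year under an assumed 24/7, 365-day calendar.

```python
import numpy as np

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: crypto trades 24/7, 365 days a year


def gated_metrics(probs, state_sign, fwd_ret, conf, bar_minutes):
    """Hypothetical re-creation of cov / hit / hit_trades / Sharpe.

    probs      : (T, K) state probabilities per bar
    state_sign : (K,) direction assigned to each state (+1, -1 or 0)
    fwd_ret    : (T,) realized forward return aligned with each bar
    """
    top = probs.argmax(axis=1)
    raw_signal = state_sign[top]
    traded = probs.max(axis=1) >= conf  # confidence gate
    signal = np.where(traded, raw_signal, 0.0)

    correct = np.sign(raw_signal) == np.sign(fwd_ret)
    active = raw_signal != 0
    hit = correct[active].mean() if active.any() else np.nan
    mask = traded & active
    hit_trades = correct[mask].mean() if mask.any() else np.nan

    # Per-bar strategy return; untraded bars contribute zero.
    pnl = signal * fwd_ret
    sd = pnl.std(ddof=1)
    ann = np.sqrt(MINUTES_PER_YEAR / bar_minutes)
    sharpe = pnl.mean() / sd * ann if sd > 0 else np.nan
    return {"cov": traded.mean(), "hit": hit, "hit_trades": hit_trades, "Sharpe": sharpe}
```

Under this convention, 40-minute bars give 13,140 bars per year, so the annualization factor is about 114.6.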
3. Shell scripts
run_grid.sh
Runs a large grid of:
- multiple resample rules (e.g. 20 min – 60 min),
- multiple horizons (e.g. 2–6 bars ahead).
run_focus.sh
Focuses on the most promising regions (37–41 min and 49–59 min bars) and sweeps confidence thresholds from 0.45 to 0.60, producing a concise summary line for each combination.
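For a sense of scale, the parameter combinations the two scripts cover can be enumerated in Python. Only the ranges come from this README (and the `run_grid.sh` ranges are given as examples); the 1-minute rule step and 0.05 confidence step are assumptions about how the scripts iterate.

```python
from itertools import product

# Ranges from the README; the 1-minute and 0.05 step sizes are assumptions.
grid_rules = [f"{m}min" for m in range(20, 61)]
grid_horizons = list(range(2, 7))  # bars ahead
focus_rules = [f"{m}min" for m in (*range(37, 42), *range(49, 60))]
focus_confs = [round(0.45 + 0.05 * i, 2) for i in range(4)]  # 0.45 .. 0.60

grid = list(product(grid_rules, grid_horizons))
focus = list(product(focus_rules, focus_confs))
print(len(grid), len(focus))  # → 205 64
```

Even with these coarse steps the full grid is a few hundred runs, which is why the focused sweep narrows the bar lengths before varying the confidence threshold.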
Key Findings
- Optimal timeframe: bars of roughly 35–45 minutes consistently yield the highest Sharpe ratios (~2.2–2.3).
- Forecast horizon: performance peaks around two bars ahead (~80 min look-ahead for 40 min bars).
- Confidence threshold: thresholds between 0.45 and 0.60 make little difference, because the model is already confident on >90% of bars.
- Interpretation: regimes reflect volatility and market structure, not raw direction. Use them to switch between strategy archetypes (trend vs. mean-reversion) rather than to predict the sign of the next move.
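The horizon arithmetic behind these findings: look-ahead in minutes is the horizon in bars times the bar length, so two 40-minute bars is 80 minutes. The usage example instead passes the horizon in minutes (`--horizon 60`, reported as `HorizonMin=60`), so a conversion to whole bars is needed. The rounding rule and the h-bar forward log-return target below are assumptions about how main.py handles this, and all three helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def horizon_minutes(bars_ahead: int, bar_minutes: int) -> int:
    """Look-ahead in minutes for a horizon expressed in bars."""
    return bars_ahead * bar_minutes


def horizon_bars(horizon_min: int, bar_minutes: int) -> int:
    # Assumption: round to the nearest whole bar, but never less than one bar.
    return max(1, round(horizon_min / bar_minutes))


def forward_log_return(close: pd.Series, h: int) -> pd.Series:
    # Log return from bar t to bar t+h; the last h values are NaN (no future data).
    return np.log(close.shift(-h) / close)
```

Under this rounding, a 60-minute horizon is 2 bars on 30-minute data but only 1 bar on 45-minute data, so a fixed horizon in minutes does not mean the same number of bars across resample rules.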
Example Usage
Single test
```bash
python main.py \
  --btc ../data/btcusd_1-min_data.csv \
  --eth ../data/ethusd_1min_ohlc.csv \
  --rules "30min,45min,1H" \
  --states 3 \
  --horizon 60
```
Confidence and coverage sweep
```bash
./run_focus.sh
```
Typical Output
```text
# Randomized time-split comparison
States=3 HorizonMin=60 Splits=8 TestBars=500 GapBars=24
rule   splits  hit               sharpe
45min  8       0.4642 ± 0.0071   2.0575 ± 0.0413
39min  8       0.4662 ± 0.0083   2.3124 ± 0.0502
30min  8       0.4632 ± 0.0090   2.0331 ± 0.0368
```
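The header line reports the split configuration (`Splits=8 TestBars=500 GapBars=24`). One way such randomized, embargoed splits could be generated and summarized as mean ± std is sketched below. It assumes each split trains on all bars ending at least `gap_bars` before a randomly placed test window and that the reported spread is the sample standard deviation; main.py's exact placement rule and `min_train` are not documented here, and `random_splits` / `summarize` are hypothetical names.

```python
import numpy as np


def random_splits(n_bars, n_splits=8, test_bars=500, gap_bars=24, min_train=1000, seed=0):
    """Yield (train_idx, test_idx) with an embargo of gap_bars between them.

    Training uses only bars before the test window; the gap keeps forward-return
    labels near the end of training from overlapping the test window.
    """
    rng = np.random.default_rng(seed)
    first_start = min_train + gap_bars
    last_start = n_bars - test_bars
    if last_start < first_start:
        raise ValueError("series too short for this split configuration")
    # integers() excludes the upper bound, hence the +1.
    for start in rng.integers(first_start, last_start + 1, size=n_splits):
        train_idx = np.arange(0, start - gap_bars)
        test_idx = np.arange(start, start + test_bars)
        yield train_idx, test_idx


def summarize(values):
    """Format per-split metrics as 'mean ± std' (sample std, ddof=1)."""
    v = np.asarray(values, dtype=float)
    return f"{v.mean():.4f} ± {v.std(ddof=1):.4f}"
```

For the embargo to be effective, `gap_bars` should be at least the forecast horizon in bars; 24 bars comfortably covers the two-bar horizons discussed above.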
Interpretation for Strategy Design
| Regime Type | Market Traits | Suggested Strategy |
|---|---|---|
| High-vol / decoupled | large ETH/BTC divergence | Momentum / Breakout |
| Low-vol / correlated | calm, mean-reverting | Reversion / Market-Making |
| Neutral | noisy transitions | Flat / Reduced exposure |
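To act on this table, each fitted HMM state needs a name. A simple heuristic, assumed here rather than taken from main.py, ranks states by their average realized volatility and BTC–ETH correlation: the state with the highest volatility and lowest correlation is tagged high-vol / decoupled, the one with the lowest volatility and highest correlation is tagged low-vol / correlated, and any other state is neutral. The single `rv` and `corr` columns, `label_states`, and the `STRATEGY` mapping are hypothetical.

```python
import numpy as np
import pandas as pd

# Strategy archetype per regime, mirroring the table above.
STRATEGY = {
    "high-vol / decoupled": "momentum / breakout",
    "low-vol / correlated": "reversion / market-making",
    "neutral": "flat / reduced exposure",
}


def label_states(features: pd.DataFrame, states: np.ndarray) -> dict:
    """Map each HMM state id to a regime name using per-state feature means."""
    prof = features.groupby(states)[["rv", "corr"]].mean()
    # High volatility and low correlation push the score up ('decoupled').
    score = prof["rv"].rank() - prof["corr"].rank()
    labels = {s: "neutral" for s in prof.index}
    labels[score.idxmax()] = "high-vol / decoupled"
    labels[score.idxmin()] = "low-vol / correlated"
    return labels
```

A strategy for the current bar would then be `STRATEGY[labels[state]]`. Because HMM state ids are arbitrary and can permute between fits, labeling by feature profile like this has to be redone for every trained model.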
Requirements
- Python ≥ 3.11
- Environment manager: uv (fast Python package installer and environment manager)
Setup
Create and activate a local environment using uv:
```bash
# from the project root
uv venv
source .venv/bin/activate
# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn
```
Repository Structure
```text
.
├── main.py               # core HMM regime experiment with CV
├── main_conf_metrics.py  # confidence/coverage sweep
├── run_grid.sh           # full grid search over horizons/timeframes
├── run_focus.sh          # focused confidence sweep
└── README.md
```