BTC–ETH Regime Modeling

This project builds and tests a Hidden Markov Model (HMM) that classifies structural market regimes in Bitcoin and Ethereum based on 1-minute OHLCV data from Bitstamp (https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data/ and https://www.kaggle.com/datasets/viniciusqroz/ethereum-historical-data).

It is designed to identify volatility and correlation phases—risk-on, risk-off, and neutral—and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.


Project Overview

Goals

  1. Detect repeating market “regimes” using unsupervised learning.
  2. Evaluate how those regimes behave across timeframes and forecast horizons.
  3. Use regime identification to select trading strategies per market state, rather than predict short-term direction.

Datasets

Two synchronized 1-minute OHLCV datasets:

  • btcusd_1-min_data.csv
  • ethusd_1min_ohlc.csv

Both sourced from Bitstamp (Kaggle datasets).


Architecture

1. main.py

Core experiment runner. Implements:

  • Feature construction:

    • Multi-scale realized volatility (rv_*)
    • Trend ratios (trend_*)
    • Rolling BTC–ETH correlations (corr_*)
    • Cross-asset beta and divergence
    • Liquidity proxies (volratio, vol_sum, vol_diff)
  • Hidden Markov Model: Gaussian emissions, diagonal covariance.

  • Randomized time-split validation: multiple random train/test windows with configurable embargo gap to avoid leakage.

  • Metrics:

    • Hit rate (directional accuracy)
    • Annualized Sharpe ratio of the regime-implied signal
    • Mean ± std across random splits
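
A minimal sketch of how features of this kind could be built with pandas, assuming each CSV has been loaded into a DataFrame with a DatetimeIndex and a `close` column. The function name `build_features`, the window lengths, and the exact formulas are illustrative assumptions, not the definitions used in main.py.

```python
import numpy as np
import pandas as pd


def build_features(btc: pd.DataFrame, eth: pd.DataFrame, rule: str = "45min",
                   windows: tuple[int, ...] = (4, 16)) -> pd.DataFrame:
    """Resample 1-minute closes and derive illustrative regime features."""
    # Last close of each resampled bar, kept only where both assets have data.
    close = pd.concat(
        {"btc": btc["close"].resample(rule).last(),
         "eth": eth["close"].resample(rule).last()},
        axis=1,
    ).dropna()
    ret = np.log(close).diff().dropna()  # per-bar log returns

    feats = {}
    for w in windows:
        # Realized volatility: root of the rolling sum of squared log returns.
        feats[f"rv_btc_{w}"] = np.sqrt((ret["btc"] ** 2).rolling(w).sum())
        feats[f"rv_eth_{w}"] = np.sqrt((ret["eth"] ** 2).rolling(w).sum())
        # Rolling BTC-ETH return correlation.
        feats[f"corr_{w}"] = ret["btc"].rolling(w).corr(ret["eth"])
        # Rolling beta of ETH on BTC: cov(eth, btc) / var(btc).
        feats[f"beta_{w}"] = (ret["eth"].rolling(w).cov(ret["btc"])
                              / ret["btc"].rolling(w).var())
    return pd.DataFrame(feats).dropna()
```

The resulting matrix is the kind of input a Gaussian HMM with diagonal covariance expects (one row per bar, one column per feature); standardizing the columns first is usually a sensible step.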

This script explores model robustness across different resample rules (e.g. 30min, 45min, 1H).
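
The randomized time-split validation listed above can be sketched roughly as follows. Each split picks a random contiguous test window and trains only on bars that end at least `gap_bars` before it. Training on the past only is an assumption here, as are the function name, the `min_train` parameter, and the defaults (which mirror the TestBars and GapBars values in the sample output).

```python
import random


def random_time_splits(n_bars, n_splits=8, test_bars=500, gap_bars=24,
                       min_train=1000, seed=0):
    """Yield (train_range, test_range) pairs of bar indices."""
    rng = random.Random(seed)
    first_start = min_train + gap_bars   # earliest allowed test start
    last_start = n_bars - test_bars      # latest allowed test start
    if last_start < first_start:
        raise ValueError("not enough bars for the requested split sizes")
    for _ in range(n_splits):
        test_start = rng.randint(first_start, last_start)  # inclusive bounds
        train_end = test_start - gap_bars  # embargo: skip gap_bars before test
        yield range(0, train_end), range(test_start, test_start + test_bars)
```

When the target at bar t depends on prices up to t + horizon, an embargo at least as long as the horizon keeps training labels from overlapping the test window.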


2. main_conf_metrics.py

Lightweight evaluator used for the confidence and coverage sweep.

  • Adds a --conf parameter to control how confident the model must be before emitting a trade (pseudo-ETSC gate).

  • Prints per-run metrics:

    • cov: coverage (fraction of bars with predictions)
    • hit: overall hit rate
    • hit_trades: accuracy conditional on trading
    • Sharpe: annualized risk-adjusted performance

Used by shell scripts to benchmark many timeframes and confidence thresholds.
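
As a rough illustration of the confidence gate, the sketch below takes a matrix of state posteriors (for example from hmmlearn's `predict_proba`), emits a trade only when the most likely state reaches `conf`, and reports the three hit and coverage figures. The function name `gated_metrics`, the `state_signal` mapping from state to -1/0/+1, and the choice to count untraded bars as misses in `hit` are assumptions, not the script's confirmed behavior.

```python
import numpy as np


def gated_metrics(post, state_signal, fwd_ret, conf=0.5):
    """Coverage and hit rates for a confidence-gated regime signal."""
    post = np.asarray(post, dtype=float)
    fwd_ret = np.asarray(fwd_ret, dtype=float)
    best = post.argmax(axis=1)             # most likely state per bar
    confident = post.max(axis=1) >= conf   # gate on posterior probability
    # Signal of the most likely state where confident, otherwise no trade.
    signal = np.where(confident, np.asarray(state_signal)[best], 0)
    traded = signal != 0
    hits = traded & (np.sign(fwd_ret) == signal)
    cov = traded.mean()
    hit = hits.mean()                      # untraded bars count as misses
    hit_trades = hits.sum() / traded.sum() if traded.any() else float("nan")
    return {"cov": cov, "hit": hit, "hit_trades": hit_trades}
```

Raising `conf` can only lower `cov`; whether `hit_trades` improves in exchange is what the sweep measures.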


3. Shell scripts

run_grid.sh

Runs a large grid of:

  • multiple resample rules (e.g. 20 min–60 min),
  • multiple horizons (e.g. 2–6 bars ahead).

run_focus.sh

Focuses on the most promising regions (37–41 min, 49–59 min) and sweeps confidence thresholds (0.45–0.60). Produces concise summary lines for each combination.


Key Findings

  1. Optimal timeframe: ~35–45 minutes consistently yields the highest Sharpe ratios (~2.2–2.3).

  2. Forecast horizon: Best performance around two bars ahead (~80 min look-ahead for 40 min bars).

  3. Confidence threshold: Little effect between 0.45 and 0.60; the model is already confident on > 90% of bars.

  4. Interpretation: Regimes reflect volatility and structure, not raw direction. Use them to switch strategy archetypes (trend vs. mean-reversion) rather than predict sign.


Example Usage

Single test

python main.py \
  --btc ../data/btcusd_1-min_data.csv \
  --eth ../data/ethusd_1min_ohlc.csv \
  --rules "30min,45min,1H" \
  --states 3 \
  --horizon 60

Confidence and coverage sweep

./run_focus.sh

Typical Output

# Randomized time-split comparison
States=3  HorizonMin=60  Splits=8  TestBars=500  GapBars=24
 rule  splits         hit             sharpe
 45min       8 0.4642 ± 0.0071 2.0575 ± 0.0413
 39min       8 0.4662 ± 0.0083 2.3124 ± 0.0502
 30min       8 0.4632 ± 0.0090 2.0331 ± 0.0368
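
The `hit` and `sharpe` columns above could be computed along these lines. This is a sketch under assumptions: the annualization factor treats the market as open 24/7 (525,600 minutes per year), the standard deviation uses `ddof=1`, and all function names are hypothetical rather than taken from main.py.

```python
import numpy as np

MINUTES_PER_YEAR = 365 * 24 * 60  # crypto trades around the clock


def hit_rate(signal, fwd_ret):
    """Fraction of bars where the signal sign matches the forward-return sign."""
    signal, fwd_ret = np.asarray(signal), np.asarray(fwd_ret)
    return float(np.mean(np.sign(signal) == np.sign(fwd_ret)))


def annualized_sharpe(signal, bar_ret, bar_minutes):
    """Mean over std of per-bar signal returns, scaled by sqrt(bars per year)."""
    pnl = np.asarray(signal) * np.asarray(bar_ret)
    sd = pnl.std(ddof=1)
    if sd == 0:
        return float("nan")
    return float(pnl.mean() / sd * np.sqrt(MINUTES_PER_YEAR / bar_minutes))


def summarize(values):
    """Format per-split values as 'mean ± std'."""
    v = np.asarray(values, dtype=float)
    return f"{v.mean():.4f} ± {v.std(ddof=1):.4f}"
```

For a 45-minute rule, `bar_minutes=45` gives 11,680 bars per year, so the per-bar ratio is scaled by roughly 108.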

Interpretation for Strategy Design

Regime Type           Market Traits              Suggested Strategy
High-vol / decoupled  large ETH/BTC divergence   Momentum / Breakout
Low-vol / correlated  calm, mean-reverting       Reversion / Market-Making
Neutral               noisy transitions          Flat / Reduced exposure
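
HMM state indices are arbitrary, so applying a table like this needs a labeling step first. One possible sketch, assuming states are ranked purely by their mean realized volatility (the repository may label regimes differently, and the names `STRATEGY` and `label_states` are hypothetical):

```python
import numpy as np

STRATEGY = {
    "high_vol_decoupled": "momentum / breakout",
    "low_vol_correlated": "reversion / market-making",
    "neutral": "flat / reduced exposure",
}


def label_states(states, rv):
    """Map arbitrary HMM state ids to regime names by mean realized volatility."""
    states, rv = np.asarray(states), np.asarray(rv, dtype=float)
    mean_rv = {int(s): rv[states == s].mean() for s in np.unique(states)}
    ranked = sorted(mean_rv, key=mean_rv.get)  # lowest volatility first
    labels = {s: "neutral" for s in ranked}
    labels[ranked[0]] = "low_vol_correlated"
    labels[ranked[-1]] = "high_vol_decoupled"
    return labels
```

With at least two states, `STRATEGY[label_states(states, rv)[current_state]]` then gives the archetype for the current bar.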

Requirements

  • Python ≥ 3.11
  • Environment manager: uv (a fast Python package installer and virtual-environment tool)

Setup

Create and activate a local environment using uv:

# from the project root
uv venv
source .venv/bin/activate

# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn

Repository Structure

.
├── main.py                 # core HMM regime experiment with CV
├── main_conf_metrics.py    # confidence/coverage sweep
├── run_grid.sh             # full grid search over horizons/timeframes
├── run_focus.sh            # focused confidence sweep
└── README.md