This project is designed to identify volatility and correlation phases in BTC and ETH markets (risk-on, risk-off, and neutral) and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.

Project Overview

Goals

- Detect recurring market "regimes" using unsupervised learning.
- Evaluate how those regimes behave across timeframes and forecast horizons.
- Use regime identification to select a trading strategy for each market state, rather than to predict short-term direction.

Datasets

Two synchronized 1-minute OHLCV datasets:

- `btcusd_1-min_data.csv`
- `ethusd_1min_ohlc.csv`

Both contain Bitstamp data and were obtained from Kaggle datasets.

Architecture

1. `main.py`

Core experiment runner. It implements:

- Feature construction:
  - Multi-scale realized volatility (`rv_*`)
  - Trend ratios (`trend_*`)
  - Rolling BTC–ETH correlations (`corr_*`)
  - Cross-asset beta and divergence
  - Liquidity proxies (`volratio`, `vol_sum`, `vol_diff`)
- Hidden Markov Model: Gaussian emissions with diagonal covariance.
- Randomized time-split validation: multiple random train/test windows separated by a configurable embargo gap to avoid look-ahead leakage.
- Metrics:
  - Hit rate (directional accuracy)
  - Annualized Sharpe ratio of the regime-implied signal
  - Mean ± std of each metric across the random splits
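The feature families listed above can be sketched roughly as follows. This is not the formula set used in `main.py`: the window lengths, the efficiency-ratio definition of `trend_*`, the beta/divergence construction, and the liquidity transforms are all assumptions, and `build_features` is a hypothetical helper. It expects bar-level `btc_close`, `eth_close`, `btc_volume`, and `eth_volume` columns (also assumed names).

```python
import numpy as np
import pandas as pd


def build_features(bars: pd.DataFrame, windows=(4, 16, 64)) -> pd.DataFrame:
    """Illustrative regime features from aligned BTC/ETH bars (assumed column names)."""
    r_btc = np.log(bars["btc_close"]).diff()
    r_eth = np.log(bars["eth_close"]).diff()
    feats = {}
    for w in windows:
        # Realized volatility: root of the rolling sum of squared log returns.
        feats[f"rv_{w}"] = np.sqrt((r_btc**2).rolling(w).sum())
        # Trend ratio (assumed definition): net move / total path length, in [-1, 1].
        feats[f"trend_{w}"] = r_btc.rolling(w).sum() / r_btc.abs().rolling(w).sum()
        # Rolling BTC-ETH return correlation.
        feats[f"corr_{w}"] = r_btc.rolling(w).corr(r_eth)

    w = max(windows)
    # Rolling beta of ETH on BTC: cov(eth, btc) / var(btc), both with ddof=1.
    beta = r_eth.rolling(w).cov(r_btc) / r_btc.rolling(w).var()
    feats["beta"] = beta
    # Divergence: the part of the ETH return not explained by beta * BTC return.
    feats["divergence"] = r_eth - beta * r_btc

    # Liquidity proxies built from bar volumes.
    feats["volratio"] = bars["eth_volume"] / bars["btc_volume"]
    feats["vol_sum"] = np.log1p(bars["btc_volume"] + bars["eth_volume"])
    feats["vol_diff"] = np.log1p(bars["eth_volume"]) - np.log1p(bars["btc_volume"])

    out = pd.DataFrame(feats)
    # Drop warm-up rows and any division-by-zero artifacts.
    return out.replace([np.inf, -np.inf], np.nan).dropna()
```

Every feature at bar *t* uses only data up to *t*, which keeps the HMM inputs free of look-ahead; the longest window sets how many warm-up bars are dropped.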

`main.py` compares model robustness across resample rules (e.g., 30min, 45min, 1H).
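The randomized time-split validation could be organized along the lines of the sketch below. It assumes each split trains on every bar before the test window except an embargo gap, and it borrows the split sizes from the "Typical Output" header (8 splits, 500 test bars, a 24-bar gap) as defaults; `random_time_splits` and `min_train_bars` are hypothetical. Within each split, a model such as hmmlearn's `GaussianHMM(n_components=3, covariance_type="diag")` would be fit on the training indices and scored on the test indices.

```python
import numpy as np


def random_time_splits(n_bars, n_splits=8, test_bars=500, gap_bars=24,
                       min_train_bars=1000, seed=0):
    """Return (train_idx, test_idx) pairs of contiguous, embargoed windows.

    Train covers [0, start - gap_bars); test covers [start, start + test_bars).
    The gap keeps rolling features and forward-return labels that straddle
    the boundary from leaking test information into training.
    """
    lo = min_train_bars + gap_bars  # earliest test start leaving enough train bars
    hi = n_bars - test_bars         # latest test start that still fits
    if hi < lo:
        raise ValueError("series too short for the requested split sizes")
    rng = np.random.default_rng(seed)
    starts = rng.integers(lo, hi, size=n_splits, endpoint=True)
    return [(np.arange(0, s - gap_bars), np.arange(s, s + test_bars)) for s in starts]
```

Collecting one hit rate and one Sharpe ratio per split and summarizing them with `np.mean` and `np.std` gives results in the same "mean ± std" form as the sample output.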

2. `main_conf_metrics.py`

Lightweight evaluator used for the confidence and coverage sweep.

- Adds a `--conf` parameter that sets how confident the model must be before emitting a trade (a pseudo-ETSC gate).
- Prints per-run metrics:
  - `cov`: coverage (fraction of bars with a prediction)
  - `hit`: overall hit rate
  - `hit_trades`: hit rate conditional on trading
  - `Sharpe`: annualized risk-adjusted performance

Used by shell scripts to benchmark many timeframes and confidence thresholds.
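A minimal sketch of how a confidence gate and the four printed metrics might be computed, given per-bar posterior state probabilities (for instance from hmmlearn's `predict_proba`) and one trade direction per state. The per-state signal, counting gated-out bars as misses in `hit`, and annualizing over 24/7 trading with `bar_minutes`-long bars are assumptions, not the script's documented definitions; `gated_metrics` is a hypothetical name.

```python
import numpy as np


def gated_metrics(posteriors, state_signal, fwd_returns, conf=0.5, bar_minutes=40):
    """Apply a confidence gate and compute cov, hit, hit_trades and Sharpe.

    posteriors:   (n_bars, n_states) posterior probabilities per bar.
    state_signal: trade direction per state, e.g. +1 long / -1 short (assumed).
    fwd_returns:  forward return over the forecast horizon for each bar.
    """
    post = np.asarray(posteriors, dtype=float)
    fwd = np.asarray(fwd_returns, dtype=float)
    state = post.argmax(axis=1)          # most likely regime per bar
    emitted = post.max(axis=1) >= conf   # gate: trade only when confident enough
    signal = np.where(emitted, np.asarray(state_signal, dtype=float)[state], 0.0)

    correct = emitted & (np.sign(signal) == np.sign(fwd))
    pnl = signal * fwd

    # Annualize assuming a market that trades 24/7, 365 days a year.
    bars_per_year = 365 * 24 * 60 / bar_minutes
    std = pnl.std(ddof=1)
    sharpe = pnl.mean() / std * np.sqrt(bars_per_year) if std > 0 else float("nan")
    return {
        "cov": float(emitted.mean()),
        "hit": float(correct.mean()),  # gated-out bars count as misses
        "hit_trades": float(correct[emitted].mean()) if emitted.any() else float("nan"),
        "sharpe": float(sharpe),
    }
```

For 40-minute bars this assumption gives 365 × 24 × 60 / 40 = 13,140 bars per year, i.e. an annualization factor of about √13140 ≈ 114.6.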

3. Shell scripts

`run_grid.sh`

Runs a large grid over:

- multiple resample rules (e.g., 20–60 min)
- multiple forecast horizons (e.g., 2–6 bars ahead)
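Because the grid mixes bar lengths with horizons counted in bars, the look-ahead in minutes is their product (two 40-minute bars ahead is 80 minutes). The sketch enumerates such a grid; the 5-minute step is an assumption, and printing `main.py` commands assumes `--horizon` is measured in minutes, as the `HorizonMin=60` header in the sample output suggests. `build_grid` is hypothetical, and `run_grid.sh` itself may call a different script or flags.

```python
from itertools import product


def build_grid(rule_minutes, horizon_bars):
    """Return (rule, horizon_bars, horizon_minutes) triples for a sweep."""
    return [(f"{m}min", h, m * h) for m, h in product(rule_minutes, horizon_bars)]


if __name__ == "__main__":
    # 20-60 min bars in 5-min steps (assumed), 2-6 bars ahead.
    for rule, h_bars, h_min in build_grid(range(20, 61, 5), range(2, 7)):
        # Hypothetical command line; data-path flags (--btc, --eth) omitted.
        print(f"python main.py --rules {rule} --horizon {h_min}  # {h_bars} bars ahead")
```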

`run_focus.sh`

Focuses on the most promising regions (37–41 min and 49–59 min bars) and sweeps confidence thresholds from 0.45 to 0.60, printing a concise summary line for each combination.

Key Findings

- Optimal timeframe: bars of ~35–45 minutes consistently yield the highest Sharpe ratios (~2.2–2.3).
- Forecast horizon: performance is best around two bars ahead (~80 min look-ahead for 40-min bars).
- Confidence threshold: thresholds between 0.45 and 0.60 make little difference, because the model is already confident on more than 90% of bars.
- Interpretation: regimes reflect volatility and market structure, not raw direction. Use them to switch between strategy archetypes (trend vs. mean-reversion) rather than to predict the sign of the next move.

Example Usage

Single test

```bash
python main.py \
  --btc ../data/btcusd_1-min_data.csv \
  --eth ../data/ethusd_1min_ohlc.csv \
  --rules "30min,45min,1H" \
  --states 3 \
  --horizon 60
```

Confidence and coverage sweep

```bash
./run_focus.sh
```

Typical Output

```text
# Randomized time-split comparison
States=3  HorizonMin=60  Splits=8  TestBars=500  GapBars=24
 rule  splits         hit             sharpe
 45min       8 0.4642 ± 0.0071 2.0575 ± 0.0413
 39min       8 0.4662 ± 0.0083 2.3124 ± 0.0502
 30min       8 0.4632 ± 0.0090 2.0331 ± 0.0368
```

Interpretation for Strategy Design

| Regime type | Market traits | Suggested strategy |
| --- | --- | --- |
| High-vol / decoupled | Large ETH/BTC divergence | Momentum / Breakout |
| Low-vol / correlated | Calm, mean-reverting | Reversion / Market-Making |
| Neutral | Noisy transitions | Flat / Reduced exposure |
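The table leaves open how a fitted HMM state is matched to a regime type. One hypothetical heuristic, sketched below, scores each state by its standardized mean volatility minus its standardized mean BTC–ETH correlation: the top-scoring state is treated as high-vol / decoupled, the lowest as low-vol / correlated, and any others as neutral. The per-state means could come from hmmlearn's `model.means_` for the `rv_*` and `corr_*` columns; `label_regimes` and the scoring rule are assumptions, not the project's method.

```python
import numpy as np

# Regime types and strategies from the table above.
STRATEGY = {
    "High-vol / decoupled": "Momentum / Breakout",
    "Low-vol / correlated": "Reversion / Market-Making",
    "Neutral": "Flat / Reduced exposure",
}


def _zscore(x):
    """Standardize values across states (all zeros if they are identical)."""
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)


def label_regimes(state_rv, state_corr):
    """Map state ids to (regime type, strategy) using per-state feature means.

    Assumed heuristic: score = z(volatility) - z(correlation), so high-vol,
    weakly correlated states score highest.
    """
    score = _zscore(state_rv) - _zscore(state_corr)
    order = np.argsort(score)
    regimes = {int(s): "Neutral" for s in order}
    regimes[int(order[-1])] = "High-vol / decoupled"
    regimes[int(order[0])] = "Low-vol / correlated"
    return {s: (r, STRATEGY[r]) for s, r in sorted(regimes.items())}
```

Standardizing both inputs keeps volatility (small absolute values) and correlation (bounded in [-1, 1]) on comparable scales before they are combined.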

Requirements

- Python ≥ 3.11
- uv, a fast Python package installer and environment manager

Setup

Create and activate a local environment using uv:

```bash
# from the project root
uv venv
source .venv/bin/activate

# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn
```

Repository Structure

```text
.
├── main.py                 # core HMM regime experiment with CV
├── main_conf_metrics.py    # confidence/coverage sweep
├── run_grid.sh             # full grid search over horizons/timeframes
├── run_focus.sh            # focused confidence sweep
└── README.md
```


BTC–ETH Regime Modeling