# BTC–ETH Regime Modeling
This project builds and tests a **Hidden Markov Model (HMM)** that classifies structural market regimes in Bitcoin and Ethereum from 1-minute OHLCV data recorded on Bitstamp. The data comes from two Kaggle datasets: https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data/ (BTC) and https://www.kaggle.com/datasets/viniciusqroz/ethereum-historical-data (ETH).
It is designed to identify volatility and correlation phases (*risk-on*, *risk-off*, and *neutral*) and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.
---
## Project Overview
### Goals
1. Detect repeating market “regimes” using unsupervised learning.
2. Evaluate how those regimes behave across timeframes and forecast horizons.
3. Use regime identification to **select trading strategies per market state**, rather than predict short-term direction.
### Datasets
Two synchronized 1-minute OHLCV datasets:
* `btcusd_1-min_data.csv`
* `ethusd_1min_ohlc.csv`
Both sourced from Bitstamp (Kaggle datasets).
---
## Architecture
### 1. `main.py`
Core experiment runner.
Implements:
* **Feature construction**:
  * Multi-scale realized volatility (`rv_*`)
  * Trend ratios (`trend_*`)
  * Rolling BTC–ETH correlations (`corr_*`)
  * Cross-asset beta and divergence
  * Liquidity proxies (`volratio`, `vol_sum`, `vol_diff`)
* **Hidden Markov Model**: Gaussian emissions, diagonal covariance.
* **Randomized time-split validation**: multiple random train/test windows with a configurable embargo gap to avoid leakage.
* **Metrics**:
  * Hit rate (directional accuracy)
  * Annualized Sharpe ratio of the regime-implied signal
  * Mean ± std across random splits
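
The validation and metric steps listed above can be sketched in plain Python. This is a minimal illustration, not the code in `main.py`: the function names, the expanding training window, the `min_train` parameter, and the 24/7 annualization factor (`365 * 24 * 60 / bar_minutes` bars per year) are all assumptions made for the example.

```python
import math
import random
import statistics


def random_time_splits(n_bars, n_splits, test_bars, gap_bars, min_train=100, seed=0):
    """Yield (train, test) index ranges; training ends gap_bars before the test window."""
    rng = random.Random(seed)
    first_start = min_train + gap_bars
    last_start = n_bars - test_bars
    if last_start < first_start:
        raise ValueError("not enough bars for the requested split sizes")
    for _ in range(n_splits):
        test_start = rng.randint(first_start, last_start)
        train = range(0, test_start - gap_bars)  # embargo: skip gap_bars before the test window
        test = range(test_start, test_start + test_bars)
        yield train, test


def hit_rate(signal, future_ret):
    """Share of all bars where the signal (-1/0/+1) has the same sign as the future return."""
    hits = sum(1 for s, r in zip(signal, future_ret) if s * r > 0)
    return hits / len(signal)


def annualized_sharpe(bar_returns, bar_minutes):
    """Per-bar mean/std scaled by sqrt(bars per year), assuming the market trades 24/7."""
    sd = statistics.stdev(bar_returns)
    if sd == 0:
        return float("nan")
    bars_per_year = 365 * 24 * 60 / bar_minutes
    return statistics.fmean(bar_returns) / sd * math.sqrt(bars_per_year)


def mean_std(values):
    """Summarize one metric across splits as (mean, sample std)."""
    return statistics.fmean(values), statistics.stdev(values)
```

The strategy returns passed to `annualized_sharpe` would be `signal[t] * future_ret[t]` per bar. The embargo matters because features built on rolling windows overlap in time, so training bars immediately before the test window share information with it.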
This script explores model robustness across **different resample rules** (e.g. `30min`, `45min`, `1H`).
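
To make the idea of a resample rule concrete, here is a dependency-free sketch of aggregating 1-minute bars into N-minute bars and computing one realized-volatility feature on the result. The bar layout (dicts with `ts`, `open`, `high`, `low`, `close`, `volume`), the epoch-aligned buckets, and both function names are assumptions for illustration; the project itself may resample differently.

```python
import math


def resample_ohlcv(bars, minutes):
    """Aggregate time-sorted 1-minute bars into `minutes`-long bars.

    Each bar is a dict with ts (epoch seconds), open, high, low, close, volume.
    Buckets are aligned to multiples of `minutes` since the epoch (an assumption).
    """
    width = minutes * 60
    out = []
    for b in bars:
        bucket = b["ts"] - b["ts"] % width
        if out and out[-1]["ts"] == bucket:
            cur = out[-1]
            cur["high"] = max(cur["high"], b["high"])
            cur["low"] = min(cur["low"], b["low"])
            cur["close"] = b["close"]      # last close in the bucket wins
            cur["volume"] += b["volume"]
        else:
            out.append({"ts": bucket, "open": b["open"], "high": b["high"],
                        "low": b["low"], "close": b["close"], "volume": b["volume"]})
    return out


def realized_vol(closes, window):
    """Rolling sqrt of summed squared log returns over `window` bars (None until filled)."""
    rets = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    rv = [None] * len(closes)
    for i in range(window, len(closes)):
        # rets[i - window:i] are the `window` returns ending at bar i
        rv[i] = math.sqrt(sum(r * r for r in rets[i - window:i]))
    return rv
```

Calling `realized_vol` with several window lengths on the same resampled closes gives a multi-scale family of features in the spirit of `rv_*`.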
---
### 2. `main_conf_metrics.py`
Lightweight evaluator used for the **confidence and coverage sweep**.
* Adds a `--conf` parameter to control how confident the model must be before emitting a trade (pseudo-ETSC gate).
* Prints per-run metrics:
  * `cov`: coverage (fraction of bars with predictions)
  * `hit`: overall hit rate
  * `hit_trades`: accuracy conditional on trading
  * `Sharpe`: annualized risk-adjusted performance
Used by shell scripts to benchmark many timeframes and confidence thresholds.
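
The relationship between `cov`, `hit`, and `hit_trades` can be illustrated with a small sketch. It assumes the gate compares the largest state posterior against `--conf`, that each state maps to a signal of -1, 0, or +1, and that a bar counts as covered only when the gate passes and the signal is non-zero. These definitions and the names `gated_metrics` and `state_signal` are hypothetical and may differ from what `main_conf_metrics.py` computes.

```python
def gated_metrics(posteriors, state_signal, future_ret, conf):
    """Compute coverage and hit rates under a posterior-confidence gate.

    posteriors:   per-bar lists of state probabilities (each sums to 1)
    state_signal: signal per state index, one of -1, 0, +1
    future_ret:   realized forward return per bar
    conf:         minimum posterior of the most likely state required to trade
    """
    n = len(posteriors)
    trades = 0
    hits = 0
    for probs, ret in zip(posteriors, future_ret):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] < conf:
            continue  # not confident enough: no prediction on this bar
        sig = state_signal[best]
        if sig == 0:
            continue  # confident, but the state implies no position
        trades += 1
        if sig * ret > 0:
            hits += 1
    return {
        "cov": trades / n,
        "hit": hits / n,  # untraded bars count as misses
        "hit_trades": hits / trades if trades else float("nan"),
    }
```

Under these definitions `hit` equals `cov * hit_trades`, so an overall hit rate below 0.5 does not by itself mean the traded bars are worse than a coin flip.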
---
### 3. Shell scripts
#### `run_grid.sh`
Runs a large grid of:
* multiple resample rules (e.g. 20–60 min),
* multiple horizons (e.g. 2–6 bars ahead).
#### `run_focus.sh`
Focuses on the most promising regions (37–41 min, 49–59 min)
and sweeps confidence thresholds (0.45–0.60).
Produces concise summary lines for each combination.
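
For readers who prefer to drive the sweep from Python, the following sketch enumerates a grid like the one `run_focus.sh` covers. It only builds argument lists and runs nothing. It assumes that `main_conf_metrics.py` accepts the same `--btc`, `--eth`, `--rules`, and `--horizon` flags as `main.py`, that `--horizon` is given in minutes, and that a 0.05 step between thresholds is wanted; none of this is confirmed by the scripts themselves.

```python
from itertools import product


def focus_grid(rule_minutes, confs, horizon_bars=2):
    """Build one argument list per (resample rule, confidence threshold) pair."""
    cmds = []
    for minutes, conf in product(rule_minutes, confs):
        cmds.append([
            "python", "main_conf_metrics.py",
            "--btc", "../data/btcusd_1-min_data.csv",
            "--eth", "../data/ethusd_1min_ohlc.csv",
            "--rules", f"{minutes}min",
            "--horizon", str(minutes * horizon_bars),  # assumed to be in minutes
            "--conf", f"{conf:.2f}",
        ])
    return cmds


# 37-41 min and 49-59 min bars, thresholds 0.45 to 0.60 in assumed steps of 0.05
rules = list(range(37, 42)) + list(range(49, 60))
confs = [0.45, 0.50, 0.55, 0.60]
commands = focus_grid(rules, confs)
```

Each entry in `commands` can be passed to `subprocess.run` once the flags have been checked against the script's actual argument parser.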
---
## Key Findings
1. **Optimal timeframe:**
   Bars of ~**35–45 minutes** consistently yield the highest Sharpe ratios (~2.2–2.3).
2. **Forecast horizon:**
   Best performance around **two bars ahead** (~80 min look-ahead for 40 min bars).
3. **Confidence threshold:**
   Little effect between 0.45 and 0.60; the model is already confident on > 90 % of bars.
4. **Interpretation:**
   Regimes reflect volatility and structure, not raw direction.
   Use them to switch *strategy archetypes* (trend vs. mean-reversion) rather than predict sign.
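
The horizon arithmetic in finding 2 is simply bar length times the number of bars ahead. The sketch below shows that conversion together with a forward-return label of the kind a hit rate would be scored against. Both function names are hypothetical, and the log-return definition is an assumption.

```python
import math


def lookahead_minutes(bar_minutes, horizon_bars):
    """Wall-clock look-ahead implied by a horizon expressed in bars."""
    return bar_minutes * horizon_bars


def forward_log_returns(closes, horizon_bars):
    """log(close[t + h] / close[t]) for each bar t; None where the future is unknown."""
    n = len(closes)
    return [
        math.log(closes[t + horizon_bars] / closes[t]) if t + horizon_bars < n else None
        for t in range(n)
    ]


print(lookahead_minutes(40, 2))  # 80
```

The trailing `None` values are bars whose outcome is not yet observable; they have to be dropped before scoring.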
---
## Example Usage
### Single test
```bash
python main.py \
--btc ../data/btcusd_1-min_data.csv \
--eth ../data/ethusd_1min_ohlc.csv \
--rules "30min,45min,1H" \
--states 3 \
--horizon 60
```
### Confidence and coverage sweep
```bash
./run_focus.sh
```
---
## Typical Output
```
# Randomized time-split comparison
States=3 HorizonMin=60 Splits=8 TestBars=500 GapBars=24
rule   splits  hit               sharpe
45min  8       0.4642 ± 0.0071   2.0575 ± 0.0413
39min  8       0.4662 ± 0.0083   2.3124 ± 0.0502
30min  8       0.4632 ± 0.0090   2.0331 ± 0.0368
```
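
When comparing many runs it helps to read these summary rows back into structured form. The sketch below assumes the row layout shown above (rule, split count, then `mean ± std` for hit rate and Sharpe, separated by whitespace); `parse_summary` is a hypothetical helper, not part of the repository.

```python
def parse_summary(text):
    """Parse 'rule splits hit ± std sharpe ± std' rows; skip every other line."""
    rows = []
    for line in text.splitlines():
        tok = line.split()
        if len(tok) != 8 or tok[3] != "±" or tok[6] != "±":
            continue  # header, comment, or blank line
        rows.append({
            "rule": tok[0],
            "splits": int(tok[1]),
            "hit": float(tok[2]),
            "hit_std": float(tok[4]),
            "sharpe": float(tok[5]),
            "sharpe_std": float(tok[7]),
        })
    return rows


SAMPLE = """\
# Randomized time-split comparison
States=3 HorizonMin=60 Splits=8 TestBars=500 GapBars=24
rule splits hit sharpe
45min 8 0.4642 ± 0.0071 2.0575 ± 0.0413
39min 8 0.4662 ± 0.0083 2.3124 ± 0.0502
30min 8 0.4632 ± 0.0090 2.0331 ± 0.0368
"""

best = max(parse_summary(SAMPLE), key=lambda r: r["sharpe"])
print(best["rule"])  # 39min
```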
---
## Interpretation for Strategy Design
| Regime Type | Market Traits | Suggested Strategy |
| -------------------- | ------------------------ | ------------------------- |
| High-vol / decoupled | large ETH/BTC divergence | Momentum / Breakout |
| Low-vol / correlated | calm, mean-reverting | Reversion / Market-Making |
| Neutral | noisy transitions | Flat / Reduced exposure |
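
HMM state indices carry no meaning on their own, so a table like this needs a rule that maps each fitted state to a regime type. One simple option, shown here purely as an assumption, is to rank the three states by their mean realized volatility and treat the middle one as neutral. The function names and the ranking rule are hypothetical; a real mapping might also use the correlation or divergence features.

```python
STRATEGY_BY_REGIME = {
    "high-vol / decoupled": "momentum / breakout",
    "low-vol / correlated": "reversion / market-making",
    "neutral": "flat / reduced exposure",
}


def label_states(mean_rv_by_state):
    """Map state index -> regime label by ranking states on mean realized volatility."""
    if len(mean_rv_by_state) != 3:
        raise ValueError("this sketch assumes exactly three states")
    low, mid, high = sorted(range(3), key=mean_rv_by_state.__getitem__)
    return {
        low: "low-vol / correlated",
        mid: "neutral",
        high: "high-vol / decoupled",
    }


def strategy_for(state, labels):
    """Look up the suggested strategy archetype for a decoded state index."""
    return STRATEGY_BY_REGIME[labels[state]]


labels = label_states([0.02, 0.005, 0.01])  # hypothetical per-state mean volatility
print(strategy_for(0, labels))  # momentum / breakout
```

Because state indices can permute between fits, the labels should be recomputed for every trained model rather than hard-coded.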
---
## Requirements
* **Python ≥ 3.11**
* **Environment manager:** [**uv**](https://github.com/astral-sh/uv) (fast Python package installer and environment manager)
### Setup
Create and activate a local environment using **uv**:
```bash
# from the project root
uv venv
source .venv/bin/activate
# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn
```
---
## Repository Structure
```
.
├── main.py # core HMM regime experiment with CV
├── main_conf_metrics.py # confidence/coverage sweep
├── run_grid.sh # full grid search over horizons/timeframes
├── run_focus.sh # focused confidence sweep
└── README.md
```