# BTC-ETH Regime Modeling
This project builds and tests a **Hidden Markov Model (HMM)** that classifies structural market regimes in Bitcoin and Ethereum based on 1-minute OHLCV data from Bitstamp (https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data/ and https://www.kaggle.com/datasets/viniciusqroz/ethereum-historical-data).
It is designed to identify volatility and correlation phases—*risk-on*, *risk-off*, and *neutral*—and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.
---
## Project Overview
### Goals
1. Detect repeating market “regimes” using unsupervised learning.
2. Evaluate how those regimes behave across timeframes and forecast horizons.
3. Use regime identification to **select trading strategies per market state**, rather than predict short-term direction.
### Datasets
Two synchronized 1-minute OHLCV datasets:
* `btcusd_1-min_data.csv`
* `ethusd_1min_ohlc.csv`
Both sourced from Bitstamp (Kaggle datasets).
---
## Architecture
### 1. `main.py`
Core experiment runner.
Implements:
* **Feature construction**:
  * Multi-scale realized volatility (`rv_*`)
  * Trend ratios (`trend_*`)
  * Rolling BTC-ETH correlations (`corr_*`)
  * Cross-asset beta and divergence
  * Liquidity proxies (`volratio`, `vol_sum`, `vol_diff`)
* **Hidden Markov Model**: Gaussian emissions, diagonal covariance.
* **Randomized time-split validation**: multiple random train/test windows with a configurable embargo gap to avoid leakage.
* **Metrics**:
  * Hit rate (directional accuracy)
  * Annualized Sharpe ratio of the regime-implied signal
  * Mean ± std across random splits
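
The randomized time-split validation with an embargo gap can be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: the function name `random_time_splits`, the `min_train` parameter, and the choice to train on every bar before the gap are all assumptions made for the example. The defaults mirror the `Splits=8 TestBars=500 GapBars=24` header shown in the typical output.

```python
import random


def random_time_splits(n_bars, n_splits=8, test_bars=500, gap_bars=24,
                       min_train=1000, seed=0):
    """Return a list of (train, test) index ranges over n_bars bars.

    Hypothetical helper: each test window starts at a random position,
    and training uses all bars that end gap_bars before the test window,
    so no training bar overlaps the embargo gap or the test window.
    """
    first_test_start = min_train + gap_bars
    last_test_start = n_bars - test_bars
    if last_test_start < first_test_start:
        raise ValueError("not enough bars for the requested split sizes")

    rng = random.Random(seed)  # seeded so the splits are reproducible
    splits = []
    for _ in range(n_splits):
        test_start = rng.randint(first_test_start, last_test_start)
        train = range(0, test_start - gap_bars)           # ends before the gap
        test = range(test_start, test_start + test_bars)  # starts after the gap
        splits.append((train, test))
    return splits


for train, test in random_time_splits(5000, n_splits=3):
    print(f"train=[0, {train.stop})  test=[{test.start}, {test.stop})")
```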
This script explores model robustness across **different resample rules** (e.g. `30min`, `45min`, `1H`).
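
Comparing Sharpe ratios across resample rules only makes sense if each one is annualized with its own bar length. The sketch below shows one common way to do that. It assumes a 365-day, 24-hour trading year (reasonable for crypto, but not confirmed by the source), and the function name `annualized_sharpe` and its arguments are hypothetical rather than taken from `main.py`.

```python
import math
import statistics

# Assumption: the market trades around the clock, every day of the year.
MINUTES_PER_YEAR = 365 * 24 * 60


def annualized_sharpe(signals, returns, bar_minutes):
    """Annualized Sharpe ratio of a per-bar signal.

    signals: position per bar (+1, -1, or 0), decided before the return is known
    returns: realized return over the forecast horizon for the same bars
    bar_minutes: bar length implied by the resample rule (e.g. 45 for "45min")
    """
    pnl = [s * r for s, r in zip(signals, returns)]
    if len(pnl) < 2:
        raise ValueError("need at least two bars")
    spread = statistics.stdev(pnl)
    if spread == 0:
        return 0.0  # a constant P&L series has no defined Sharpe; report 0
    bars_per_year = MINUTES_PER_YEAR / bar_minutes
    return statistics.mean(pnl) / spread * math.sqrt(bars_per_year)


signals = [1, -1, 1, 1]
returns = [0.01, -0.02, -0.01, 0.02]
print(annualized_sharpe(signals, returns, bar_minutes=45))
```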
---
### 2. `main_conf_metrics.py`
Lightweight evaluator used for the **confidence and coverage sweep**.
* Adds a `--conf` parameter to control how confident the model must be before emitting a trade (pseudo-ETSC gate).
* Prints per-run metrics:
  * `cov`: coverage (fraction of bars with predictions)
  * `hit`: overall hit rate
  * `hit_trades`: accuracy conditional on trading
  * `Sharpe`: annualized risk-adjusted performance
Used by shell scripts to benchmark many timeframes and confidence thresholds.
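
To make the three accuracy-style metrics concrete, here is a small sketch of how a confidence gate could produce them. It is an illustration under stated assumptions, not the logic of `main_conf_metrics.py`: the function name `gated_metrics` is hypothetical, "confidence" is taken to be the largest posterior state probability on each bar, and `hit` is assumed to be accuracy over all bars regardless of the gate.

```python
def gated_metrics(directions, confidences, realized, conf=0.5):
    """Coverage and hit rates for a confidence-gated directional signal.

    directions:  predicted sign per bar (+1 or -1)
    confidences: model confidence per bar, e.g. the max posterior probability
    realized:    realized sign per bar (+1 or -1)
    conf:        minimum confidence required to emit a trade
    """
    n = len(directions)
    if n == 0:
        raise ValueError("no bars to evaluate")

    traded = [c >= conf for c in confidences]
    correct = [d == r for d, r in zip(directions, realized)]
    n_trades = sum(traded)
    hits_on_trades = sum(1 for t, ok in zip(traded, correct) if t and ok)

    return {
        "cov": n_trades / n,        # fraction of bars that pass the gate
        "hit": sum(correct) / n,    # assumption: ungated accuracy over all bars
        "hit_trades": hits_on_trades / n_trades if n_trades else float("nan"),
    }


metrics = gated_metrics(
    directions=[1, 1, -1, -1],
    confidences=[0.9, 0.4, 0.8, 0.55],
    realized=[1, -1, -1, 1],
    conf=0.5,
)
print(metrics)
```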
---
### 3. Shell scripts
#### `run_grid.sh`
Runs a large grid of:
* multiple resample rules (e.g. 20 min to 60 min),
* multiple horizons (e.g. 2 to 6 bars ahead).
#### `run_focus.sh`
Focuses on the most promising regions (37-41 min, 49-59 min)
and sweeps confidence thresholds (0.45-0.60).
Produces concise summary lines for each combination.
---
## Key Findings
1. **Optimal timeframe:**
   ~**35-45 minutes** consistently yields the highest Sharpe ratios (~2.2-2.3).
2. **Forecast horizon:**
   Best performance around **two bars ahead** (~80 min look-ahead for 40 min bars).
3. **Confidence threshold:**
   Little effect between 0.45 and 0.60; the model is already confident on more than 90% of bars.
4. **Interpretation:**
   Regimes reflect volatility and structure, not raw direction.
   Use them to switch *strategy archetypes* (trend vs. mean-reversion) rather than predict sign.
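
The relationship between a horizon given in minutes and a horizon counted in bars can be sketched as below. How `main.py` actually converts between the two is not stated here, so rounding up (so that the look-ahead covers at least the requested horizon) is an assumption, and both helper names are hypothetical. Only the `min` and `H` rule suffixes used in this README are handled.

```python
import math
import re


def rule_to_minutes(rule):
    """Convert a resample rule such as "45min" or "1H" to minutes."""
    match = re.fullmatch(r"(\d+)\s*(min|H)", rule)
    if match is None:
        raise ValueError(f"unsupported rule: {rule!r}")
    count, unit = int(match.group(1)), match.group(2)
    return count * (60 if unit == "H" else 1)


def horizon_in_bars(horizon_min, rule):
    """Number of bars needed to cover horizon_min (assumption: round up)."""
    return max(1, math.ceil(horizon_min / rule_to_minutes(rule)))


for rule in ["30min", "40min", "45min", "1H"]:
    bars = horizon_in_bars(60, rule)
    print(rule, bars, "bars,", bars * rule_to_minutes(rule), "min look-ahead")
```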
---
## Example Usage
### Single test
```bash
python main.py \
--btc ../data/btcusd_1-min_data.csv \
--eth ../data/ethusd_1min_ohlc.csv \
--rules "30min,45min,1H" \
--states 3 \
--horizon 60
```
### Confidence and coverage sweep
```bash
./run_focus.sh
```
---
## Typical Output
```
# Randomized time-split comparison
States=3 HorizonMin=60 Splits=8 TestBars=500 GapBars=24
rule   splits  hit              sharpe
45min  8       0.4642 ± 0.0071  2.0575 ± 0.0413
39min  8       0.4662 ± 0.0083  2.3124 ± 0.0502
30min  8       0.4632 ± 0.0090  2.0331 ± 0.0368
```
---
## Interpretation for Strategy Design
| Regime Type | Market Traits | Suggested Strategy |
| -------------------- | ------------------------ | ------------------------- |
| High-vol / decoupled | large ETH/BTC divergence | Momentum / Breakout |
| Low-vol / correlated | calm, mean-reverting | Reversion / Market-Making |
| Neutral | noisy transitions | Flat / Reduced exposure |
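
HMM states come out as unlabeled integers, so using the table above in code requires some rule for naming them. The sketch below shows one hypothetical rule: rank the three states by their mean realized volatility and treat the middle one as neutral. The labeling rule, the regime keys, and both function names are assumptions for illustration; the project may assign labels differently.

```python
# Strategy suggestions copied from the table above, keyed by hypothetical labels.
STRATEGY_BY_REGIME = {
    "high_vol_decoupled": "momentum / breakout",
    "low_vol_correlated": "reversion / market-making",
    "neutral": "flat / reduced exposure",
}


def label_states(mean_vol_by_state):
    """Map HMM state ids to regime labels by ranking mean realized volatility.

    mean_vol_by_state: {state_id: mean realized volatility while in that state}
    Assumption: lowest volatility is the calm regime, highest is the
    decoupled regime, and the state in between is treated as neutral.
    """
    if len(mean_vol_by_state) != 3:
        raise ValueError("this sketch assumes exactly three states")
    ordered = sorted(mean_vol_by_state, key=mean_vol_by_state.get)
    names = ["low_vol_correlated", "neutral", "high_vol_decoupled"]
    return dict(zip(ordered, names))


def strategy_for(state, labels):
    """Look up the suggested strategy archetype for a decoded state id."""
    return STRATEGY_BY_REGIME[labels[state]]


labels = label_states({0: 0.020, 1: 0.005, 2: 0.010})
print(labels)
print(strategy_for(0, labels))
```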
---
## Requirements
* **Python ≥ 3.11**
* **Environment manager:** [**uv**](https://github.com/astral-sh/uv) (fast Python package installer and environment manager)
### Setup
Create and activate a local environment using **uv**:
```bash
# from the project root
uv venv
source .venv/bin/activate
# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn
```
---
## Repository Structure
```
.
├── main.py               # core HMM regime experiment with CV
├── main_conf_metrics.py  # confidence/coverage sweep
├── run_grid.sh           # full grid search over horizons/timeframes
├── run_focus.sh          # focused confidence sweep
└── README.md
```