# BTC–ETH Regime Modeling

This project builds and tests a **Hidden Markov Model (HMM)** that classifies structural market regimes in Bitcoin and Ethereum from 1-minute OHLCV data recorded on Bitstamp ([BTC dataset](https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data/), [ETH dataset](https://www.kaggle.com/datasets/viniciusqroz/ethereum-historical-data)).

It is designed to identify volatility and correlation phases (*risk-on*, *risk-off*, and *neutral*) and to evaluate how predictive or useful these regimes are across multiple timeframes and forecast horizons.

---

## Project Overview

### Goals

1. Detect recurring market “regimes” using unsupervised learning.
2. Evaluate how those regimes behave across timeframes and forecast horizons.
3. Use regime identification to **select trading strategies per market state**, rather than to predict short-term direction.

### Datasets

Two synchronized 1-minute OHLCV datasets:

* `btcusd_1-min_data.csv`
* `ethusd_1min_ohlc.csv`

Both contain Bitstamp data and are distributed as Kaggle datasets.

---

## Architecture

### 1. `main.py`

Core experiment runner.
Implements:

* **Feature construction**:

  * Multi-scale realized volatility (`rv_*`)
  * Trend ratios (`trend_*`)
  * Rolling BTC–ETH correlations (`corr_*`)
  * Cross-asset beta and divergence
  * Liquidity proxies (`volratio`, `vol_sum`, `vol_diff`)
* **Hidden Markov Model**: Gaussian emissions, diagonal covariance.
* **Randomized time-split validation**: multiple random train/test windows with a configurable embargo gap to avoid leakage.
* **Metrics**:

  * Hit rate (directional accuracy)
  * Annualized Sharpe ratio of the regime-implied signal
  * Mean ± std across random splits

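The randomized time-split validation listed above can be sketched as follows. This is a minimal illustration, not the code in `main.py`: the function name `random_time_splits`, the expanding training window, and the `min_train_bars` parameter are assumptions, and the defaults for splits, test bars, and gap bars simply mirror the values shown under Typical Output. Each training window stops `gap_bars` before its test window, so features built from rolling windows cannot leak test information into training.

```python
import random


def random_time_splits(n_bars, n_splits=8, test_bars=500, gap_bars=24,
                       min_train_bars=1000, seed=0):
    """Return (train_range, test_range) pairs of bar indices.

    Assumption: training uses every bar from index 0 up to the embargo gap.
    """
    rng = random.Random(seed)
    first_test_start = min_train_bars + gap_bars
    last_test_start = n_bars - test_bars
    if last_test_start < first_test_start:
        raise ValueError("not enough bars for the requested split sizes")

    splits = []
    for _ in range(n_splits):
        # Draw the start of the test window uniformly at random.
        test_start = rng.randint(first_test_start, last_test_start)
        train = range(0, test_start - gap_bars)   # ends before the embargo gap
        test = range(test_start, test_start + test_bars)
        splits.append((train, test))
    return splits


splits = random_time_splits(n_bars=20_000)
for train, test in splits[:3]:
    gap = test.start - train.stop
    print(f"train [0, {train.stop})  gap {gap}  test [{test.start}, {test.stop})")
```
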
This script explores model robustness across **different resample rules** (e.g. `30min`, `45min`, `1H`).

---

### 2. `main_conf_metrics.py`

Lightweight evaluator used for the **confidence and coverage sweep**.

* Adds a `--conf` parameter that sets how confident the model must be before it emits a trade (pseudo-ETSC gate).
* Prints per-run metrics:

  * `cov`: coverage (fraction of bars with predictions)
  * `hit`: overall hit rate
  * `hit_trades`: accuracy conditional on trading
  * `Sharpe`: annualized risk-adjusted performance

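A minimal sketch of how a confidence gate and the four metrics above could fit together is shown here. It is not the code in `main_conf_metrics.py`, and several choices are assumptions: the function name `gated_metrics`, a per-bar signal of +1 or -1, confidence taken as the largest state posterior, `hit` counting untraded bars as misses, a zero risk-free rate, and annualization over a 365-day year because crypto markets trade around the clock.

```python
import math
import statistics


def gated_metrics(signals, confidences, returns, conf=0.5, bar_minutes=39):
    """Compute cov, hit, hit_trades and Sharpe for a confidence-gated signal.

    signals:     +1 / -1 direction implied by the regime, one per bar
    confidences: largest state posterior per bar (assumed definition)
    returns:     realized forward return per bar
    """
    # Trade only when the model is at least `conf` confident; otherwise stay flat.
    positions = [s if p >= conf else 0 for s, p in zip(signals, confidences)]
    traded = [(pos, r) for pos, r in zip(positions, returns) if pos != 0]
    hits = sum(1 for pos, r in traded if pos * r > 0)

    cov = len(traded) / len(positions)
    hit = hits / len(positions)  # assumption: flat bars count as misses
    hit_trades = hits / len(traded) if traded else float("nan")

    pnl = [pos * r for pos, r in zip(positions, returns)]
    bars_per_year = 365 * 24 * 60 / bar_minutes
    sd = statistics.pstdev(pnl)
    sharpe = (statistics.fmean(pnl) / sd * math.sqrt(bars_per_year)
              if sd > 0 else float("nan"))
    return {"cov": cov, "hit": hit, "hit_trades": hit_trades, "Sharpe": sharpe}


# Illustrative numbers only, not project results.
m = gated_metrics(
    signals=[1, -1, 1, 1],
    confidences=[0.9, 0.4, 0.7, 0.55],
    returns=[0.01, 0.02, -0.01, 0.03],
    conf=0.5,
)
print(m)
```
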
Used by the shell scripts to benchmark many timeframes and confidence thresholds.

---

### 3. Shell scripts

#### `run_grid.sh`

Runs a large grid of:

* multiple resample rules (e.g. 20–60 min),
* multiple horizons (e.g. 2–6 bars ahead).

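The grid described above can be pictured as a Cartesian product of bar sizes and horizons. The sketch below only builds command strings in Python; it is not the content of `run_grid.sh`. It uses the flags shown under Example Usage, and it assumes a 5-minute step between bar sizes and that `--horizon` is given in minutes, so a horizon in bars is multiplied by the bar size. The function name `grid_commands` is hypothetical.

```python
from itertools import product


def grid_commands(rule_minutes, horizon_bars, states=3):
    """Build one main.py command line per (bar size, horizon) pair."""
    commands = []
    for minutes, bars in product(rule_minutes, horizon_bars):
        horizon_min = minutes * bars  # assumption: --horizon is in minutes
        commands.append(
            "python main.py "
            "--btc ../data/btcusd_1-min_data.csv "
            "--eth ../data/ethusd_1min_ohlc.csv "
            f'--rules "{minutes}min" --states {states} --horizon {horizon_min}'
        )
    return commands


# 20 to 60 minute bars in 5-minute steps, horizons of 2 to 6 bars.
commands = grid_commands(range(20, 61, 5), range(2, 7))
print(len(commands))
print(commands[0])
```
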
#### `run_focus.sh`

Focuses on the most promising regions (37–41 min, 49–59 min)
and sweeps confidence thresholds (0.45–0.60).
Produces concise summary lines for each combination.

---

## Key Findings

1. **Optimal timeframe:**
   Bars of ~**35–45 minutes** consistently yield the highest Sharpe ratios (~2.2–2.3).

2. **Forecast horizon:**
   Best performance around **two bars ahead** (~80 min look-ahead for 40 min bars).

3. **Confidence threshold:**
   Little effect between 0.45 and 0.60; the model is already confident on > 90% of bars.

4. **Interpretation:**
   Regimes reflect volatility and structure, not raw direction.
   Use them to switch *strategy archetypes* (trend vs. mean-reversion) rather than to predict sign.

---

## Example Usage

### Single test

```bash
python main.py \
  --btc ../data/btcusd_1-min_data.csv \
  --eth ../data/ethusd_1min_ohlc.csv \
  --rules "30min,45min,1H" \
  --states 3 \
  --horizon 60
```

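Each value passed to `--rules` names a bar size that the 1-minute data is resampled to. As a rough illustration of that aggregation, here is a standard-library sketch; the project itself may well use pandas for this step. The function name `resample_ohlcv`, the tuple layout, Unix-second timestamps, and buckets aligned to the epoch are all assumptions.

```python
def resample_ohlcv(rows, bar_minutes):
    """Aggregate 1-minute rows into bars of `bar_minutes` minutes.

    rows: iterable of (ts, open, high, low, close, volume), sorted by ts,
          where ts is a Unix timestamp in seconds (assumed layout).
    """
    width = bar_minutes * 60
    bars = {}  # dicts keep insertion order, so bars stay chronological
    for ts, o, h, l, c, v in rows:
        key = ts - ts % width  # left edge of the bucket this minute falls into
        bar = bars.get(key)
        if bar is None:
            bars[key] = [key, o, h, l, c, v]  # first minute sets the open
        else:
            bar[2] = max(bar[2], h)  # highest high
            bar[3] = min(bar[3], l)  # lowest low
            bar[4] = c               # last close seen so far
            bar[5] += v              # summed volume
    return [tuple(bar) for bar in bars.values()]


# 90 synthetic minutes, illustrative values only.
minutes = [(60 * i, 100.0 + i, 101.0 + i, 99.0 + i, 100.5 + i, 1.0)
           for i in range(90)]
bars = resample_ohlcv(minutes, bar_minutes=45)
print(len(bars), bars[0])
```
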
### Confidence and coverage sweep

```bash
./run_focus.sh
```

---

## Typical Output

```
# Randomized time-split comparison
States=3 HorizonMin=60 Splits=8 TestBars=500 GapBars=24
rule   splits  hit              sharpe
45min  8       0.4642 ± 0.0071  2.0575 ± 0.0413
39min  8       0.4662 ± 0.0083  2.3124 ± 0.0502
30min  8       0.4632 ± 0.0090  2.0331 ± 0.0368
```

---

## Interpretation for Strategy Design

| Regime Type          | Market Traits            | Suggested Strategy        |
| -------------------- | ------------------------ | ------------------------- |
| High-vol / decoupled | large ETH/BTC divergence | Momentum / Breakout       |
| Low-vol / correlated | calm, mean-reverting     | Reversion / Market-Making |
| Neutral              | noisy transitions        | Flat / Reduced exposure   |

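Because HMM states come out as unlabeled integers, using the table in practice requires first deciding which state is which regime. One possible approach, sketched below, ranks states by their mean realized volatility and then looks up a strategy. The ranking rule, the names `label_states` and `STRATEGY_BY_REGIME`, and the example volatilities are illustrative assumptions, not part of the project.

```python
# Strategy names follow the table above; the identifiers are hypothetical.
STRATEGY_BY_REGIME = {
    "high_vol_decoupled": "momentum_breakout",
    "low_vol_correlated": "reversion_market_making",
    "neutral": "flat_reduced_exposure",
}


def label_states(mean_vol_by_state):
    """Map HMM state ids to regime names by ranking mean realized volatility.

    Assumption: the calmest state is the low-vol regime, the most volatile
    state is the high-vol regime, and anything in between is neutral.
    """
    ranked = sorted(mean_vol_by_state, key=mean_vol_by_state.get)
    labels = {state: "neutral" for state in ranked}
    labels[ranked[0]] = "low_vol_correlated"
    labels[ranked[-1]] = "high_vol_decoupled"
    return labels


# Made-up per-state volatilities for three states.
labels = label_states({0: 0.012, 1: 0.004, 2: 0.007})
strategy = STRATEGY_BY_REGIME[labels[0]]
print(labels, strategy)
```
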

---

## Requirements

* **Python ≥ 3.11**
* **Environment manager:** [**uv**](https://github.com/astral-sh/uv) (fast Python package installer and environment manager)

### Setup

Create and activate a local environment using **uv**:

```bash
# from the project root
uv venv
source .venv/bin/activate

# install dependencies
uv pip install numpy pandas scikit-learn hmmlearn
```

---

## Repository Structure

```
.
├── main.py               # core HMM regime experiment with CV
├── main_conf_metrics.py  # confidence/coverage sweep
├── run_grid.sh           # full grid search over horizons/timeframes
├── run_focus.sh          # focused confidence sweep
└── README.md
```