# Data Aggregation Strategy

## Overview

This document describes the comprehensive data aggregation strategy used in the TCP Trading Platform for converting real-time trade data into OHLCV (Open, High, Low, Close, Volume) candles across multiple timeframes, including sub-minute precision.

## Core Principles

### 1. Right-Aligned Timestamps

The system follows the **RIGHT-ALIGNED timestamp** convention:

- **Candle timestamp = end time of the interval (close time)**
- This represents when the candle period **closes**, not when it opens
- Exchange APIs differ on this point: the candlestick endpoints of Binance, OKX, and Coinbase label each candle by its **open** time, so shift by one interval when comparing against their data
- Apply the same convention wherever candles are stored or compared so that historical data stays consistent

**Examples:**
- 1-second candle covering 09:00:15.000-09:00:16.000 → timestamp = 09:00:16.000
- 5-second candle covering 09:00:15.000-09:00:20.000 → timestamp = 09:00:20.000
- 30-second candle covering 09:00:00.000-09:00:30.000 → timestamp = 09:00:30.000
- 1-minute candle covering 09:00:00-09:01:00 → timestamp = 09:01:00
- 5-minute candle covering 09:00:00-09:05:00 → timestamp = 09:05:00

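The examples above can be reproduced with a small helper. This is a minimal sketch, not the platform's actual code: the function name `candle_close_time` is hypothetical, buckets are assumed to be aligned to the Unix epoch in UTC, and a trade falling exactly on a boundary is assumed to belong to the bucket that starts there.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def candle_close_time(trade_time: datetime, interval: timedelta) -> datetime:
    """Return the right-aligned (close-time) timestamp of the bucket holding trade_time.

    Assumes trade_time is timezone-aware and buckets are half-open: [start, end).
    """
    # Number of whole intervals elapsed since the epoch (timedelta // timedelta -> int)
    bucket_index = (trade_time - EPOCH) // interval
    # The bucket ends one interval after its start
    return EPOCH + (bucket_index + 1) * interval
```

For a trade at 09:00:17 UTC and a 5-second interval, the helper returns 09:00:20, matching the second example in the list.
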
### 2. Sparse Candles (Trade-Driven Aggregation)

**CRITICAL**: The system uses a **SPARSE CANDLE APPROACH** - candles are only emitted when trades actually occur during the time period.

#### What This Means:
- **No trades during period = No candle emitted**
- **Time gaps in data** are normal and expected
- **Storage efficient** - only meaningful periods are stored
- **Industry standard** behavior matching major exchanges

#### Examples of Sparse Behavior:

**1-Second Timeframe:**
```
09:00:15 → Trade occurs → 1s candle with timestamp 09:00:16
09:00:16 → No trades    → NO candle
09:00:17 → No trades    → NO candle
09:00:18 → Trade occurs → 1s candle with timestamp 09:00:19
```

**5-Second Timeframe:**
```
09:00:15-20 → Trades occur → 5s candle with timestamp 09:00:20
09:00:20-25 → No trades    → NO candle
09:00:25-30 → Trade occurs → 5s candle with timestamp 09:00:30
```

The times on the right are candle timestamps (close times), not emission times. A candle is emitted only once a later trade lands in a different bucket, so the 09:00:16 candle in the first example is emitted when the 09:00:18 trade arrives.

#### Real-World Coverage Examples:

From live testing with BTC-USDT (3-minute test):
- **Expected 1s candles**: 180
- **Actual 1s candles**: 53 (29% coverage)
- **Missing periods**: 127 seconds with no trading activity

From live testing with ETH-USDT (1-minute test):
- **Expected 1s candles**: 60
- **Actual 1s candles**: 22 (37% coverage)
- **Missing periods**: 38 seconds with no trading activity

### 3. Future Leakage Prevention

The aggregation system prevents future leakage by:

- **Only completing candles when time boundaries are definitively crossed**
- **Never emitting incomplete candles during real-time processing**
- **Waiting for actual trades to trigger bucket completion**
- **Using trade timestamps, not system clock times, for bucket assignment**

Because completion is trade-driven, a finished candle is not emitted until the next trade arrives in a later bucket. On quiet markets the last candle before a pause can therefore appear well after its close time.

## Supported Timeframes

The system supports the following timeframes with precise bucket calculations. Bucket boundaries are written as mm:ss for second-based timeframes and hh:mm for minute-, hour-, and day-based timeframes.

### Second-Based Timeframes:
- **1s**: 1-second buckets (00:00, 00:01, 00:02, ...)
- **5s**: 5-second buckets (00:00, 00:05, 00:10, 00:15, ...)
- **10s**: 10-second buckets (00:00, 00:10, 00:20, 00:30, ...)
- **15s**: 15-second buckets (00:00, 00:15, 00:30, 00:45, ...)
- **30s**: 30-second buckets (00:00, 00:30, ...)

### Minute-Based Timeframes:
- **1m**: 1-minute buckets aligned to minute boundaries
- **5m**: 5-minute buckets (00:00, 00:05, 00:10, ...)
- **15m**: 15-minute buckets (00:00, 00:15, 00:30, 00:45)
- **30m**: 30-minute buckets (00:00, 00:30)

### Hour- and Day-Based Timeframes:
- **1h**: 1-hour buckets aligned to hour boundaries
- **4h**: 4-hour buckets (00:00, 04:00, 08:00, 12:00, 16:00, 20:00)
- **1d**: 1-day buckets aligned to midnight UTC

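A timeframe string from the list above can be turned into a bucket length with a few lines of parsing. The sketch below is illustrative only: `parse_timeframe` and `SUPPORTED_TIMEFRAMES` are hypothetical names, not the platform's actual validation functions.

```python
import re
from datetime import timedelta

# Timeframes listed in this document
SUPPORTED_TIMEFRAMES = (
    "1s", "5s", "10s", "15s", "30s",
    "1m", "5m", "15m", "30m",
    "1h", "4h", "1d",
)

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_timeframe(timeframe: str) -> timedelta:
    """Convert a supported timeframe string such as '15s' or '4h' to a timedelta."""
    if timeframe not in SUPPORTED_TIMEFRAMES:
        raise ValueError(f"Unsupported timeframe: {timeframe!r}")
    # Every supported value has the shape <integer><unit>
    match = re.fullmatch(r"(\d+)([smhd])", timeframe)
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(seconds=amount * _UNIT_SECONDS[unit])
```

Rejecting anything outside the supported list, rather than accepting any `<integer><unit>` string, keeps odd bucket sizes such as `7s` from entering the aggregation logic.
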
## Processing Flow

### Real-Time Aggregation Process

1. **Trade arrives** from WebSocket with timestamp T
2. **For each configured timeframe**:
   - Calculate which time bucket this trade belongs to
   - Get current bucket for this timeframe
   - **Check if trade timestamp crosses time boundary**
   - **If boundary crossed**: complete and emit previous bucket (only if it has trades), create new bucket
   - Add trade to current bucket (updates OHLCV)
3. **Only emit completed candles** when time boundaries are definitively crossed
4. **Never emit incomplete/future candles** during real-time processing

### Bucket Management

**Time Bucket Creation:**
- Buckets are created **only when the first trade arrives** for that time period
- Empty time periods do not create buckets

**Bucket Completion:**
- Buckets are completed **only when a trade arrives that belongs to a different time bucket**
- Completed buckets are emitted **only if they contain at least one trade**
- Empty buckets are discarded silently

**Example Timeline:**
```
Time      Trade     1s Bucket Action                           5s Bucket Action
--------  --------  -----------------------------------------  -----------------------------------------
09:15:23  BUY 0.1   Create bucket 09:15:23                     Create bucket 09:15:20
09:15:24  SELL 0.2  Complete 09:15:23 → emit; create 09:15:24  Add to 09:15:20
09:15:25  -         (no trade = no action)                     (no action)
09:15:26  BUY 0.5   Complete 09:15:24 → emit; create 09:15:26  Complete 09:15:20 → emit; create 09:15:25
```

Buckets in this timeline are labeled by their start time. The emitted candle carries the bucket's end time as its timestamp, so bucket 09:15:23 is emitted with timestamp 09:15:24 and the 5s bucket 09:15:20 with timestamp 09:15:25.

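The processing flow and bucket rules above can be condensed into a short sketch. This is not the platform's `RealTimeCandleProcessor`: the `Candle` and `SparseAggregator` names are hypothetical, timestamps are assumed to be epoch milliseconds, buckets are assumed half-open, and trades that arrive late for an already-closed bucket are simply dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candle:
    interval_ms: int
    close_time_ms: int  # right-aligned timestamp (end of the bucket)
    open: float
    high: float
    low: float
    close: float
    volume: float = 0.0
    trade_count: int = 0


class SparseAggregator:
    """Trade-driven aggregator: a bucket exists only once a trade falls into it."""

    def __init__(self, intervals_ms: List[int]) -> None:
        self._buckets: Dict[int, Optional[Candle]] = {i: None for i in intervals_ms}

    def on_trade(self, ts_ms: int, price: float, size: float) -> List[Candle]:
        """Apply one trade and return the candles it completed (possibly none)."""
        completed: List[Candle] = []
        for interval in list(self._buckets):
            bucket = self._buckets[interval]
            # The bucket is derived from the trade timestamp, never the system clock
            close_time = (ts_ms // interval + 1) * interval
            if bucket is not None and close_time < bucket.close_time_ms:
                continue  # late trade for an earlier bucket: dropped in this sketch
            if bucket is not None and close_time > bucket.close_time_ms:
                completed.append(bucket)  # boundary crossed: previous bucket is final
                bucket = None
            if bucket is None:
                # First trade of the period creates the bucket
                bucket = Candle(interval, close_time, price, price, price, price)
                self._buckets[interval] = bucket
            bucket.high = max(bucket.high, price)
            bucket.low = min(bucket.low, price)
            bucket.close = price
            bucket.volume += size
            bucket.trade_count += 1
        return completed
```

Feeding the three trades from the example timeline into `SparseAggregator([1000, 5000])` returns no candles for the first trade, the 1s candle for the second, and both a 1s and a 5s candle for the third. No candle is ever produced for 09:15:25, where no trade occurred.
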
## Handling Sparse Data in Applications

### For Trading Algorithms

```python
from typing import List

# OHLCVCandle and get_timeframe_delta come from the platform's own modules


def handle_sparse_candles(
    candles: List[OHLCVCandle], timeframe: str, fill_gaps: bool = False
) -> List[OHLCVCandle]:
    """
    Handle sparse candle data in trading algorithms.
    """
    # Option 1: Use only available data (recommended)
    # Just work with what you have - gaps indicate no trading activity
    if not candles or not fill_gaps:
        return candles

    # Option 2: Fill gaps with last known price (if needed)
    filled_candles = []
    last_candle = None

    for candle in candles:
        if last_candle:
            # Without a gap, this candle ends exactly one interval after the last one
            expected_end = last_candle.end_time + get_timeframe_delta(timeframe)
            if candle.end_time > expected_end:
                # Gap detected - could fill if needed for your strategy
                pass

        filled_candles.append(candle)
        last_candle = candle

    return filled_candles
```

### For Charting and Visualization

```python
from typing import List

# OHLCVCandle, calculate_gap_periods and create_flat_candle come from the platform's own modules


def prepare_chart_data(
    candles: List[OHLCVCandle], timeframe: str, fill_gaps: bool = True
) -> List[OHLCVCandle]:
    """
    Prepare sparse candle data for charting applications.
    """
    if not fill_gaps or not candles:
        return candles

    # Fill gaps with previous close price for continuous charts
    filled_candles = []

    for i, candle in enumerate(candles):
        if i > 0:
            # Last appended item is the previous real candle
            prev_candle = filled_candles[-1]
            gap_periods = calculate_gap_periods(prev_candle.end_time, candle.start_time, timeframe)

            # Fill gap periods with flat candles
            for gap_time in gap_periods:
                flat_candle = create_flat_candle(
                    start_time=gap_time,
                    price=prev_candle.close,
                    timeframe=timeframe
                )
                filled_candles.append(flat_candle)

        filled_candles.append(candle)

    return filled_candles
```

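The gap-filling helpers used above are not shown in this document. One possible shape for them is sketched below under stated assumptions: the `OHLCVCandle` fields are guessed from how the examples use them, and both helpers take a `timedelta` instead of the timeframe string to keep the sketch self-contained.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class OHLCVCandle:
    # Hypothetical stand-in for the platform's candle type
    start_time: datetime
    end_time: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    trade_count: int


def calculate_gap_periods(prev_end: datetime, next_start: datetime, delta: timedelta) -> List[datetime]:
    """Return the start times of the empty buckets between two candles."""
    starts = []
    current = prev_end
    while current < next_start:
        starts.append(current)
        current += delta
    return starts


def create_flat_candle(start_time: datetime, price: float, delta: timedelta) -> OHLCVCandle:
    """Build a zero-volume candle whose OHLC values all equal the last known price."""
    return OHLCVCandle(
        start_time=start_time,
        end_time=start_time + delta,
        open=price, high=price, low=price, close=price,
        volume=0.0,
        trade_count=0,  # marks the candle as filled, not traded
    )
```

Setting `trade_count` to 0 on filled candles lets downstream code tell them apart from candles built from real trades.
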
### Database Queries

When querying candle data, be aware of potential gaps:

```sql
-- Query that handles sparse data appropriately
SELECT
    timestamp,
    open, high, low, close, volume,
    trade_count,
    -- Flag periods with actual trading activity
    CASE WHEN trade_count > 0 THEN 'ACTIVE' ELSE 'EMPTY' END AS period_type
FROM market_data
WHERE symbol = 'BTC-USDT'
  AND timeframe = '1s'
  -- Timestamps are candle close times, so use a half-open range:
  -- the candle stamped 09:00:00 covers the second before the window
  AND timestamp > '2024-01-01 09:00:00'
  AND timestamp <= '2024-01-01 09:05:00'
ORDER BY timestamp;

-- Query to detect gaps in data
WITH candle_gaps AS (
    SELECT
        timestamp,
        LAG(timestamp) OVER (ORDER BY timestamp) AS prev_timestamp,
        timestamp - LAG(timestamp) OVER (ORDER BY timestamp) AS gap_duration
    FROM market_data
    WHERE symbol = 'BTC-USDT' AND timeframe = '1s'
)
SELECT * FROM candle_gaps
WHERE gap_duration > INTERVAL '1 second'
ORDER BY timestamp;
```

## Performance Characteristics

### Storage Efficiency
- **Sparse approach reduces storage** by 50-80% compared to complete time series; the saving depends on timeframe and pair liquidity and shrinks as coverage approaches 100% (see Coverage Statistics)
- **Only meaningful periods** are stored in the database
- **Faster queries** due to smaller dataset size

### Processing Efficiency
- **Lower memory usage** during real-time processing
- **Faster aggregation** - no need to maintain empty buckets
- **Efficient WebSocket processing** - only processes actual market events

### Coverage Statistics
Based on real-world testing:

| Timeframe | Major Pairs Coverage | Minor Pairs Coverage |
|-----------|----------------------|----------------------|
| 1s        | 20-40%               | 5-15%                |
| 5s        | 60-80%               | 30-50%               |
| 10s       | 75-90%               | 50-70%               |
| 15s       | 80-95%               | 60-80%               |
| 30s       | 90-98%               | 80-95%               |
| 1m        | 95-99%               | 90-98%               |

*Coverage = Percentage of time periods that actually have candles*

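Coverage as defined above is the number of candles actually emitted divided by the number of periods in the observation window. A minimal sketch follows; the function name `coverage_ratio` is an assumption, not part of the platform.

```python
def coverage_ratio(candle_count: int, window_seconds: int, timeframe_seconds: int) -> float:
    """Fraction of time periods in the window that actually produced a candle."""
    expected = window_seconds // timeframe_seconds
    if expected <= 0:
        raise ValueError("window must span at least one full period")
    return candle_count / expected
```

Applied to the live tests reported earlier, `coverage_ratio(53, 180, 1)` is about 0.29 for BTC-USDT and `coverage_ratio(22, 60, 1)` is about 0.37 for ETH-USDT.
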
## Best Practices

### For Real-Time Systems
1. **Design algorithms to handle gaps** - missing candles are normal
2. **Use last known price** for periods without trades
3. **Don't interpolate** unless specifically required
4. **Monitor coverage ratios** to detect market conditions

### For Historical Analysis
1. **Be aware of sparse data** when calculating statistics
2. **Consider volume-weighted metrics** over time-weighted ones
3. **Use trade_count=0** to identify empty periods when filling gaps
4. **Validate data completeness** before running backtests

### For Database Storage
1. **Index on (symbol, timeframe, timestamp)** for efficient queries
2. **Partition by time periods** for large datasets
3. **Consider trade_count > 0** filters for active-only queries
4. **Monitor storage growth** - sparse data grows much slower

## Configuration

The sparse aggregation behavior is controlled by:

```json
{
  "timeframes": ["1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "1h"],
  "auto_save_candles": true,
  "emit_incomplete_candles": false,
  "max_trades_per_candle": 100000
}
```

**Key Setting**: `emit_incomplete_candles: false` ensures only complete, trade-containing candles are emitted. Never set it to `true` for real-time processing.

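A loader can enforce these settings before the aggregator starts. The sketch below is an assumption about how such a check might look: `load_aggregation_config` is a hypothetical function, and rejecting `emit_incomplete_candles: true` outright is one possible policy, not documented platform behavior.

```python
import json

# Timeframes listed under "Supported Timeframes"
SUPPORTED = {"1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "30m", "1h", "4h", "1d"}


def load_aggregation_config(text: str) -> dict:
    """Parse the JSON config and reject settings that contradict the sparse strategy."""
    config = json.loads(text)
    unknown = [tf for tf in config.get("timeframes", []) if tf not in SUPPORTED]
    if unknown:
        raise ValueError(f"Unsupported timeframes: {unknown}")
    if config.get("emit_incomplete_candles", False):
        raise ValueError("emit_incomplete_candles must be false")
    return config
```

Failing fast at load time keeps a misconfigured timeframe or an accidental `true` from silently changing what downstream consumers receive.
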
---

**Note**: This sparse approach is the **industry standard** used by major exchanges and trading platforms. It provides the most accurate representation of actual market activity while maintaining efficiency and preventing data artifacts.