Data Aggregation Strategy
Overview
This document describes the comprehensive data aggregation strategy used in the TCP Trading Platform for converting real-time trade data into OHLCV (Open, High, Low, Close, Volume) candles across multiple timeframes, including sub-minute precision.
Core Principles
1. Right-Aligned Timestamps (Close-Time Convention)
The system labels every candle with a RIGHT-ALIGNED timestamp:
- Candle timestamp = end time of the interval (close time)
- This represents when the candle period closes, not when it opens
- Exchange candlestick APIs differ on this point: Binance, OKX, and Coinbase key their candles by open time, so add one interval to those timestamps before comparing them with candles from this system
- Applying that conversion keeps real-time candles consistent with data fetched from historical APIs
Examples:
- 1-second candle covering 09:00:15.000-09:00:16.000 → timestamp = 09:00:16.000
- 5-second candle covering 09:00:15.000-09:00:20.000 → timestamp = 09:00:20.000
- 30-second candle covering 09:00:00.000-09:00:30.000 → timestamp = 09:00:30.000
- 1-minute candle covering 09:00:00-09:01:00 → timestamp = 09:01:00
- 5-minute candle covering 09:00:00-09:05:00 → timestamp = 09:05:00
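The examples above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the platform's code: the function name `candle_close_ms` is hypothetical, timestamps are taken as integer milliseconds, and it assumes half-open intervals, so a trade landing exactly on a boundary belongs to the bucket that starts there.

```python
def candle_close_ms(trade_ts_ms: int, interval_ms: int) -> int:
    """Return the right-aligned (close-time) timestamp of the bucket holding a trade.

    Assumption: buckets are half-open [start, start + interval), so a trade
    exactly on a boundary falls into the bucket that starts at that boundary.
    """
    bucket_start = trade_ts_ms - (trade_ts_ms % interval_ms)
    return bucket_start + interval_ms


# 09:00:15.400 expressed as milliseconds since midnight
trade_ms = (9 * 3600 + 15) * 1000 + 400

one_second = candle_close_ms(trade_ms, 1_000)    # 32_416_000 -> 09:00:16.000
five_second = candle_close_ms(trade_ms, 5_000)   # 32_420_000 -> 09:00:20.000
```

The same arithmetic works on Unix epoch milliseconds for every timeframe up to 1d, because the epoch itself starts at midnight UTC.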
2. Sparse Candles (Trade-Driven Aggregation)
CRITICAL: The system uses a SPARSE CANDLE APPROACH - candles are only emitted when trades actually occur during the time period.
What This Means:
- No trades during period = No candle emitted
- Time gaps in data are normal and expected
- Storage efficient - only meaningful periods are stored
- Industry standard behavior matching major exchanges
Examples of Sparse Behavior:
1-Second Timeframe:
09:00:15 → Trade occurs → 1s candle timestamped 09:00:16
09:00:16 → No trades → NO candle emitted
09:00:17 → No trades → NO candle emitted
09:00:18 → Trade occurs → 1s candle timestamped 09:00:19
5-Second Timeframe:
09:00:15-20 → Trades occur → 5s candle timestamped 09:00:20
09:00:20-25 → No trades → NO candle emitted
09:00:25-30 → Trade occurs → 5s candle timestamped 09:00:30
Real-World Coverage Examples:
From live testing with BTC-USDT (3-minute test):
- Expected 1s candles: 180
- Actual 1s candles: 53 (29% coverage)
- Missing periods: 127 seconds with no trading activity
From live testing with ETH-USDT (1-minute test):
- Expected 1s candles: 60
- Actual 1s candles: 22 (37% coverage)
- Missing periods: 38 seconds with no trading activity
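The coverage percentages above are simply emitted candles divided by the number of periods in the test window. A small sketch of that calculation follows; the function name `coverage_ratio` is an illustrative assumption, not part of the platform.

```python
def coverage_ratio(actual_candles: int, window_seconds: int, timeframe_seconds: int) -> float:
    """Fraction of periods in the window that produced a candle."""
    expected = window_seconds // timeframe_seconds
    if expected <= 0:
        raise ValueError("window must span at least one full period")
    return actual_candles / expected


btc = coverage_ratio(53, 180, 1)   # BTC-USDT, 3-minute test
eth = coverage_ratio(22, 60, 1)    # ETH-USDT, 1-minute test
print(f"BTC-USDT: {btc:.0%}, ETH-USDT: {eth:.0%}")  # BTC-USDT: 29%, ETH-USDT: 37%
```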
3. Future Leakage Prevention
The aggregation system prevents future leakage by:
- Only completing candles when time boundaries are definitively crossed
- Never emitting incomplete candles during real-time processing
- Waiting for actual trades to trigger bucket completion
- Using trade timestamps, not system clock times, for bucket assignment
Supported Timeframes
The system supports the following timeframes with precise bucket calculations:
Second-Based Timeframes:
- 1s: 1-second buckets (00:00, 00:01, 00:02, ...)
- 5s: 5-second buckets (00:00, 00:05, 00:10, 00:15, ...)
- 10s: 10-second buckets (00:00, 00:10, 00:20, 00:30, ...)
- 15s: 15-second buckets (00:00, 00:15, 00:30, 00:45, ...)
- 30s: 30-second buckets (00:00, 00:30, ...)
Minute-Based Timeframes:
- 1m: 1-minute buckets aligned to minute boundaries
- 5m: 5-minute buckets (00:00, 00:05, 00:10, ...)
- 15m: 15-minute buckets (00:00, 00:15, 00:30, 00:45)
- 30m: 30-minute buckets (00:00, 00:30)
Hour- and Day-Based Timeframes:
- 1h: 1-hour buckets aligned to hour boundaries
- 4h: 4-hour buckets (00:00, 04:00, 08:00, 12:00, 16:00, 20:00)
- 1d: 1-day buckets aligned to midnight UTC
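One way to turn these timeframe strings into bucket boundaries is sketched below. The names `timeframe_to_seconds` and `bucket_start` are hypothetical and are not the platform's own parsing functions; the supported set is copied from the list above, and timestamps are assumed to be whole seconds in UTC.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Timeframes listed in this document
SUPPORTED_TIMEFRAMES = {
    "1s", "5s", "10s", "15s", "30s",
    "1m", "5m", "15m", "30m",
    "1h", "4h", "1d",
}


def timeframe_to_seconds(timeframe: str) -> int:
    """Convert a timeframe such as '15s' or '4h' to its length in seconds."""
    if timeframe not in SUPPORTED_TIMEFRAMES:
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    return int(timeframe[:-1]) * UNIT_SECONDS[timeframe[-1]]


def bucket_start(ts_seconds: int, timeframe: str) -> int:
    """Floor a UTC timestamp (in seconds) to the start of its bucket."""
    size = timeframe_to_seconds(timeframe)
    return ts_seconds - ts_seconds % size


# 09:15:23 as seconds since midnight
ts = 9 * 3600 + 15 * 60 + 23
five_s_start = bucket_start(ts, "5s")     # 33320 -> 09:15:20
fifteen_m_start = bucket_start(ts, "15m") # 33300 -> 09:15:00
```

Flooring with a modulo keeps every bucket aligned to midnight UTC, which matches the boundaries listed for the 4h and 1d timeframes.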
Processing Flow
Real-Time Aggregation Process
- Trade arrives from WebSocket with timestamp T
- For each configured timeframe:
  - Calculate which time bucket this trade belongs to
  - Get current bucket for this timeframe
  - Check if trade timestamp crosses time boundary
  - If boundary crossed: complete and emit previous bucket (only if it has trades), create new bucket
  - Add trade to current bucket (updates OHLCV)
- Only emit completed candles when time boundaries are definitively crossed
- Never emit incomplete/future candles during real-time processing
Bucket Management
Time Bucket Creation:
- Buckets are created only when the first trade arrives for that time period
- Empty time periods do not create buckets
Bucket Completion:
- Buckets are completed only when a trade arrives that belongs to a different time bucket
- Completed buckets are emitted only if they contain at least one trade
- Empty buckets are discarded silently
Example Timeline:
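| Time | Trade | 1s Bucket Action | 5s Bucket Action |
|---|---|---|---|
| 09:15:23 | BUY 0.1 | Create bucket 09:15:23 | Create bucket 09:15:20 |
| 09:15:24 | SELL 0.2 | Complete 09:15:23 → emit, create bucket 09:15:24 | Add to 09:15:20 |
| 09:15:25 | - | (no trade = no action) | (no action) |
| 09:15:26 | BUY 0.5 | Complete 09:15:24 → emit, create bucket 09:15:26 | Complete 09:15:20 → emit, create bucket 09:15:25 |

Buckets are named by their start time here; the emitted candle carries the right-aligned close timestamp.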
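The bucket rules and the timeline above can be sketched as a small trade-driven aggregator. This is an illustration under stated assumptions, not the platform's `RealTimeCandleProcessor`: the `Candle` and `SparseAggregator` classes are hypothetical, timestamps are whole seconds, trades are assumed to arrive in timestamp order, and the prices are made up because the timeline only lists sizes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candle:
    timeframe_s: int
    start: int          # bucket start time, in seconds
    open: float
    high: float
    low: float
    close: float
    volume: float = 0.0
    trade_count: int = 0

    @property
    def timestamp(self) -> int:
        """Right-aligned timestamp: the close time of the bucket."""
        return self.start + self.timeframe_s

    def add(self, price: float, size: float) -> None:
        self.high = max(self.high, price)
        self.low = min(self.low, price)
        self.close = price
        self.volume += size
        self.trade_count += 1


class SparseAggregator:
    def __init__(self, timeframes_s: List[int]) -> None:
        self._timeframes = list(timeframes_s)
        self._current: Dict[int, Candle] = {}  # one open bucket per timeframe

    def on_trade(self, ts: int, price: float, size: float) -> List[Candle]:
        """Add a trade and return the candles it completed (possibly none)."""
        completed: List[Candle] = []
        for tf in self._timeframes:
            start = ts - ts % tf
            current = self._current.get(tf)
            if current is not None and start > current.start:
                # Boundary crossed: the previous bucket always holds a trade,
                # because buckets are only created when a trade arrives.
                completed.append(current)
                current = None
            if current is None:
                current = Candle(tf, start, price, price, price, price)
                self._current[tf] = current
            current.add(price, size)
        return completed


agg = SparseAggregator([1, 5])
t = 9 * 3600 + 15 * 60  # 09:15:00 as seconds since midnight
trades = [(t + 23, 100.0, 0.1), (t + 24, 101.0, 0.2), (t + 26, 99.0, 0.5)]
for ts, price, size in trades:
    for candle in agg.on_trade(ts, price, size):
        # prints "1 23 1", then "1 24 1" and "5 20 2"
        print(candle.timeframe_s, candle.start - t, candle.trade_count)
```

Because a bucket only exists once a trade has arrived, empty periods never need to be tracked or discarded explicitly; the second with no trade (09:15:25) simply leaves no trace.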
Handling Sparse Data in Applications
For Trading Algorithms
    from typing import List

    def handle_sparse_candles(
        candles: List[OHLCVCandle], timeframe: str, check_gaps: bool = False
    ) -> List[OHLCVCandle]:
        """
        Handle sparse candle data in trading algorithms.
        """
        if not candles or not check_gaps:
            # Option 1: Use only available data (recommended)
            # Just work with what you have - gaps indicate no trading activity
            return candles

        # Option 2: Fill gaps with last known price (if needed)
        filled_candles = []
        last_candle = None
        for candle in candles:
            # Contiguous candles satisfy candle.start_time == last_candle.end_time
            # (assumes end_time is the exclusive close time of the interval)
            if last_candle and candle.start_time > last_candle.end_time:
                # Gap detected - could fill if needed for your strategy,
                # using `timeframe` to size the filler candles
                pass
            filled_candles.append(candle)
            last_candle = candle
        return filled_candles
For Charting and Visualization
    from typing import List

    def prepare_chart_data(
        candles: List[OHLCVCandle], timeframe: str, fill_gaps: bool = True
    ) -> List[OHLCVCandle]:
        """
        Prepare sparse candle data for charting applications.
        """
        if not fill_gaps or not candles:
            return candles

        # Fill gaps with previous close price for continuous charts
        filled_candles = []
        for i, candle in enumerate(candles):
            if i > 0:
                # The last appended item is always the previous real candle
                prev_candle = filled_candles[-1]
                gap_periods = calculate_gap_periods(prev_candle.end_time, candle.start_time, timeframe)
                # Fill gap periods with flat candles
                for gap_time in gap_periods:
                    flat_candle = create_flat_candle(
                        start_time=gap_time,
                        price=prev_candle.close,
                        timeframe=timeframe
                    )
                    filled_candles.append(flat_candle)
            filled_candles.append(candle)
        return filled_candles
Database Queries
When querying candle data, be aware of potential gaps:
    -- Query that handles sparse data appropriately
    SELECT
        timestamp,
        open, high, low, close, volume,
        trade_count,
        -- Flag periods with actual trading activity
        CASE WHEN trade_count > 0 THEN 'ACTIVE' ELSE 'EMPTY' END AS period_type
    FROM market_data
    WHERE symbol = 'BTC-USDT'
      AND timeframe = '1s'
      -- Timestamps are right-aligned, so the candle stamped 09:00:00 covers
      -- the second before the window; exclude it with a half-open range
      AND timestamp > '2024-01-01 09:00:00'
      AND timestamp <= '2024-01-01 09:05:00'
    ORDER BY timestamp;
    -- Query to detect gaps in data
    WITH candle_gaps AS (
        SELECT
            timestamp,
            LAG(timestamp) OVER (ORDER BY timestamp) AS prev_timestamp,
            timestamp - LAG(timestamp) OVER (ORDER BY timestamp) AS gap_duration
        FROM market_data
        WHERE symbol = 'BTC-USDT' AND timeframe = '1s'
        ORDER BY timestamp
    )
    SELECT * FROM candle_gaps
    WHERE gap_duration > INTERVAL '1 second';
Performance Characteristics
Storage Efficiency
- Sparse approach reduces storage by 50-80% compared to complete time series
- Only meaningful periods are stored in the database
- Faster queries due to smaller dataset size
Processing Efficiency
- Lower memory usage during real-time processing
- Faster aggregation - no need to maintain empty buckets
- Efficient WebSocket processing - only processes actual market events
Coverage Statistics
Based on real-world testing:
| Timeframe | Major Pairs Coverage | Minor Pairs Coverage |
|---|---|---|
| 1s | 20-40% | 5-15% |
| 5s | 60-80% | 30-50% |
| 10s | 75-90% | 50-70% |
| 15s | 80-95% | 60-80% |
| 30s | 90-98% | 80-95% |
| 1m | 95-99% | 90-98% |
Coverage = Percentage of time periods that actually have candles
Best Practices
For Real-Time Systems
- Design algorithms to handle gaps - missing candles are normal
- Use last known price for periods without trades
- Don't interpolate unless specifically required
- Monitor coverage ratios to detect market conditions
For Historical Analysis
- Be aware of sparse data when calculating statistics
- Consider volume-weighted metrics over time-weighted ones
- Use trade_count=0 to identify empty periods when filling gaps
- Validate data completeness before running backtests
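To illustrate why volume weighting suits sparse data, the sketch below compares a plain average of closes with a volume-weighted one. It is a simplified example: the function name `volume_weighted_close` is hypothetical, candles are reduced to `(close, volume)` pairs, and weighting closes by volume only approximates a true trade-level VWAP.

```python
from typing import Sequence, Tuple


def volume_weighted_close(candles: Sequence[Tuple[float, float]]) -> float:
    """Average of candle closes weighted by volume; candles are (close, volume) pairs."""
    total_volume = sum(volume for _, volume in candles)
    if total_volume == 0:
        raise ValueError("no traded volume in the selected candles")
    return sum(close * volume for close, volume in candles) / total_volume


candles = [(100.0, 1.0), (102.0, 3.0)]
simple_mean = sum(close for close, _ in candles) / len(candles)  # 101.0
weighted = volume_weighted_close(candles)                        # 101.5
```

Missing periods contribute nothing to either sum, so the weighted figure is unaffected by how many gaps fall between the candles.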
For Database Storage
- Index on (symbol, timeframe, timestamp) for efficient queries
- Partition by time periods for large datasets
- Consider trade_count > 0 filters for active-only queries
- Monitor storage growth - sparse data grows much slower
Configuration
The sparse aggregation behavior is controlled by:
    {
      "timeframes": ["1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "1h"],
      "auto_save_candles": true,
      "emit_incomplete_candles": false,
      "max_trades_per_candle": 100000
    }
Key Setting: emit_incomplete_candles: false ensures only complete, trade-containing candles are emitted.
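A configuration like this can be checked at startup before any candles are built. The sketch below is one possible guard, not the platform's loader: `validate_aggregation_config` is a hypothetical helper, the supported set is copied from the Supported Timeframes section, and rejecting `emit_incomplete_candles: true` is an assumption based on the rule that incomplete candles are never emitted.

```python
import json

# Timeframes listed in the Supported Timeframes section
SUPPORTED_TIMEFRAMES = {
    "1s", "5s", "10s", "15s", "30s",
    "1m", "5m", "15m", "30m",
    "1h", "4h", "1d",
}


def validate_aggregation_config(config: dict) -> dict:
    """Raise ValueError if the config conflicts with sparse aggregation."""
    unknown = [tf for tf in config.get("timeframes", []) if tf not in SUPPORTED_TIMEFRAMES]
    if unknown:
        raise ValueError(f"unsupported timeframes: {unknown}")
    if config.get("emit_incomplete_candles", False):
        raise ValueError("emit_incomplete_candles must be false for sparse aggregation")
    return config


raw = """
{
  "timeframes": ["1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "1h"],
  "auto_save_candles": true,
  "emit_incomplete_candles": false,
  "max_trades_per_candle": 100000
}
"""
config = validate_aggregation_config(json.loads(raw))
```

Note that standard JSON does not allow `//` comments, so `json.loads` would reject a config file that contains them.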
Note: This sparse approach is the industry standard used by major exchanges and trading platforms. It provides the most accurate representation of actual market activity while maintaining efficiency and preventing data artifacts.