Vasily.onl 02a51521a0 Update OKX configuration and aggregation logic for enhanced multi-timeframe support
- Increased health check interval from 30s to 120s in `okx_config.json`.
- Added support for additional timeframes (1s, 5s, 10s, 15s, 30s) in the aggregation logic across multiple components.
- Updated `CandleProcessingConfig` and `RealTimeCandleProcessor` to handle new timeframes.
- Enhanced validation and parsing functions to include new second-based timeframes.
- Updated database schema to support new timeframes in `schema_clean.sql`.
- Improved documentation to reflect changes in multi-timeframe aggregation capabilities.
2025-06-02 12:35:19 +08:00

Data Aggregation Strategy

Overview

This document describes the comprehensive data aggregation strategy used in the TCP Trading Platform for converting real-time trade data into OHLCV (Open, High, Low, Close, Volume) candles across multiple timeframes, including sub-minute precision.

Core Principles

1. Right-Aligned Timestamps

The system follows the RIGHT-ALIGNED timestamp convention:

  • Candle timestamp = end time of the interval (close time)
  • This represents when the candle period closes, not when it opens
  • Exchange APIs are not uniform: OKX and Coinbase label candles with their open time, while Binance klines carry both an open time and a close time
  • Historical candles fetched from an open-time API must therefore be shifted forward by one interval to match this convention

Examples:

  • 1-second candle covering 09:00:15.000-09:00:16.000 → timestamp = 09:00:16.000
  • 5-second candle covering 09:00:15.000-09:00:20.000 → timestamp = 09:00:20.000
  • 30-second candle covering 09:00:00.000-09:00:30.000 → timestamp = 09:00:30.000
  • 1-minute candle covering 09:00:00-09:01:00 → timestamp = 09:01:00
  • 5-minute candle covering 09:00:00-09:05:00 → timestamp = 09:05:00
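A minimal sketch of the right-aligned convention in Python. It assumes timezone-aware UTC `datetime` values, buckets counted from the Unix epoch, and half-open intervals [start, end), so a trade at exactly 09:00:20.000 would fall into the next 5-second bucket. `bucket_bounds` and `candle_timestamp` are hypothetical helpers, not the platform's API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_bounds(ts: datetime, interval_s: int) -> tuple[datetime, datetime]:
    """Return the half-open interval [start, end) that contains ts (assumed convention)."""
    step = timedelta(seconds=interval_s)
    periods = (ts - EPOCH) // step  # whole intervals elapsed since the epoch
    start = EPOCH + periods * step
    return start, start + step


def candle_timestamp(ts: datetime, interval_s: int) -> datetime:
    """Right-aligned label: the close (end) time of the bucket containing ts."""
    return bucket_bounds(ts, interval_s)[1]


trade_time = datetime(2024, 1, 1, 9, 0, 17, 250_000, tzinfo=timezone.utc)
print(candle_timestamp(trade_time, 5))   # 2024-01-01 09:00:20+00:00
print(candle_timestamp(trade_time, 60))  # 2024-01-01 09:01:00+00:00
```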

2. Sparse Candles (Trade-Driven Aggregation)

CRITICAL: The system uses a SPARSE CANDLE APPROACH - candles are only emitted when trades actually occur during the time period.

What This Means:

  • No trades during period = No candle emitted
  • Time gaps in data are normal and expected
  • Storage efficient - only meaningful periods are stored
  • Industry standard behavior matching major exchanges

Examples of Sparse Behavior:

1-Second Timeframe:

09:00:15 → Trade occurs → 1s candle with timestamp 09:00:16
09:00:16 → No trades → NO candle
09:00:17 → No trades → NO candle
09:00:18 → Trade occurs → 1s candle with timestamp 09:00:19

5-Second Timeframe:

09:00:15-20 → Trades occur → 5s candle with timestamp 09:00:20
09:00:20-25 → No trades → NO candle
09:00:25-30 → Trade occurs → 5s candle with timestamp 09:00:30

Note: the times shown are candle timestamps (close times). Each candle is actually emitted later, when the first trade of a subsequent period arrives (see Bucket Completion below).
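The sparse behavior can be reproduced in a few lines. This batch sketch assumes trades arrive as `(timestamp_seconds, price, size)` tuples and floors each timestamp to its bucket start; `sparse_candles` is a hypothetical name and the sample prices and sizes are invented for illustration.

```python
def sparse_candles(trades, interval_s):
    """Aggregate (ts_seconds, price, size) trades into right-aligned OHLCV candles.

    Only buckets that receive at least one trade appear in the output, so
    periods without trades simply have no candle.
    """
    buckets = {}  # bucket start -> candle dict; insertion order follows time
    for ts, price, size in sorted(trades, key=lambda t: t[0]):
        start = int(ts // interval_s) * interval_s
        candle = buckets.get(start)
        if candle is None:
            # First trade in this period creates the candle
            buckets[start] = {"ts": start + interval_s, "open": price, "high": price,
                              "low": price, "close": price, "volume": size}
        else:
            candle["high"] = max(candle["high"], price)
            candle["low"] = min(candle["low"], price)
            candle["close"] = price
            candle["volume"] += size
    return list(buckets.values())


# Seconds after 09:00:00; prices and sizes are made up for illustration.
trades = [(15.2, 100.0, 0.1), (15.8, 101.0, 0.2), (18.4, 99.5, 0.3), (27.0, 100.5, 0.1)]
print([c["ts"] for c in sparse_candles(trades, 1)])  # [16, 19, 28]
print([c["ts"] for c in sparse_candles(trades, 5)])  # [20, 30]
```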

Real-World Coverage Examples:

From live testing with BTC-USDT (3-minute test):

  • Expected 1s candles: 180
  • Actual 1s candles: 53 (29% coverage)
  • Missing periods: 127 seconds with no trading activity

From live testing with ETH-USDT (1-minute test):

  • Expected 1s candles: 60
  • Actual 1s candles: 22 (37% coverage)
  • Missing periods: 38 seconds with no trading activity

3. Preventing Future Leakage

The aggregation system prevents future leakage by:

  • Only completing candles when time boundaries are definitively crossed
  • Never emitting incomplete candles during real-time processing
  • Waiting for actual trades to trigger bucket completion
  • Using trade timestamps, not system clock times, for bucket assignment
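The same rule matters when stored candles are replayed, for example in a backtest: a right-aligned candle stamped T describes trades up to T, so logic running at time t may only use candles with T <= t. A minimal sketch, assuming candle timestamps are sorted epoch seconds; `visible_candles` is a hypothetical helper.

```python
from bisect import bisect_right


def visible_candles(candle_ts: list[int], as_of: float) -> list[int]:
    """Return the right-aligned candle timestamps that are already closed at `as_of`.

    A candle stamped T closes at T, so it is usable only when T <= as_of.
    `candle_ts` must be sorted in ascending order.
    """
    return candle_ts[:bisect_right(candle_ts, as_of)]


stamps = [16, 19, 28]                 # 1s candles from a sparse stream
print(visible_candles(stamps, 18.9))  # [16]
print(visible_candles(stamps, 19))    # [16, 19]
```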

Supported Timeframes

The system supports the following timeframes. The boundaries listed are bucket start times; each candle is timestamped with the end of its bucket.

Second-Based Timeframes (boundaries shown as mm:ss):

  • 1s: 1-second buckets (00:00, 00:01, 00:02, ...)
  • 5s: 5-second buckets (00:00, 00:05, 00:10, 00:15, ...)
  • 10s: 10-second buckets (00:00, 00:10, 00:20, 00:30, ...)
  • 15s: 15-second buckets (00:00, 00:15, 00:30, 00:45, ...)
  • 30s: 30-second buckets (00:00, 00:30, ...)

Minute-Based Timeframes (boundaries shown as hh:mm):

  • 1m: 1-minute buckets aligned to minute boundaries
  • 5m: 5-minute buckets (00:00, 00:05, 00:10, ...)
  • 15m: 15-minute buckets (00:00, 00:15, 00:30, 00:45, 01:00, ...)
  • 30m: 30-minute buckets (00:00, 00:30, 01:00, ...)

Hour- and Day-Based Timeframes:

  • 1h: 1-hour buckets aligned to hour boundaries
  • 4h: 4-hour buckets (00:00, 04:00, 08:00, 12:00, 16:00, 20:00)
  • 1d: 1-day buckets aligned to midnight UTC
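Every listed timeframe divides 24 hours evenly, so flooring Unix epoch seconds by the interval length yields buckets aligned to UTC midnight for all of them. A sketch of that calculation, assuming timeframe strings of the form `<number><s|m|h|d>` as used in this document; `timeframe_seconds` and `bucket_start` are illustrative names, not the platform's parsing functions.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
SUPPORTED = ["1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "30m", "1h", "4h", "1d"]


def timeframe_seconds(tf: str) -> int:
    """Convert a timeframe string such as '5s', '15m', '4h' or '1d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", tf)
    if match is None:
        raise ValueError(f"unsupported timeframe: {tf!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def bucket_start(epoch_s: int, tf: str) -> int:
    """Start of the bucket containing epoch_s (UTC epoch seconds)."""
    step = timeframe_seconds(tf)
    return epoch_s - epoch_s % step


# Each supported timeframe divides a day, so buckets never straddle midnight UTC.
assert all(86400 % timeframe_seconds(tf) == 0 for tf in SUPPORTED)

midnight = 19723 * 86400                # 2024-01-01 00:00:00 UTC
t = midnight + 9 * 3600 + 17 * 60 + 42  # 09:17:42 UTC
print((bucket_start(t, "15m") - midnight) // 60)   # 555 (09:15)
print((bucket_start(t, "4h") - midnight) // 3600)  # 8 (08:00)
```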

Processing Flow

Real-Time Aggregation Process

  1. Trade arrives from WebSocket with timestamp T
  2. For each configured timeframe:
    • Calculate which time bucket this trade belongs to
    • Get current bucket for this timeframe
    • Check if trade timestamp crosses time boundary
    • If boundary crossed: complete and emit previous bucket (only if it has trades), create new bucket
    • Add trade to current bucket (updates OHLCV)
  3. Only emit completed candles when time boundaries are definitively crossed
  4. Never emit incomplete/future candles during real-time processing
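The per-trade loop above can be sketched as a small multi-timeframe aggregator. This is a simplified illustration, not the platform's `RealTimeCandleProcessor`: it assumes trades arrive in timestamp order as `(timestamp_seconds, price, size)`, timeframes are given in seconds, and the `Candle` and `SparseAggregator` names are hypothetical. Because a bucket is only ever created by a trade, every completed bucket already holds at least one trade, which is what keeps the output sparse; the open bucket is never emitted.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    timeframe: str
    start: int          # bucket start, epoch seconds
    end: int            # right-aligned candle timestamp
    open: float
    high: float
    low: float
    close: float
    volume: float
    trade_count: int


class SparseAggregator:
    """Simplified multi-timeframe aggregator; trades must arrive in time order."""

    def __init__(self, timeframes: dict[str, int]):
        self.timeframes = timeframes               # e.g. {"1s": 1, "5s": 5}
        self.current: dict[str, Candle] = {}       # open bucket per timeframe

    def on_trade(self, ts: float, price: float, size: float) -> list[Candle]:
        """Add one trade and return any candles completed by it."""
        completed = []
        for name, step in self.timeframes.items():
            start = int(ts // step) * step         # bucket this trade belongs to
            bucket = self.current.get(name)
            if bucket is not None and start >= bucket.end:
                # Boundary crossed by a real trade: the previous bucket is final.
                completed.append(bucket)
                bucket = None
            if bucket is None:
                # First trade of the period creates the bucket.
                bucket = Candle(name, start, start + step, price, price, price, price, 0.0, 0)
                self.current[name] = bucket
            bucket.high = max(bucket.high, price)
            bucket.low = min(bucket.low, price)
            bucket.close = price
            bucket.volume += size
            bucket.trade_count += 1
        return completed


# Trades from the example timeline below (seconds past 09:15:00, invented prices).
agg = SparseAggregator({"1s": 1, "5s": 5})
for ts, price, size in [(23.0, 100.0, 0.1), (24.0, 100.2, 0.2), (26.0, 100.1, 0.5)]:
    for c in agg.on_trade(ts, price, size):
        print(ts, c.timeframe, c.start, c.end, c.trade_count)
```

The trade at 24 completes the 1s bucket starting at 23; the trade at 26 completes both the 1s bucket starting at 24 and the 5s bucket starting at 20, each stamped 25.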

Bucket Management

Time Bucket Creation:

  • Buckets are created only when the first trade arrives for that time period
  • Empty time periods do not create buckets

Bucket Completion:

  • Buckets are completed only when a trade arrives that belongs to a different time bucket
  • Completed buckets are emitted only if they contain at least one trade
  • Empty buckets are discarded silently

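Example Timeline (bucket labels are start times; each emitted candle carries its bucket's end time as its timestamp):

Time      Trade     1s Bucket Action                          5s Bucket Action
--------  --------  ----------------------------------------  ----------------------------------------
09:15:23  BUY 0.1   Create bucket 09:15:23                    Create bucket 09:15:20
09:15:24  SELL 0.2  Complete 09:15:23 → emit (ts 09:15:24),   Add to 09:15:20
                    create bucket 09:15:24
09:15:25  -         (no trade = no action)                    (no action)
09:15:26  BUY 0.5   Complete 09:15:24 → emit (ts 09:15:25),   Complete 09:15:20 → emit (ts 09:15:25),
                    create bucket 09:15:26                    create bucket 09:15:25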

Handling Sparse Data in Applications

For Trading Algorithms

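from typing import List

# OHLCVCandle (with start_time, end_time, open, high, low, close, volume) and
# get_timeframe_delta(timeframe) -> datetime.timedelta are assumed to come
# from the platform's own modules (import path not shown here).

def handle_sparse_candles(candles: List[OHLCVCandle], timeframe: str,
                          detect_gaps: bool = False) -> List[OHLCVCandle]:
    """
    Handle sparse candle data in trading algorithms.

    Option 1 (default, recommended): use only the available data; gaps
    simply indicate periods without trading activity.
    Option 2 (detect_gaps=True): walk the series and locate gaps so a
    strategy can fill them, e.g. with the last known price.
    """
    if not detect_gaps or not candles:
        return candles

    delta = get_timeframe_delta(timeframe)
    result = []
    last_candle = None

    for candle in candles:
        if last_candle is not None:
            # Contiguous candles satisfy candle.start_time == last_candle.end_time
            if candle.start_time > last_candle.end_time:
                missing_periods = (candle.start_time - last_candle.end_time) // delta
                # Gap of `missing_periods` periods detected - fill here if
                # your strategy needs it (e.g. with last_candle.close)

        result.append(candle)
        last_candle = candle

    return result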

For Charting and Visualization

from typing import List

# OHLCVCandle and get_timeframe_delta come from the platform (not shown).
# create_flat_candle is a hypothetical helper that returns a candle with
# open = high = low = close = price, zero volume and trade_count = 0.

def prepare_chart_data(candles: List[OHLCVCandle], timeframe: str,
                       fill_gaps: bool = True) -> List[OHLCVCandle]:
    """
    Prepare sparse candle data for charting applications.
    """
    if not fill_gaps or not candles:
        return candles

    delta = get_timeframe_delta(timeframe)
    # Fill gaps with previous close price for continuous charts
    filled_candles = []

    for i, candle in enumerate(candles):
        if i > 0:
            prev_candle = filled_candles[-1]
            gap_start = prev_candle.end_time

            # One flat candle per missing period, at the previous close
            while gap_start < candle.start_time:
                flat_candle = create_flat_candle(
                    start_time=gap_start,
                    price=prev_candle.close,
                    timeframe=timeframe
                )
                filled_candles.append(flat_candle)
                gap_start += delta

        filled_candles.append(candle)

    return filled_candles

Database Queries

When querying candle data, be aware of potential gaps:

-- Query that handles sparse data appropriately
SELECT 
    timestamp,
    open, high, low, close, volume,
    trade_count,
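    -- Flag periods with actual trading activity; in sparse storage every stored
    -- row has trade_count > 0, so 'EMPTY' appears only if gap-filled rows are saved
    CASE WHEN trade_count > 0 THEN 'ACTIVE' ELSE 'EMPTY' END as period_type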
FROM market_data 
WHERE symbol = 'BTC-USDT' 
  AND timeframe = '1s'
  AND timestamp BETWEEN '2024-01-01 09:00:00' AND '2024-01-01 09:05:00'
ORDER BY timestamp;

-- Query to detect gaps in data
WITH candle_gaps AS (
    SELECT 
        timestamp,
        LAG(timestamp) OVER (ORDER BY timestamp) as prev_timestamp,
        timestamp - LAG(timestamp) OVER (ORDER BY timestamp) as gap_duration
    FROM market_data
    WHERE symbol = 'BTC-USDT' AND timeframe = '1s'
    ORDER BY timestamp
)
SELECT * FROM candle_gaps 
WHERE gap_duration > INTERVAL '1 second';
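The same gap check can run in application code on timestamps already loaded from the database. A minimal sketch, assuming right-aligned candle timestamps in epoch seconds that are sorted and aligned to the timeframe; `find_gaps` is a hypothetical helper that mirrors the `LAG()` query above and also counts how many periods are missing.

```python
def find_gaps(timestamps: list[int], step: int) -> list[tuple[int, int, int]]:
    """Return (previous_ts, next_ts, missing_periods) for each gap in a sorted series.

    Consecutive candles are `step` seconds apart; a larger difference means
    (difference // step - 1) periods had no trades and therefore no candle.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > step:
            gaps.append((prev, cur, (cur - prev) // step - 1))
    return gaps


print(find_gaps([16, 19, 20, 28], step=1))  # [(16, 19, 2), (20, 28, 7)]
```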

Performance Characteristics

Storage Efficiency

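  • Row-count savings equal the share of empty periods (1 - coverage): roughly 60-80% for 1s candles on major pairs, but only 1-10% for 1m candles (see Coverage Statistics below)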
  • Only meaningful periods are stored in the database
  • Faster queries due to smaller dataset size

Processing Efficiency

  • Lower memory usage during real-time processing
  • Faster aggregation - no need to maintain empty buckets
  • Efficient WebSocket processing - only processes actual market events

Coverage Statistics

Based on real-world testing:

Timeframe  Major Pairs Coverage  Minor Pairs Coverage
---------  --------------------  --------------------
1s         20-40%                5-15%
5s         60-80%                30-50%
10s        75-90%                50-70%
15s        80-95%                60-80%
30s        90-98%                80-95%
1m         95-99%                90-98%

Coverage = Percentage of time periods that actually have candles
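Coverage can be computed directly from the candle timestamps in a window. A minimal sketch, assuming right-aligned timestamps in epoch seconds aligned to the timeframe and a window whose length is a whole number of periods; the function name is illustrative. With the BTC-USDT test figures above (53 one-second candles in 180 seconds) it gives about 29%.

```python
def coverage(candle_ts: list[int], window_start: int, window_end: int, step: int) -> float:
    """Fraction of the periods in [window_start, window_end) that produced a candle.

    The period [s, s + step) is covered when its right-aligned timestamp
    s + step appears in `candle_ts`.
    """
    expected = (window_end - window_start) // step
    if expected <= 0:
        raise ValueError("window must contain at least one full period")
    present = {t for t in candle_ts if window_start < t <= window_end}
    return len(present) / expected


# 53 one-second candles inside a 180-second window, as in the BTC-USDT test.
ratio = coverage(list(range(1, 54)), 0, 180, 1)
print(f"{ratio:.1%}")  # 29.4%
```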

Best Practices

For Real-Time Systems

  1. Design algorithms to handle gaps - missing candles are normal
  2. Use last known price for periods without trades
  3. Don't interpolate unless specifically required
  4. Monitor coverage ratios to detect changes in market activity (a falling ratio means fewer periods contain trades)

For Historical Analysis

  1. Be aware of sparse data when calculating statistics
  2. Consider volume-weighted metrics over time-weighted ones
  3. When filling gaps, give synthetic candles trade_count = 0 so they can be told apart from real ones (stored sparse candles always have trade_count > 0)
  4. Validate data completeness before running backtests

For Database Storage

  1. Index on (symbol, timeframe, timestamp) for efficient queries
  2. Partition by time periods for large datasets
  3. Consider trade_count > 0 filters for active-only queries
  4. Monitor storage growth; sparse data grows more slowly, in proportion to coverage

Configuration

The sparse aggregation behavior is controlled by:

{
  "timeframes": ["1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "1h"],
  "auto_save_candles": true,
  "emit_incomplete_candles": false,
  "max_trades_per_candle": 100000
}

Key Setting: emit_incomplete_candles: false ensures only complete, trade-containing candles are emitted.
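A sketch of loading these settings in Python. Strict JSON parsers such as Python's `json` module reject `//` comments, so the file must contain plain JSON. The validation rules (the timeframe pattern and refusing `emit_incomplete_candles: true`) are hypothetical checks chosen to match the behavior described here, not the platform's actual loader.

```python
import json
import re

RAW_CONFIG = """{
  "timeframes": ["1s", "5s", "10s", "15s", "30s", "1m", "5m", "15m", "1h"],
  "auto_save_candles": true,
  "emit_incomplete_candles": false,
  "max_trades_per_candle": 100000
}"""


def load_aggregation_config(text: str) -> dict:
    """Parse the aggregation settings and apply illustrative sanity checks."""
    cfg = json.loads(text)
    for tf in cfg["timeframes"]:
        if not re.fullmatch(r"\d+[smhd]", tf):
            raise ValueError(f"unsupported timeframe: {tf!r}")
    # Hypothetical guard: sparse aggregation relies on emitting only completed candles.
    if cfg.get("emit_incomplete_candles", False):
        raise ValueError("emit_incomplete_candles must be false for sparse aggregation")
    return cfg


config = load_aggregation_config(RAW_CONFIG)
print(config["timeframes"][:3], config["emit_incomplete_candles"])  # ['1s', '5s', '10s'] False
```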


Note: This sparse approach is the industry standard used by major exchanges and trading platforms. It provides the most accurate representation of actual market activity while maintaining efficiency and preventing data artifacts.