Add common data processing framework for OKX exchange
- Introduced a modular architecture for data processing, including common utilities for validation, transformation, and aggregation.
- Implemented `StandardizedTrade`, `OHLCVCandle`, and `TimeframeBucket` classes for unified data handling across exchanges.
- Developed `OKXDataProcessor` for OKX-specific data validation and processing, leveraging the new common framework.
- Enhanced `OKXCollector` to use the common data processing utilities, improving modularity and maintainability.
- Updated documentation to reflect the new architecture and provide guidance on the data processing framework.
- Added comprehensive tests for the new data processing components to ensure reliability.
@@ -9,6 +9,11 @@ The documentation is organized into specialized sections for better navigation a
### 🏗️ **[Architecture & Design](architecture/)**

- **[Architecture Overview](architecture/architecture.md)** - High-level system architecture and component design
- **[Data Processing Refactor](architecture/data-processing-refactor.md)** - *New modular data processing architecture*
  - Common utilities shared across all exchanges
  - Right-aligned timestamp aggregation strategy
  - Future leakage prevention mechanisms
  - Exchange-specific component design
- **[Crypto Bot PRD](architecture/crypto-bot-prd.md)** - Product Requirements Document for the crypto trading bot platform

### 🔧 **[Core Components](components/)**
@@ -51,6 +56,13 @@ The documentation is organized into specialized sections for better navigation a
- API endpoint definitions
- Data format specifications

- **[Aggregation Strategy](reference/aggregation-strategy.md)** - *Comprehensive data aggregation documentation*
  - Right-aligned timestamp strategy (industry standard)
  - Future leakage prevention safeguards
  - Real-time vs historical processing
  - Database storage patterns
  - Testing methodology and examples

## 🎯 **Quick Start**

1. **New to the platform?** Start with the [Setup Guide](guides/setup.md)

@@ -1,40 +1,43 @@
# Architecture Documentation
# Architecture & Design Documentation

This section contains system architecture and design documentation for the TCP Dashboard platform.
This section contains high-level system architecture documentation and design decisions for the TCP Trading Platform.

## 📋 Contents
## Documents

### System Architecture
### [Architecture Overview](architecture.md)
Comprehensive overview of the system architecture, including:
- Component relationships and data flow
- Technology stack and infrastructure decisions
- Scalability and performance considerations
- Security architecture and best practices

- **[Architecture Overview](architecture.md)** - *High-level system architecture and component design*
  - Core system components and interactions
  - Data flow and processing pipelines
  - Service architecture and deployment patterns
  - Technology stack and infrastructure
### [Data Processing Refactor](data-processing-refactor.md)
Documentation of the major refactoring of the data processing system:
- Migration from monolithic to modular architecture
- Common utilities framework for all exchanges
- Right-aligned timestamp aggregation strategy
- Future leakage prevention mechanisms
- Exchange-specific component design patterns

### Product Requirements
### [Crypto Bot PRD](crypto-bot-prd.md)
Product Requirements Document defining:
- Platform objectives and scope
- Functional and non-functional requirements
- User stories and acceptance criteria
- Technical constraints and assumptions

- **[Crypto Bot PRD](crypto-bot-prd.md)** - *Product Requirements Document for the crypto trading bot platform*
  - Platform vision and objectives
  - Feature specifications and requirements
  - User personas and use cases
  - Technical requirements and constraints
  - Implementation roadmap and milestones
## Quick Navigation

## 🏗️ System Overview
- **New to the platform?** Start with [Architecture Overview](architecture.md)
- **Understanding data processing?** See [Data Processing Refactor](data-processing-refactor.md)
- **Product requirements?** Check [Crypto Bot PRD](crypto-bot-prd.md)
- **Implementation details?** See [Technical Reference](../reference/)

The TCP Dashboard follows a modular, microservices-inspired architecture designed for:
## Related Documentation

- **Scalability**: Horizontal scaling of individual components
- **Reliability**: Fault tolerance and auto-recovery mechanisms
- **Maintainability**: Clear separation of concerns and modular design
- **Extensibility**: Easy addition of new exchanges, strategies, and features

## 🔗 Related Documentation

- **[Components Documentation](../components/)** - Technical implementation details
- **[Setup Guide](../guides/setup.md)** - System setup and configuration
- **[Reference Documentation](../reference/)** - API specifications and technical references
- [Technical Reference](../reference/) - Detailed specifications and API documentation
- [Core Components](../components/) - Implementation details for system components
- [Exchange Integrations](../exchanges/) - Exchange-specific documentation

---

434
docs/architecture/data-processing-refactor.md
Normal file
@@ -0,0 +1,434 @@
# Refactored Data Processing Architecture

## Overview

The data processing system has been significantly refactored to improve reusability, maintainability, and scalability across different exchanges. The key improvement is the extraction of common utilities into a shared framework while keeping exchange-specific components focused and minimal.

## Architecture Changes

### Before (Monolithic)
```
data/exchanges/okx/
├── data_processor.py   # 1343 lines - everything in one file
├── collector.py
└── websocket.py
```

### After (Modular)
```
data/
├── common/                 # Shared utilities for all exchanges
│   ├── __init__.py
│   ├── data_types.py       # StandardizedTrade, OHLCVCandle, etc.
│   ├── aggregation.py      # TimeframeBucket, RealTimeCandleProcessor
│   ├── transformation.py   # BaseDataTransformer, UnifiedDataTransformer
│   └── validation.py       # BaseDataValidator, common validation
└── exchanges/
    └── okx/
        ├── data_processor.py   # ~600 lines - OKX-specific only
        ├── collector.py        # Updated to use common utilities
        └── websocket.py
```

## Key Benefits

### 1. **Reusability Across Exchanges**
- Candle aggregation logic works for any exchange
- Standardized data formats enable uniform processing
- Base classes provide common patterns for new exchanges

### 2. **Maintainability**
- Smaller, focused files are easier to understand and modify
- Common utilities are tested once and reused everywhere
- Clear separation of concerns

### 3. **Extensibility**
- Adding new exchanges requires minimal code
- New data types and timeframes are automatically supported
- Validation and transformation patterns are consistent

### 4. **Performance**
- Optimized aggregation algorithms and memory usage
- Efficient candle bucketing algorithms
- Lazy evaluation where possible

### 5. **Testing**
- Modular components are easier to test independently

## Time Aggregation Strategy

### Right-Aligned Timestamps (Industry Standard)

The system uses **RIGHT-ALIGNED timestamps**, following the industry standard set by major exchanges (Binance, OKX, Coinbase):

- **Candle timestamp = end time of the interval (close time)**
- A 5-minute candle with timestamp `09:05:00` covers trades in the half-open interval `[09:00:00, 09:05:00)`
- A 1-minute candle with timestamp `14:32:00` covers trades in `[14:31:00, 14:32:00)`
- This aligns with how exchanges report historical data

### Aggregation Process (No Future Leakage)

```python
def process_trade_realtime(trade: StandardizedTrade, timeframe: str):
    """
    Real-time aggregation with strict future leakage prevention.

    CRITICAL: Only emit completed candles, never incomplete ones.
    """
    completed_candles = []

    # 1. Calculate which time bucket this trade belongs to
    trade_bucket_start = get_bucket_start_time(trade.timestamp, timeframe)

    # 2. Check if the current bucket exists and matches
    current_bucket = current_buckets.get(timeframe)

    # 3. Handle time boundary crossing
    if current_bucket is None:
        # First bucket for this timeframe
        current_bucket = create_bucket(trade_bucket_start, timeframe)
    elif current_bucket.start_time != trade_bucket_start:
        # Time boundary crossed - complete the previous bucket FIRST
        if current_bucket.has_trades():
            completed_candle = current_bucket.to_candle(is_complete=True)
            emit_candle(completed_candle)  # Store in market_data table
            completed_candles.append(completed_candle)

        # Create a new bucket for the current time period
        current_bucket = create_bucket(trade_bucket_start, timeframe)

    current_buckets[timeframe] = current_bucket

    # 4. Add trade to the current bucket
    current_bucket.add_trade(trade)

    # 5. Return only completed candles (never incomplete/future data)
    return completed_candles  # Empty list unless a boundary was crossed
```

### Time Bucket Calculation Examples

```python
# 5-minute timeframes (00:00, 00:05, 00:10, 00:15, ...)
trade_time = "09:03:45" -> bucket_start = "09:00:00", bucket_end = "09:05:00"
trade_time = "09:07:23" -> bucket_start = "09:05:00", bucket_end = "09:10:00"
trade_time = "09:05:00" -> bucket_start = "09:05:00", bucket_end = "09:10:00"

# 1-hour timeframes (align to hour boundaries)
trade_time = "14:35:22" -> bucket_start = "14:00:00", bucket_end = "15:00:00"
trade_time = "15:00:00" -> bucket_start = "15:00:00", bucket_end = "16:00:00"

# 4-hour timeframes (00:00, 04:00, 08:00, 12:00, 16:00, 20:00)
trade_time = "13:45:12" -> bucket_start = "12:00:00", bucket_end = "16:00:00"
trade_time = "16:00:01" -> bucket_start = "16:00:00", bucket_end = "20:00:00"
```
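
These examples can be reproduced with a small, self-contained sketch. `bucket_bounds` is a hypothetical helper (not part of the codebase) that floors a UTC timestamp to its interval via epoch arithmetic; this works for the listed timeframes because each maps to a fixed number of seconds that divides a UTC day.

```python
from datetime import datetime, timedelta, timezone

# Seconds per timeframe; each value divides 86400, so flooring the Unix
# epoch timestamp aligns buckets to UTC midnight as in the examples above.
TIMEFRAME_SECONDS = {'1m': 60, '5m': 300, '15m': 900, '1h': 3600, '4h': 14400}

def bucket_bounds(ts: datetime, timeframe: str):
    """Return (bucket_start, bucket_end) for a trade timestamp (illustrative helper)."""
    step = TIMEFRAME_SECONDS[timeframe]
    epoch = int(ts.timestamp())
    start = epoch - (epoch % step)  # floor to the interval boundary
    start_dt = datetime.fromtimestamp(start, tz=timezone.utc)
    return start_dt, start_dt + timedelta(seconds=step)

start, end = bucket_bounds(datetime(2024, 1, 1, 9, 3, 45, tzinfo=timezone.utc), '5m')
print(start.strftime('%H:%M:%S'), end.strftime('%H:%M:%S'))  # 09:00:00 09:05:00
```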

### Future Leakage Prevention

**CRITICAL SAFEGUARDS:**

1. **Boundary Crossing Detection**: Only complete candles when the trade timestamp definitively crosses a time boundary
2. **No Premature Completion**: Never emit incomplete candles during real-time processing
3. **Strict Time Validation**: Trades are only added to buckets if `start_time <= trade.timestamp < end_time`
4. **Historical Consistency**: Same logic for real-time and historical processing

```python
# CORRECT: Only complete a candle when the boundary is crossed
if current_bucket.start_time != trade_bucket_start:
    # Time boundary definitely crossed - safe to complete
    completed_candle = current_bucket.to_candle(is_complete=True)
    emit_to_storage(completed_candle)

# INCORRECT: Would cause future leakage
if some_timer_expires():
    # Never complete based on timers or external events
    completed_candle = current_bucket.to_candle(is_complete=True)  # WRONG!
```

### Data Storage Flow

```
WebSocket Trade Data → Validation → Transformation → Aggregation → Storage
  ├─ Raw individual trades    → raw_trades table   (debugging/compliance)
  ├─ Completed OHLCV candles  → market_data table  (trading decisions)
  └─ Incomplete OHLCV candles → memory only        (monitoring only)
```

**Storage Rules:**
- **Raw trades** → `raw_trades` table (every individual trade/orderbook/ticker)
- **Completed candles** → `market_data` table (only when timeframe boundary crossed)
- **Incomplete candles** → Memory only (never stored, used for monitoring)
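
As a minimal illustration of these routing rules, with in-memory lists standing in for the real database tables (`route` is an invented name, not the production API):

```python
# In-memory stand-ins for the raw_trades and market_data tables.
raw_trades_table = []   # every raw event (trade/orderbook/ticker)
market_data_table = []  # completed candles only

def route(event, completed_candles):
    """Apply the storage rules: raw data always, candles only when complete."""
    raw_trades_table.append(event)
    for candle in completed_candles:
        if candle.get('is_complete'):
            market_data_table.append(candle)

route({'type': 'trade', 'price': 100.0}, [])   # no boundary crossed yet
route({'type': 'trade', 'price': 101.0},
      [{'is_complete': True, 'timeframe': '5m'}])  # boundary crossed
print(len(raw_trades_table), len(market_data_table))  # 2 1
```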

### Aggregation Logic Implementation

```python
def aggregate_to_timeframe(trades: List[StandardizedTrade], timeframe: str) -> List[OHLCVCandle]:
    """
    Aggregate trades to the specified timeframe with right-aligned timestamps.
    """
    # Group trades by time intervals
    buckets = {}
    completed_candles = []

    for trade in sorted(trades, key=lambda t: t.timestamp):
        # Calculate bucket start time (left boundary)
        bucket_start = get_bucket_start_time(trade.timestamp, timeframe)

        # Get or create bucket
        if bucket_start not in buckets:
            buckets[bucket_start] = TimeframeBucket(timeframe, bucket_start)

        # Add trade to bucket
        buckets[bucket_start].add_trade(trade)

    # Convert all buckets to candles with right-aligned timestamps
    for bucket in buckets.values():
        candle = bucket.to_candle(is_complete=True)
        # candle.timestamp = bucket.end_time (right-aligned)
        completed_candles.append(candle)

    return completed_candles
```

## Common Components

### Data Types (`data/common/data_types.py`)

**StandardizedTrade**: Universal trade format
```python
@dataclass
class StandardizedTrade:
    symbol: str
    trade_id: str
    price: Decimal
    size: Decimal
    side: str  # 'buy' or 'sell'
    timestamp: datetime
    exchange: str = "okx"
    raw_data: Optional[Dict[str, Any]] = None
```

**OHLCVCandle**: Universal candle format
```python
@dataclass
class OHLCVCandle:
    symbol: str
    timeframe: str
    start_time: datetime
    end_time: datetime
    open: Decimal
    high: Decimal
    low: Decimal
    close: Decimal
    volume: Decimal
    trade_count: int
    is_complete: bool = False
```
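
`TimeframeBucket` itself is not shown in this document; the following is a hypothetical sketch of its core OHLCV update logic (`TimeframeBucketSketch` and its fields are illustrative names, not the actual implementation):

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional

@dataclass
class TimeframeBucketSketch:
    """Illustrative OHLCV accumulator; not the actual TimeframeBucket API."""
    start_time: datetime
    end_time: datetime
    open: Optional[Decimal] = None
    high: Optional[Decimal] = None
    low: Optional[Decimal] = None
    close: Optional[Decimal] = None
    volume: Decimal = Decimal(0)
    trade_count: int = 0

    def add_trade(self, price: Decimal, size: Decimal, ts: datetime) -> bool:
        if not (self.start_time <= ts < self.end_time):
            return False  # strict time validation: reject out-of-range trades
        if self.open is None:
            self.open = self.high = self.low = price  # first trade sets the open
        self.high = max(self.high, price)
        self.low = min(self.low, price)
        self.close = price  # latest trade is always the close
        self.volume += size
        self.trade_count += 1
        return True

bucket = TimeframeBucketSketch(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5))
bucket.add_trade(Decimal('100'), Decimal('1'), datetime(2024, 1, 1, 9, 1))
bucket.add_trade(Decimal('99'), Decimal('2'), datetime(2024, 1, 1, 9, 2))
print(bucket.open, bucket.high, bucket.low, bucket.close)  # 100 100 99 99
```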

### Aggregation (`data/common/aggregation.py`)

**RealTimeCandleProcessor**: Handles real-time candle building for any exchange
- Processes trades immediately as they arrive
- Supports multiple timeframes simultaneously
- Emits completed candles when time boundaries cross
- Thread-safe and memory efficient

**BatchCandleProcessor**: Handles historical data processing
- Processes large batches of trades efficiently
- Memory-optimized for backfill scenarios
- Same candle output format as the real-time processor

### Transformation (`data/common/transformation.py`)

**BaseDataTransformer**: Abstract base class for exchange transformers
- Common transformation utilities (timestamp conversion, decimal handling)
- Abstract methods for exchange-specific transformations
- Consistent error handling patterns

**UnifiedDataTransformer**: Unified interface for all transformation scenarios
- Works with real-time, historical, and backfill data
- Handles batch processing efficiently
- Integrates with aggregation components

### Validation (`data/common/validation.py`)

**BaseDataValidator**: Common validation patterns
- Price, size, volume validation
- Timestamp validation
- Orderbook validation
- Generic symbol validation
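
As an illustration of the kind of checks these validators perform, here is a hedged sketch (`validate_trade_fields` and its error strings are invented for this example; the real `BaseDataValidator` API may differ):

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

def validate_trade_fields(price: Decimal, size: Decimal, ts: datetime,
                          max_clock_skew: timedelta = timedelta(minutes=5)):
    """Return a list of validation errors; an empty list means the trade is valid."""
    errors = []
    if price <= 0:
        errors.append('price must be positive')
    if size <= 0:
        errors.append('size must be positive')
    if ts > datetime.now(timezone.utc) + max_clock_skew:
        errors.append('timestamp is in the future')  # reject clock-skewed data
    return errors

ok = validate_trade_fields(Decimal('100.5'), Decimal('0.01'),
                           datetime(2024, 1, 1, tzinfo=timezone.utc))
print(ok)  # []
```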

## Exchange-Specific Components

### OKX Data Processor (`data/exchanges/okx/data_processor.py`)

Now focused only on OKX-specific functionality:

**OKXDataValidator**: Extends BaseDataValidator
- OKX-specific symbol patterns (BTC-USDT format)
- OKX message structure validation
- OKX field mappings and requirements

**OKXDataTransformer**: Extends BaseDataTransformer
- OKX WebSocket format transformation
- OKX-specific field extraction
- Integration with common utilities

**OKXDataProcessor**: Main processor using common framework
- Uses common validation and transformation utilities
- Significantly simplified (~600 lines vs 1343 lines)
- Better separation of concerns

### Updated OKX Collector (`data/exchanges/okx/collector.py`)

**Key improvements:**
- Uses OKXDataProcessor with common utilities
- Automatic candle generation for trades
- Simplified message processing
- Better error handling and statistics
- Callback system for real-time data

## Usage Examples

### Creating a New Exchange

To add support for a new exchange (e.g., Binance):

1. **Create exchange-specific validator:**
```python
class BinanceDataValidator(BaseDataValidator):
    def __init__(self, component_name="binance_validator"):
        super().__init__("binance", component_name)
        self._symbol_pattern = re.compile(r'^[A-Z]{2,}$')  # BTCUSDT format

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        # Binance-specific symbol validation
        pass
```

2. **Create exchange-specific transformer:**
```python
class BinanceDataTransformer(BaseDataTransformer):
    def transform_trade_data(self, raw_data: Dict[str, Any], symbol: str) -> Optional[StandardizedTrade]:
        return create_standardized_trade(
            symbol=raw_data['s'],  # Binance field mapping
            trade_id=raw_data['t'],
            price=raw_data['p'],
            size=raw_data['q'],
            side='sell' if raw_data['m'] else 'buy',  # 'm' = buyer is maker, so the taker side is sell
            timestamp=raw_data['T'],
            exchange="binance",
            raw_data=raw_data
        )
```

3. **Automatic candle support:**
```python
# Real-time candles work automatically
processor = RealTimeCandleProcessor(symbol, "binance", config)
for trade in trades:
    completed_candles = processor.process_trade(trade)
```

### Using Common Utilities

**Data transformation:**
```python
# Works with any exchange
transformer = UnifiedDataTransformer(exchange_transformer)
standardized_trade = transformer.transform_trade_data(raw_trade, symbol)

# Batch processing
candles = transformer.process_trades_to_candles(
    trades_iterator,
    ['1m', '5m', '1h'],
    symbol
)
```

**Real-time candle processing:**
```python
# Same code works for any exchange
candle_processor = RealTimeCandleProcessor(symbol, exchange, config)
candle_processor.add_candle_callback(my_candle_handler)

for trade in real_time_trades:
    completed_candles = candle_processor.process_trade(trade)
```

## Testing

The refactored architecture includes comprehensive testing:

**Test script:** `scripts/test_refactored_okx.py`
- Tests common utilities
- Tests OKX-specific components
- Tests integration between components
- Performance and memory testing

**Run tests:**
```bash
python scripts/test_refactored_okx.py
```

## Migration Guide

### For Existing OKX Code

1. **Update imports:**
```python
# Old
from data.exchanges.okx.data_processor import StandardizedTrade, OHLCVCandle

# New
from data.common import StandardizedTrade, OHLCVCandle
```

2. **Use new processor:**
```python
# Old
from data.exchanges.okx.data_processor import OKXDataProcessor, UnifiedDataTransformer

# New
from data.exchanges.okx.data_processor import OKXDataProcessor  # Uses common utilities internally
```

3. **Existing functionality preserved:**
- All existing APIs remain the same
- Performance improved due to optimizations
- More features available (better candle processing, validation)

### For New Exchange Development

1. **Start with common base classes**
2. **Implement only exchange-specific validation and transformation**
3. **Get candle processing, batch processing, and validation for free**
4. **Focus on exchange API integration rather than data processing logic**

## Performance Improvements

**Memory Usage:**
- Streaming processing reduces memory footprint
- Efficient candle bucketing algorithms
- Lazy evaluation where possible

**Processing Speed:**
- Optimized validation with early returns
- Batch processing capabilities
- Parallel processing support

**Maintainability:**
- Smaller, focused components
- Better test coverage
- Clear error handling and logging

## Future Enhancements

**Planned Features:**
1. **Exchange Factory Pattern** - Automatically create collectors for any exchange
2. **Plugin System** - Load exchange implementations dynamically
3. **Configuration-Driven Development** - Define new exchanges via config files
4. **Enhanced Analytics** - Built-in technical indicators and statistics
5. **Multi-Exchange Arbitrage** - Cross-exchange data synchronization

This refactored architecture provides a solid foundation for scalable, maintainable cryptocurrency data processing across any number of exchanges while keeping exchange-specific code minimal and focused.
@@ -13,6 +13,13 @@ This section contains technical specifications, API references, and detailed doc
- Data format specifications
- Integration requirements

- **[Aggregation Strategy](aggregation-strategy.md)** - *Comprehensive data aggregation documentation*
  - Right-aligned timestamp strategy (industry standard)
  - Future leakage prevention safeguards
  - Real-time vs historical processing
  - Database storage patterns
  - Testing methodology and examples

### API References

#### Data Collection APIs

341
docs/reference/aggregation-strategy.md
Normal file
@@ -0,0 +1,341 @@
# Data Aggregation Strategy

## Overview

This document describes the comprehensive data aggregation strategy used in the TCP Trading Platform for converting real-time trade data into OHLCV (Open, High, Low, Close, Volume) candles across multiple timeframes.

## Core Principles

### 1. Right-Aligned Timestamps (Industry Standard)

The system follows the **RIGHT-ALIGNED timestamp** convention used by major exchanges:

- **Candle timestamp = end time of the interval (close time)**
- This represents when the candle period **closes**, not when it opens
- Aligns with Binance, OKX, Coinbase, and other major exchanges
- Ensures consistency with historical data APIs

**Examples:**
```
5-minute candle with timestamp 09:05:00:
├─ Represents data from 09:00:00 up to (but not including) 09:05:00
├─ Includes all trades in the half-open interval [09:00:00, 09:05:00)
└─ Candle "closes" at 09:05:00

1-hour candle with timestamp 14:00:00:
├─ Represents data from 13:00:00 up to (but not including) 14:00:00
├─ Includes all trades in the half-open interval [13:00:00, 14:00:00)
└─ Candle "closes" at 14:00:00
```
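
The half-open membership rule in these examples can be sketched directly (`in_bucket` is an illustrative helper, consistent with the strict time validation used throughout this document):

```python
from datetime import datetime

# 5-minute bucket whose right-aligned candle timestamp is the end time
start = datetime(2024, 1, 1, 9, 0)
end = datetime(2024, 1, 1, 9, 5)

def in_bucket(ts: datetime) -> bool:
    # half-open interval: the close time itself opens the NEXT bucket
    return start <= ts < end

print(in_bucket(datetime(2024, 1, 1, 9, 0)))      # True
print(in_bucket(datetime(2024, 1, 1, 9, 4, 59)))  # True
print(in_bucket(datetime(2024, 1, 1, 9, 5)))      # False
```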

### 2. Future Leakage Prevention

**CRITICAL**: The system implements strict safeguards to prevent future leakage:

- **Only emit completed candles** when a time boundary is definitively crossed
- **Never emit incomplete candles** during real-time processing
- **No timer-based completion** - only trade-timestamp-driven
- **Strict time validation** for all trade additions

## Aggregation Process

### Real-Time Processing Flow

```mermaid
graph TD
    A[Trade Arrives from WebSocket] --> B[Extract Timestamp T]
    B --> C[For Each Timeframe]
    C --> D[Calculate Bucket Start Time]
    D --> E{Bucket Exists?}
    E -->|No| F[Create New Bucket]
    E -->|Yes| G{Same Time Period?}
    G -->|Yes| H[Add Trade to Current Bucket]
    G -->|No| I[Complete Previous Bucket]
    I --> J[Emit Completed Candle]
    J --> K[Store in market_data Table]
    K --> F
    F --> H
    H --> L[Update OHLCV Values]
    L --> M[Continue Processing]
```

### Time Bucket Calculation

The system calculates which time bucket a trade belongs to based on its timestamp:

```python
from datetime import datetime

def get_bucket_start_time(timestamp: datetime, timeframe: str) -> datetime:
    """
    Calculate the start time of the bucket for a given trade timestamp.

    This determines the LEFT boundary of the time interval.
    The RIGHT boundary (end_time) becomes the candle timestamp.
    """
    # Normalize to remove seconds/microseconds
    dt = timestamp.replace(second=0, microsecond=0)

    if timeframe == '1m':
        # 1-minute: align to minute boundaries
        return dt
    elif timeframe == '5m':
        # 5-minute: 00:00, 00:05, 00:10, 00:15, ...
        return dt.replace(minute=(dt.minute // 5) * 5)
    elif timeframe == '15m':
        # 15-minute: 00:00, 00:15, 00:30, 00:45
        return dt.replace(minute=(dt.minute // 15) * 15)
    elif timeframe == '1h':
        # 1-hour: align to hour boundaries
        return dt.replace(minute=0)
    elif timeframe == '4h':
        # 4-hour: 00:00, 04:00, 08:00, 12:00, 16:00, 20:00
        return dt.replace(hour=(dt.hour // 4) * 4, minute=0)
    elif timeframe == '1d':
        # 1-day: align to midnight UTC
        return dt.replace(hour=0, minute=0)
    else:
        raise ValueError(f"Unsupported timeframe: {timeframe}")
```

### Detailed Examples

#### 5-Minute Timeframe Processing

```
Current time:     09:03:45
Trade arrives at: 09:03:45

Step 1: Calculate bucket start time
├─ timeframe = '5m'
├─ minute = 3
├─ bucket_minute = (3 // 5) * 5 = 0
└─ bucket_start = 09:00:00

Step 2: Bucket boundaries
├─ start_time = 09:00:00 (inclusive)
├─ end_time = 09:05:00 (exclusive)
└─ candle_timestamp = 09:05:00 (right-aligned)

Step 3: Trade validation
├─ 09:00:00 <= 09:03:45 < 09:05:00 ✓
└─ Trade belongs to this bucket

Step 4: OHLCV update
├─ If first trade: set open price
├─ Update high/low prices
├─ Set close price (latest trade)
├─ Add to volume
└─ Increment trade count
```

#### Boundary Crossing Example

```
Scenario: 5-minute timeframe, transition from 09:04:59 to 09:05:00

Trade 1: timestamp = 09:04:59
├─ bucket_start = 09:00:00
├─ Belongs to current bucket [09:00:00 - 09:05:00)
└─ Add to current bucket

Trade 2: timestamp = 09:05:00
├─ bucket_start = 09:05:00
├─ Different from current bucket (09:00:00)
├─ TIME BOUNDARY CROSSED!
├─ Complete previous bucket → candle with timestamp 09:05:00
├─ Store completed candle in market_data table
├─ Create new bucket [09:05:00 - 09:10:00)
└─ Add Trade 2 to new bucket
```
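
This scenario can be simulated with a minimal, self-contained sketch (names like `Bucket` and `process` are illustrative, not the production classes):

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple

STEP = timedelta(minutes=5)

@dataclass
class Bucket:
    start: datetime
    prices: List[float] = field(default_factory=list)

def floor_5m(ts: datetime) -> datetime:
    return ts.replace(minute=(ts.minute // 5) * 5, second=0, microsecond=0)

def process(ts: datetime, price: float, state: dict) -> List[Tuple[datetime, List[float]]]:
    """Emit (close_time, prices) only when the 5m boundary is crossed."""
    completed = []
    start = floor_5m(ts)
    bucket = state.get('bucket')
    if bucket is not None and bucket.start != start:
        # boundary crossed: previous bucket closes at its right edge
        completed.append((bucket.start + STEP, bucket.prices))
        bucket = None
    if bucket is None:
        bucket = state['bucket'] = Bucket(start)
    bucket.prices.append(price)
    return completed

state = {}
assert process(datetime(2024, 1, 1, 9, 4, 59), 100.0, state) == []  # inside [09:00, 09:05)
done = process(datetime(2024, 1, 1, 9, 5, 0), 101.0, state)         # crosses the boundary
print(done[0][0].time(), done[0][1])  # 09:05:00 [100.0]
```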

## Data Storage Strategy

### Storage Tables

#### 1. `raw_trades` Table
**Purpose**: Store every individual piece of data as received
**Data**: Trades, orderbook updates, tickers
**Usage**: Debugging, compliance, detailed analysis

```sql
CREATE TABLE raw_trades (
    id SERIAL PRIMARY KEY,
    exchange VARCHAR(50) NOT NULL,
    symbol VARCHAR(20) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    data_type VARCHAR(20) NOT NULL, -- 'trade', 'orderbook', 'ticker'
    raw_data JSONB NOT NULL
);
```

#### 2. `market_data` Table
**Purpose**: Store completed OHLCV candles for trading decisions
**Data**: Only completed candles with right-aligned timestamps
**Usage**: Bot strategies, backtesting, analysis

```sql
CREATE TABLE market_data (
    id SERIAL PRIMARY KEY,
    exchange VARCHAR(50) NOT NULL,
    symbol VARCHAR(20) NOT NULL,
    timeframe VARCHAR(5) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL, -- RIGHT-ALIGNED (candle close time)
    open DECIMAL(18,8) NOT NULL,
    high DECIMAL(18,8) NOT NULL,
    low DECIMAL(18,8) NOT NULL,
    close DECIMAL(18,8) NOT NULL,
    volume DECIMAL(18,8) NOT NULL,
    trades_count INTEGER
);
```

### Storage Flow

```
WebSocket Message
├─ Contains multiple trades
├─ Each trade stored in raw_trades table
└─ Each trade processed through aggregation

Aggregation Engine
├─ Groups trades by timeframe buckets
├─ Updates OHLCV values incrementally
├─ Detects time boundary crossings
└─ Emits completed candles only

Completed Candles
├─ Stored in market_data table
├─ Timestamp = bucket end time (right-aligned)
├─ is_complete = true
└─ Available for trading strategies
```

## Future Leakage Prevention

### Critical Safeguards

#### 1. Boundary Crossing Detection
```python
# CORRECT: Only complete when the boundary is definitively crossed
if current_bucket.start_time != trade_bucket_start:
    # Time boundary crossed - safe to complete the previous bucket
    if current_bucket.trade_count > 0:
        completed_candle = current_bucket.to_candle(is_complete=True)
        emit_candle(completed_candle)
```

#### 2. No Premature Completion
```python
# WRONG: Never complete based on timers or external events
if datetime.now() > bucket.end_time:
    completed_candle = bucket.to_candle(is_complete=True)  # FUTURE LEAKAGE!

# WRONG: Never complete incomplete buckets during real-time processing
if some_condition:
    completed_candle = current_bucket.to_candle(is_complete=True)  # WRONG!
```

#### 3. Strict Time Validation
```python
def add_trade(self, trade: StandardizedTrade) -> bool:
    # Only accept trades within bucket boundaries
    if not (self.start_time <= trade.timestamp < self.end_time):
        return False  # Reject trades outside the time range

    # Safe to add trade
    self.update_ohlcv(trade)
    return True
```

#### 4. Historical Consistency
```python
# Same logic for real-time and historical processing
def process_trade(self, trade):
    """Used for both real-time WebSocket and historical API data"""
    return self._process_trade_for_timeframe(trade, timeframe)
```

## Testing Strategy

### Validation Tests

1. **Timestamp Alignment Tests**
   - Verify candle timestamps are right-aligned
   - Check bucket boundary calculations
   - Validate timeframe-specific alignment

2. **Future Leakage Tests**
   - Ensure no incomplete candles are emitted
   - Verify boundary crossing detection
   - Test with edge case timestamps

3. **Data Integrity Tests**
   - OHLCV calculation accuracy
   - Volume aggregation correctness
   - Trade count validation

### Test Examples

```python
def test_right_aligned_timestamps():
    """Test that candle timestamps are right-aligned"""
    trades = [
        create_trade("09:01:30", price=100),
        create_trade("09:03:45", price=101),
        create_trade("09:05:00", price=102),  # Boundary crossing
    ]

    candles = process_trades(trades, timeframe='5m')

    # First candle should close at 09:05:00 (right-aligned); assumes
    # create_trade pins all trades to a fixed reference date (here 2024-01-01)
    assert candles[0].timestamp == datetime(2024, 1, 1, 9, 5)
    assert candles[0].start_time == datetime(2024, 1, 1, 9, 0)
    assert candles[0].end_time == datetime(2024, 1, 1, 9, 5)


def test_no_future_leakage():
    """Test that incomplete candles are never emitted"""
    processor = RealTimeCandleProcessor(symbol='BTC-USDT', timeframes=['5m'])

    # Add trades within the same bucket
    trade1 = create_trade("09:01:00", price=100)
    trade2 = create_trade("09:03:00", price=101)

    # Should return an empty list (no completed candles)
    completed = processor.process_trade(trade1)
    assert len(completed) == 0

    completed = processor.process_trade(trade2)
    assert len(completed) == 0

    # Only when the boundary is crossed should a candle be emitted
    trade3 = create_trade("09:05:00", price=102)
    completed = processor.process_trade(trade3)
    assert len(completed) == 1  # Previous bucket completed
    assert completed[0].is_complete is True
```

## Performance Considerations

### Memory Management
- Keep only current buckets in memory
- Clear completed buckets immediately after emission
- Limit the maximum number of active timeframes

### Database Optimization
- Batch insert completed candles
- Use prepared statements for frequent inserts
- Index on (symbol, timeframe, timestamp) for queries
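
To illustrate the batched-insert pattern, here is a sketch using `sqlite3` purely for demonstration (the production schema above is PostgreSQL; `executemany` stands in for a prepared-statement batch):

```python
import sqlite3

# Illustrative only: an in-memory SQLite table mirroring market_data's shape.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE market_data (
        exchange TEXT, symbol TEXT, timeframe TEXT, timestamp TEXT,
        open REAL, high REAL, low REAL, close REAL, volume REAL,
        trades_count INTEGER
    )
""")

candles = [
    ('okx', 'BTC-USDT', '5m', '2024-01-01T09:05:00Z', 100, 102, 99, 101, 12.5, 42),
    ('okx', 'BTC-USDT', '5m', '2024-01-01T09:10:00Z', 101, 103, 100, 102, 8.1, 30),
]

# One round trip for the whole batch instead of one INSERT per candle
conn.executemany(
    "INSERT INTO market_data VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", candles
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM market_data").fetchone()[0])  # 2
```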

### Processing Efficiency
- Process all timeframes in a single trade iteration
- Use efficient bucket start time calculations
- Minimize object creation in hot paths

## Conclusion

This aggregation strategy ensures:

✅ **Industry Standard Compliance**: Right-aligned timestamps matching major exchanges
✅ **Future Leakage Prevention**: Strict boundary detection and validation
✅ **Data Integrity**: Accurate OHLCV calculations and storage
✅ **Performance**: Efficient real-time and batch processing
✅ **Consistency**: Same logic for real-time and historical data

The implementation provides a robust foundation for building trading strategies with confidence in data accuracy and timing.