# OKX Data Collector Documentation ## Overview The OKX Data Collector provides real-time market data collection from OKX exchange using WebSocket API. It's built on the modular exchange architecture and provides robust connection management, automatic reconnection, health monitoring, and comprehensive data processing. ## Features ### 🎯 **OKX-Specific Features** - **Real-time Data**: Live trades, orderbook, and ticker data - **Single Pair Focus**: Each collector handles one trading pair for better isolation - **Ping/Pong Management**: OKX-specific keepalive mechanism with proper format - **Raw Data Storage**: Optional storage of raw OKX messages for debugging - **Connection Resilience**: Robust reconnection logic for OKX WebSocket ### 📊 **Supported Data Types** - **Trades**: Real-time trade executions (`trades` channel) - **Orderbook**: 5-level order book depth (`books5` channel) - **Ticker**: 24h ticker statistics (`tickers` channel) - **Candles**: Real-time OHLCV aggregation with configurable timeframes - Supports any timeframe from 1s upwards - Common timeframes: 1s, 5s, 1m, 5m, 15m, 1h, 4h, 1d - Custom timeframes can be configured in data_collection.json ### 🔧 **Configuration Options** - Auto-restart on failures - Health check intervals - Raw data storage toggle - Custom ping/pong timing - Reconnection attempts configuration - Flexible timeframe configuration (1s, 5s, 1m, 5m, 15m, 1h, etc.) - Configurable candle aggregation settings ## Quick Start ### 1. Using Factory Pattern (Recommended) ```python import asyncio from data.exchanges import create_okx_collector from data.base_collector import DataType async def main(): # Create OKX collector using convenience function collector = create_okx_collector( symbol='BTC-USDT', data_types=[DataType.TRADE, DataType.ORDERBOOK], auto_restart=True, health_check_interval=30.0, store_raw_data=True ) # Add data callbacks def on_trade(data_point): trade = data_point.data print(f"Trade: {trade['side']} {trade['sz']} @ {trade['px']} (ID: {trade['tradeId']})") def on_orderbook(data_point): book = data_point.data if book.get('bids') and book.get('asks'): best_bid = book['bids'][0] best_ask = book['asks'][0] print(f"Orderbook: Bid {best_bid[0]}@{best_bid[1]} Ask {best_ask[0]}@{best_ask[1]}") collector.add_data_callback(DataType.TRADE, on_trade) collector.add_data_callback(DataType.ORDERBOOK, on_orderbook) # Start collector await collector.start() # Run for 60 seconds await asyncio.sleep(60) # Stop gracefully await collector.stop() asyncio.run(main()) ``` ### 2. Direct OKX Collector Usage ```python import asyncio from data.exchanges.okx import OKXCollector from data.base_collector import DataType async def main(): # Create collector directly collector = OKXCollector( symbol='ETH-USDT', data_types=[DataType.TRADE, DataType.ORDERBOOK], component_name='eth_collector', auto_restart=True, health_check_interval=30.0, store_raw_data=True ) # Add callbacks def on_data(data_point): print(f"{data_point.data_type.value}: {data_point.symbol} - {data_point.timestamp}") collector.add_data_callback(DataType.TRADE, on_data) collector.add_data_callback(DataType.ORDERBOOK, on_data) # Start and monitor await collector.start() # Monitor status for i in range(12): # 60 seconds total await asyncio.sleep(5) status = collector.get_status() print(f"Status: {status['status']} - Messages: {status.get('messages_processed', 0)}") await collector.stop() asyncio.run(main()) ``` ### 3. Multiple OKX Collectors with Manager ```python import asyncio from data.collector_manager import CollectorManager from data.exchanges import create_okx_collector from data.base_collector import DataType async def main(): # Create manager manager = CollectorManager( manager_name="okx_trading_system", global_health_check_interval=30.0 ) # Create multiple OKX collectors symbols = ['BTC-USDT', 'ETH-USDT', 'SOL-USDT'] for symbol in symbols: collector = create_okx_collector( symbol=symbol, data_types=[DataType.TRADE, DataType.ORDERBOOK], auto_restart=True ) manager.add_collector(collector) # Start manager await manager.start() # Monitor all collectors while True: status = manager.get_status() stats = status.get('statistics', {}) print(f"=== OKX Collectors Status ===") print(f"Running: {stats.get('running_collectors', 0)}") print(f"Failed: {stats.get('failed_collectors', 0)}") print(f"Total messages: {stats.get('total_messages', 0)}") # Individual collector status for collector_name in manager.list_collectors(): collector_status = manager.get_collector_status(collector_name) if collector_status: info = collector_status.get('status', {}) print(f" {collector_name}: {info.get('status')} - " f"Messages: {info.get('messages_processed', 0)}") await asyncio.sleep(15) asyncio.run(main()) ``` ### 3. Multi-Timeframe Candle Processing ```python import asyncio from data.exchanges.okx import OKXCollector from data.base_collector import DataType from data.common import CandleProcessingConfig async def main(): # Configure multi-timeframe candle processing with 1s support candle_config = CandleProcessingConfig( timeframes=['1s', '5s', '1m', '5m', '15m', '1h'], # Including 1s timeframe auto_save_candles=True, emit_incomplete_candles=False ) # Create collector with candle processing collector = OKXCollector( symbol='BTC-USDT', data_types=[DataType.TRADE], # Trades needed for candle aggregation timeframes=['1s', '5s', '1m', '5m', '15m', '1h'], # Specify desired timeframes candle_config=candle_config, auto_restart=True, store_raw_data=False # Disable raw storage for production ) # Add candle callback def on_candle_completed(candle): print(f"Completed {candle.timeframe} candle: " f"OHLCV=({candle.open},{candle.high},{candle.low},{candle.close},{candle.volume}) " f"at {candle.end_time}") collector.add_candle_callback(on_candle_completed) # Start collector await collector.start() # Monitor real-time candle generation await asyncio.sleep(300) # 5 minutes await collector.stop() asyncio.run(main()) ``` ## Configuration ### 1. JSON Configuration File The system uses `config/okx_config.json` for configuration: ```json { "exchange": "okx", "connection": { "public_ws_url": "wss://ws.okx.com:8443/ws/v5/public", "private_ws_url": "wss://ws.okx.com:8443/ws/v5/private", "ping_interval": 25.0, "pong_timeout": 10.0, "max_reconnect_attempts": 5, "reconnect_delay": 5.0 }, "data_collection": { "store_raw_data": true, "health_check_interval": 30.0, "auto_restart": true, "buffer_size": 1000 }, "factory": { "use_factory_pattern": true, "default_data_types": ["trade", "orderbook"], "batch_create": true }, "trading_pairs": [ { "symbol": "BTC-USDT", "enabled": true, "data_types": ["trade", "orderbook"], "channels": { "trades": "trades", "orderbook": "books5", "ticker": "tickers" } }, { "symbol": "ETH-USDT", "enabled": true, "data_types": ["trade", "orderbook"], "channels": { "trades": "trades", "orderbook": "books5", "ticker": "tickers" } } ], "logging": { "component_name_template": "okx_collector_{symbol}", "log_level": "INFO", "verbose": false }, "database": { "store_processed_data": true, "store_raw_data": true, "batch_size": 100, "flush_interval": 5.0 }, "monitoring": { "enable_health_checks": true, "health_check_interval": 30.0, "alert_on_connection_loss": true, "max_consecutive_errors": 5 } } ``` ### 2. Programmatic Configuration ```python from data.exchanges.okx import OKXCollector from data.base_collector import DataType # Custom configuration collector = OKXCollector( symbol='BTC-USDT', data_types=[DataType.TRADE, DataType.ORDERBOOK], component_name='custom_btc_collector', auto_restart=True, health_check_interval=15.0, # Check every 15 seconds store_raw_data=True # Store raw OKX messages ) ``` ### 3. Factory Configuration ```python from data.exchanges import ExchangeFactory, ExchangeCollectorConfig from data.base_collector import DataType config = ExchangeCollectorConfig( exchange='okx', symbol='ETH-USDT', data_types=[DataType.TRADE, DataType.ORDERBOOK], auto_restart=True, health_check_interval=30.0, store_raw_data=True, custom_params={ 'ping_interval': 20.0, # Custom ping interval 'max_reconnect_attempts': 10, # More reconnection attempts 'pong_timeout': 15.0 # Longer pong timeout } ) collector = ExchangeFactory.create_collector(config) ``` ## Data Processing ### OKX Message Formats #### Trade Data ```python # Raw OKX trade message { "arg": { "channel": "trades", "instId": "BTC-USDT" }, "data": [ { "instId": "BTC-USDT", "tradeId": "12345678", "px": "50000.5", # Price "sz": "0.001", # Size "side": "buy", # Side (buy/sell) "ts": "1697123456789" # Timestamp (ms) } ] } # Processed MarketDataPoint MarketDataPoint( exchange="okx", symbol="BTC-USDT", timestamp=datetime(2023, 10, 12, 15, 30, 56, tzinfo=timezone.utc), data_type=DataType.TRADE, data={ "instId": "BTC-USDT", "tradeId": "12345678", "px": "50000.5", "sz": "0.001", "side": "buy", "ts": "1697123456789" } ) ``` #### Orderbook Data ```python # Raw OKX orderbook message (books5) { "arg": { "channel": "books5", "instId": "BTC-USDT" }, "data": [ { "asks": [ ["50001.0", "0.5", "0", "3"], # [price, size, liquidated, orders] ["50002.0", "1.0", "0", "5"] ], "bids": [ ["50000.0", "0.8", "0", "2"], ["49999.0", "1.2", "0", "4"] ], "ts": "1697123456789", "checksum": "123456789" } ] } # Usage in callback def on_orderbook(data_point): book = data_point.data if book.get('bids') and book.get('asks'): best_bid = book['bids'][0] best_ask = book['asks'][0] spread = float(best_ask[0]) - float(best_bid[0]) print(f"Spread: ${spread:.2f}") ``` #### Ticker Data ```python # Raw OKX ticker message { "arg": { "channel": "tickers", "instId": "BTC-USDT" }, "data": [ { "instType": "SPOT", "instId": "BTC-USDT", "last": "50000.5", # Last price "lastSz": "0.001", # Last size "askPx": "50001.0", # Best ask price "askSz": "0.5", # Best ask size "bidPx": "50000.0", # Best bid price "bidSz": "0.8", # Best bid size "open24h": "49500.0", # 24h open "high24h": "50500.0", # 24h high "low24h": "49000.0", # 24h low "vol24h": "1234.567", # 24h volume "ts": "1697123456789" } ] } ``` ### Data Validation The OKX collector includes comprehensive data validation: ```python # Automatic validation in collector class OKXCollector(BaseDataCollector): async def _process_data_item(self, channel: str, data_item: Dict[str, Any]): # Validate message structure if not isinstance(data_item, dict): self.logger.warning("Invalid data item type") return None # Validate required fields based on channel if channel == "trades": required_fields = ['tradeId', 'px', 'sz', 'side', 'ts'] elif channel == "books5": required_fields = ['bids', 'asks', 'ts'] elif channel == "tickers": required_fields = ['last', 'ts'] else: self.logger.warning(f"Unknown channel: {channel}") return None # Check required fields for field in required_fields: if field not in data_item: self.logger.warning(f"Missing required field '{field}' in {channel} data") return None # Process and return validated data return await self._create_market_data_point(channel, data_item) ``` ## Monitoring and Status ### Status Information ```python # Get comprehensive status status = collector.get_status() print(f"Exchange: {status['exchange']}") # 'okx' print(f"Symbol: {status['symbol']}") # 'BTC-USDT' print(f"Status: {status['status']}") # 'running' print(f"WebSocket Connected: {status['websocket_connected']}") # True/False print(f"WebSocket State: {status['websocket_state']}") # 'connected' print(f"Messages Processed: {status['messages_processed']}") # Integer print(f"Errors: {status['errors']}") # Integer print(f"Last Trade ID: {status['last_trade_id']}") # String or None # WebSocket statistics if 'websocket_stats' in status: ws_stats = status['websocket_stats'] print(f"Messages Received: {ws_stats['messages_received']}") print(f"Messages Sent: {ws_stats['messages_sent']}") print(f"Pings Sent: {ws_stats['pings_sent']}") print(f"Pongs Received: {ws_stats['pongs_received']}") print(f"Reconnections: {ws_stats['reconnections']}") ``` ### Health Monitoring ```python # Get health status health = collector.get_health_status() print(f"Is Healthy: {health['is_healthy']}") # True/False print(f"Issues: {health['issues']}") # List of issues print(f"Last Heartbeat: {health['last_heartbeat']}") # ISO timestamp print(f"Last Data: {health['last_data_received']}") # ISO timestamp print(f"Should Be Running: {health['should_be_running']}") # True/False print(f"Is Running: {health['is_running']}") # True/False # Auto-restart status if not health['is_healthy']: print("Collector is unhealthy - auto-restart will trigger") for issue in health['issues']: print(f" Issue: {issue}") ``` ### Performance Monitoring ```python import time async def monitor_performance(): collector = create_okx_collector('BTC-USDT', [DataType.TRADE]) await collector.start() start_time = time.time() last_message_count = 0 while True: await asyncio.sleep(10) # Check every 10 seconds status = collector.get_status() current_messages = status.get('messages_processed', 0) # Calculate message rate elapsed = time.time() - start_time messages_per_second = current_messages / elapsed if elapsed > 0 else 0 # Calculate recent rate recent_messages = current_messages - last_message_count recent_rate = recent_messages / 10 # per second over last 10 seconds print(f"=== Performance Stats ===") print(f"Total Messages: {current_messages}") print(f"Average Rate: {messages_per_second:.2f} msg/sec") print(f"Recent Rate: {recent_rate:.2f} msg/sec") print(f"Errors: {status.get('errors', 0)}") print(f"WebSocket State: {status.get('websocket_state', 'unknown')}") last_message_count = current_messages # Run performance monitoring asyncio.run(monitor_performance()) ``` ## WebSocket Connection Details ### OKX WebSocket Client The OKX implementation includes a specialized WebSocket client: ```python from data.exchanges.okx import OKXWebSocketClient, OKXSubscription, OKXChannelType # Create WebSocket client directly (usually handled by collector) ws_client = OKXWebSocketClient( component_name='okx_ws_btc', ping_interval=25.0, # Must be < 30 seconds for OKX pong_timeout=10.0, max_reconnect_attempts=5, reconnect_delay=5.0 ) # Connect to OKX await ws_client.connect(use_public=True) # Create subscriptions subscriptions = [ OKXSubscription( channel=OKXChannelType.TRADES.value, inst_id='BTC-USDT', enabled=True ), OKXSubscription( channel=OKXChannelType.BOOKS5.value, inst_id='BTC-USDT', enabled=True ) ] # Subscribe to channels await ws_client.subscribe(subscriptions) # Add message callback def on_message(message): print(f"Received: {message}") ws_client.add_message_callback(on_message) # WebSocket will handle messages automatically await asyncio.sleep(60) # Disconnect await ws_client.disconnect() ``` ### Connection States The WebSocket client tracks connection states: ```python from data.exchanges.okx.websocket import ConnectionState # Check connection state state = ws_client.connection_state if state == ConnectionState.CONNECTED: print("WebSocket is connected and ready") elif state == ConnectionState.CONNECTING: print("WebSocket is connecting...") elif state == ConnectionState.RECONNECTING: print("WebSocket is reconnecting...") elif state == ConnectionState.DISCONNECTED: print("WebSocket is disconnected") elif state == ConnectionState.ERROR: print("WebSocket has error") ``` ### Ping/Pong Mechanism OKX requires specific ping/pong format: ```python # OKX expects simple "ping" string (not JSON) # The WebSocket client handles this automatically: # Send: "ping" # Receive: "pong" # This is handled automatically by OKXWebSocketClient # Ping interval must be < 30 seconds to avoid disconnection ``` ## Error Handling & Resilience The OKX collector includes comprehensive error handling and automatic recovery mechanisms: ### Connection Management - **Automatic Reconnection**: Handles network disconnections with exponential backoff - **Task Synchronization**: Prevents race conditions during reconnection using asyncio locks - **Graceful Shutdown**: Properly cancels background tasks and closes connections - **Connection State Tracking**: Monitors connection health and validity ### Enhanced WebSocket Handling (v2.1+) - **Race Condition Prevention**: Uses synchronization locks to prevent multiple recv() calls - **Task Lifecycle Management**: Properly manages background task startup and shutdown - **Reconnection Locking**: Prevents concurrent reconnection attempts - **Subscription Persistence**: Automatically re-subscribes to channels after reconnection ```python # The collector handles these scenarios automatically: # - Network interruptions # - WebSocket connection drops # - OKX server maintenance # - Rate limiting responses # - Malformed data packets # Enhanced error logging for diagnostics collector = OKXCollector('BTC-USDT', [DataType.TRADE]) stats = collector.get_status() print(f"Connection state: {stats['connection_state']}") print(f"Reconnection attempts: {stats['reconnect_attempts']}") print(f"Error count: {stats['error_count']}") ``` ### Common Error Patterns #### WebSocket Concurrency Errors (Fixed in v2.1) ``` ERROR: cannot call recv while another coroutine is already running recv or recv_streaming ``` **Solution**: Updated WebSocket client with proper task synchronization and reconnection locking. #### Connection Recovery ```python # Monitor connection health async def monitor_connection(): while True: if collector.is_connected(): print("✅ Connected and receiving data") else: print("❌ Connection issue - auto-recovery in progress") await asyncio.sleep(30) ``` ## Testing ### Unit Tests Run the existing test scripts: ```bash # Test single collector python scripts/test_okx_collector.py single # Test collector manager python scripts/test_okx_collector.py manager # Test factory pattern python scripts/test_exchange_factory.py ``` ### Custom Testing ```python import asyncio from data.exchanges import create_okx_collector from data.base_collector import DataType async def test_okx_collector(): """Test OKX collector functionality.""" # Test data collection message_count = 0 error_count = 0 def on_trade(data_point): nonlocal message_count message_count += 1 print(f"Trade #{message_count}: {data_point.data.get('tradeId')}") def on_error(error): nonlocal error_count error_count += 1 print(f"Error #{error_count}: {error}") # Create and configure collector collector = create_okx_collector( symbol='BTC-USDT', data_types=[DataType.TRADE], auto_restart=True ) collector.add_data_callback(DataType.TRADE, on_trade) # Test lifecycle print("Starting collector...") await collector.start() print("Collecting data for 30 seconds...") await asyncio.sleep(30) print("Stopping collector...") await collector.stop() # Check results status = collector.get_status() print(f"Final status: {status['status']}") print(f"Messages processed: {status.get('messages_processed', 0)}") print(f"Errors: {status.get('errors', 0)}") assert message_count > 0, "No messages received" assert error_count == 0, f"Unexpected errors: {error_count}" print("Test passed!") # Run test asyncio.run(test_okx_collector()) ``` ## Production Deployment ### Recommended Configuration ```python # Production-ready OKX collector setup import asyncio from data.collector_manager import CollectorManager from data.exchanges import create_okx_collector from data.base_collector import DataType async def deploy_okx_production(): """Production deployment configuration.""" # Create manager with appropriate settings manager = CollectorManager( manager_name="okx_production", global_health_check_interval=30.0, # Check every 30 seconds restart_delay=10.0 # Wait 10 seconds between restarts ) # Production trading pairs trading_pairs = [ 'BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'DOGE-USDT', 'TON-USDT', 'UNI-USDT' ] # Create collectors with production settings for symbol in trading_pairs: collector = create_okx_collector( symbol=symbol, data_types=[DataType.TRADE, DataType.ORDERBOOK], auto_restart=True, health_check_interval=15.0, # More frequent health checks store_raw_data=False # Disable raw data storage in production ) manager.add_collector(collector) # Start system await manager.start() # Production monitoring loop try: while True: await asyncio.sleep(60) # Check every minute status = manager.get_status() stats = status.get('statistics', {}) # Log production metrics print(f"=== Production Status ===") print(f"Running: {stats.get('running_collectors', 0)}/{len(trading_pairs)}") print(f"Failed: {stats.get('failed_collectors', 0)}") print(f"Total restarts: {stats.get('restarts_performed', 0)}") # Alert on failures failed_count = stats.get('failed_collectors', 0) if failed_count > 0: print(f"ALERT: {failed_count} collectors failed!") # Implement alerting system here except KeyboardInterrupt: print("Shutting down production system...") await manager.stop() print("Production system stopped") # Deploy to production asyncio.run(deploy_okx_production()) ``` ### Docker Deployment ```dockerfile # Dockerfile for OKX collector FROM python:3.11-slim WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY . . # Production command CMD ["python", "-m", "scripts.deploy_okx_production"] ``` ### Environment Variables ```bash # Production environment variables export LOG_LEVEL=INFO export OKX_ENV=production export HEALTH_CHECK_INTERVAL=30 export AUTO_RESTART=true export STORE_RAW_DATA=false export DATABASE_URL=postgresql://user:pass@host:5432/db ``` ## API Reference ### OKXCollector Class ```python class OKXCollector(BaseDataCollector): def __init__(self, symbol: str, data_types: Optional[List[DataType]] = None, component_name: Optional[str] = None, auto_restart: bool = True, health_check_interval: float = 30.0, store_raw_data: bool = True): """ Initialize OKX collector. Args: symbol: Trading symbol (e.g., 'BTC-USDT') data_types: Data types to collect (default: [TRADE, ORDERBOOK]) component_name: Name for logging (default: auto-generated) auto_restart: Enable automatic restart on failures health_check_interval: Seconds between health checks store_raw_data: Whether to store raw OKX data """ ``` ## Key Components The OKX collector consists of three main components working together: ### `OKXCollector` - **Main class**: `OKXCollector(BaseDataCollector)` - **Responsibilities**: - Manages WebSocket connection state - Subscribes to required data channels - Dispatches raw messages to the data processor - Stores standardized data in the database - Provides health and status monitoring ### `OKXWebSocketClient` - **Handles WebSocket communication**: `OKXWebSocketClient` - **Responsibilities**: - Manages connection, reconnection, and ping/pong - Decodes incoming messages - Handles authentication for private channels ### `OKXDataProcessor` - **New in v2.0**: `OKXDataProcessor` - **Responsibilities**: - Validates incoming raw data from WebSocket - Transforms data into standardized `StandardizedTrade` and `OHLCVCandle` formats - Aggregates trades into OHLCV candles - Invokes callbacks for processed trades and completed candles ## Configuration ### `OKXCollector` Configuration Configuration options for the `OKXCollector` class: | Parameter | Type | Default | Description | |-------------------------|---------------------|---------------------------------------|-----------------------------------------------------------------------------| | `symbol` | `str` | - | Trading symbol (e.g., `BTC-USDT`) | | `data_types` | `List[DataType]` | `[TRADE, ORDERBOOK]` | List of data types to collect | | `auto_restart` | `bool` | `True` | Automatically restart on failures | | `health_check_interval` | `float` | `30.0` | Seconds between health checks | | `store_raw_data` | `bool` | `True` | Store raw WebSocket data for debugging | | `force_update_candles` | `bool` | `False` | If `True`, update existing candles; if `False`, keep existing ones unchanged | | `logger` | `Logger` | `None` | Logger instance for conditional logging | | `log_errors_only` | `bool` | `False` | If `True` and logger provided, only log error-level messages | ### Health & Status Monitoring status = collector.get_status() print(json.dumps(status, indent=2)) Example output: ```json { "component_name": "okx_collector_btc_usdt", "status": "running", "uptime": "0:10:15.123456", "symbol": "BTC-USDT", "data_types": ["trade", "orderbook"], "connection_state": "connected", "last_health_check": "2023-11-15T10:30:00Z", "message_count": 1052, "processed_trades": 512, "processed_candles": 10, "error_count": 2 } ``` ## Database Integration