Enhance OKX WebSocket client with improved task management and error handling
- Implemented enhanced task synchronization to prevent race conditions during WebSocket operations.
- Introduced reconnection locking to avoid concurrent reconnection attempts.
- Improved error handling in message processing and reconnection logic, ensuring graceful shutdown and task management.
- Added unit tests to verify the stability and reliability of the WebSocket client under concurrent operations.
@@ -634,93 +634,56 @@ OKX requires specific ping/pong format:
# Ping interval must be < 30 seconds to avoid disconnection
```

## Error Handling and Troubleshooting
## Error Handling & Resilience

### Common Issues and Solutions
The OKX collector includes comprehensive error handling and automatic recovery mechanisms:

#### 1. Connection Failures
### Connection Management
- **Automatic Reconnection**: Handles network disconnections with exponential backoff
- **Task Synchronization**: Prevents race conditions during reconnection using asyncio locks
- **Graceful Shutdown**: Properly cancels background tasks and closes connections
- **Connection State Tracking**: Monitors connection health and validity
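
The exponential backoff mentioned in the list above could look roughly like the following sketch. It is not the collector's actual implementation: `backoff_delays`, `reconnect_with_backoff`, and the `connect` callable are hypothetical names, and the base delay, cap, and attempt limit are assumed values chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 5) -> Iterator[float]:
    """Yield delays that double each attempt (base, 2*base, 4*base, ...), capped at `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


async def reconnect_with_backoff(
    connect: Callable[[], Awaitable[bool]],  # hypothetical: returns True once connected
    base: float = 1.0,
    cap: float = 60.0,
    attempts: int = 5,
) -> bool:
    """Call `connect` until it succeeds, sleeping a growing delay after each failure."""
    for delay in backoff_delays(base, cap, attempts):
        try:
            if await connect():
                return True
        except OSError:
            pass  # treat a network error as a failed attempt
        await asyncio.sleep(delay)
    return False
```

With `base=1.0` and `cap=10.0`, five attempts wait 1, 2, 4, 8, and 10 seconds. A production version would likely add random jitter so that many clients do not retry in lockstep.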

### Enhanced WebSocket Handling (v2.1+)
- **Race Condition Prevention**: Uses synchronization locks to prevent multiple recv() calls
- **Task Lifecycle Management**: Properly manages background task startup and shutdown
- **Reconnection Locking**: Prevents concurrent reconnection attempts
- **Subscription Persistence**: Automatically re-subscribes to channels after reconnection
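
As an illustration of the reconnection locking described above, the sketch below uses an `asyncio.Lock` so that only one coroutine performs a reconnect while the others back off. `ReconnectGuard` and its `do_reconnect` callable are hypothetical names, not part of the collector's API; the real client may structure this differently.

```python
import asyncio


class ReconnectGuard:
    """Hypothetical helper: lets only one coroutine run the reconnect routine at a time."""

    def __init__(self, do_reconnect):
        self._do_reconnect = do_reconnect  # async callable that re-opens the socket
        self._lock = asyncio.Lock()
        self.reconnects = 0

    async def reconnect(self) -> bool:
        """Return True if this call reconnected, False if a reconnect was already running."""
        if self._lock.locked():
            return False  # another coroutine is already reconnecting
        async with self._lock:
            await self._do_reconnect()
            self.reconnects += 1
            return True
```

There is no `await` between the `locked()` check and acquiring the lock, so two coroutines cannot both pass the check on a single event loop. Re-subscribing to channels could be done inside `do_reconnect` once the socket is open again.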

```python
# Check connection status
status = collector.get_status()
if not status['websocket_connected']:
    print("WebSocket not connected")

# Check WebSocket state
ws_state = status.get('websocket_state', 'unknown')

if ws_state == 'error':
    print("WebSocket in error state - will auto-restart")
elif ws_state == 'reconnecting':
    print("WebSocket is reconnecting...")

# Manual restart if needed
await collector.restart()
# The collector handles these scenarios automatically:
# - Network interruptions
# - WebSocket connection drops
# - OKX server maintenance
# - Rate limiting responses
# - Malformed data packets

# Enhanced error logging for diagnostics
collector = OKXCollector('BTC-USDT', [DataType.TRADE])
stats = collector.get_status()
print(f"Connection state: {stats['connection_state']}")
print(f"Reconnection attempts: {stats['reconnect_attempts']}")
print(f"Error count: {stats['error_count']}")
```

#### 2. Ping/Pong Issues
### Common Error Patterns

```python
# Monitor ping/pong status
if 'websocket_stats' in status:
    ws_stats = status['websocket_stats']
    pings_sent = ws_stats.get('pings_sent', 0)
    pongs_received = ws_stats.get('pongs_received', 0)

    if pings_sent > pongs_received + 3:  # Allow some tolerance
        print("Ping/pong issue detected - connection may be stale")
        # Auto-restart will handle this
#### WebSocket Concurrency Errors (Fixed in v2.1)
```

#### 3. Data Validation Errors

```python
# Monitor for validation errors
errors = status.get('errors', 0)
if errors > 0:
    print(f"Data validation errors detected: {errors}")

# Check logs for details:
# - Malformed messages
# - Missing required fields
# - Invalid data types
ERROR: cannot call recv while another coroutine is already running recv or recv_streaming
```
**Solution**: Updated WebSocket client with proper task synchronization and reconnection locking.
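
One common way to avoid this error is to let a single reader task own `recv()` and hand messages to other coroutines through a queue. The sketch below illustrates that pattern only; it is an assumption about the general approach, not the client's actual code. `FakeSocket` is a hypothetical stand-in that mimics the overlapping-`recv()` failure so the example runs without a network connection.

```python
import asyncio


class FakeSocket:
    """Stand-in for a WebSocket connection; rejects overlapping recv() calls."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._receiving = False

    async def recv(self):
        if self._receiving:
            raise RuntimeError("cannot call recv while another coroutine is already running recv")
        self._receiving = True
        try:
            await asyncio.sleep(0)  # simulate waiting on the network
            return self._messages.pop(0) if self._messages else None
        finally:
            self._receiving = False


async def reader(ws, queue):
    """The only coroutine that calls ws.recv(); forwards each message to the queue."""
    while True:
        message = await ws.recv()
        await queue.put(message)
        if message is None:  # None marks end of stream in this sketch
            break


async def collect(messages):
    """Start one reader task and consume its output from the queue."""
    ws, queue = FakeSocket(messages), asyncio.Queue()
    reader_task = asyncio.create_task(reader(ws, queue))
    received = []
    while True:
        item = await queue.get()
        if item is None:
            break
        received.append(item)
    await reader_task
    return received
```

Because every consumer reads from the queue rather than the socket, adding more consumers never produces a second concurrent `recv()` call.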

#### 4. Performance Issues

#### Connection Recovery
```python
# Monitor message processing rate
messages = status.get('messages_processed', 0)
uptime = status.get('uptime_seconds', 1)
rate = messages / uptime

if rate < 1.0:  # Less than 1 message per second
    print("Low message rate - check:")
    print("- Network connectivity")
    print("- OKX API status")
    print("- Symbol activity")
```

### Debug Mode

Enable debug logging for detailed information:

```python
import os
os.environ['LOG_LEVEL'] = 'DEBUG'

# Create collector with verbose logging
collector = create_okx_collector(
    symbol='BTC-USDT',
    data_types=[DataType.TRADE, DataType.ORDERBOOK]
)

await collector.start()

# Check logs in ./logs/ directory:
# - okx_collector_btc_usdt_debug.log
# - okx_collector_btc_usdt_info.log
# - okx_collector_btc_usdt_error.log
# Monitor connection health
async def monitor_connection():
    while True:
        if collector.is_connected():
            print("✅ Connected and receiving data")
        else:
            print("❌ Connection issue - auto-recovery in progress")
        await asyncio.sleep(30)
```
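
A monitoring loop like `monitor_connection` runs forever, so it has to be started as a background task and cancelled on shutdown. The following is a minimal sketch of that lifecycle, assuming nothing about the collector itself; `run_with_monitor` is a hypothetical helper name.

```python
import asyncio
import contextlib


async def run_with_monitor(monitor, work):
    """Run `monitor()` in the background while `work()` runs, then cancel it cleanly."""
    task = asyncio.create_task(monitor())
    try:
        return await work()
    finally:
        task.cancel()
        # Wait for the cancellation to finish so no task is left pending.
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

For example, `await run_with_monitor(monitor_connection, main_work)` would keep the health check running until the hypothetical `main_work` coroutine returns or raises.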

## Testing
