Enhance OKX WebSocket client with improved task management and error handling

- Implemented enhanced task synchronization to prevent race conditions during WebSocket operations.
- Introduced reconnection locking to avoid concurrent reconnection attempts.
- Improved error handling in message processing and reconnection logic, ensuring graceful shutdown and task management.
- Added unit tests to verify the stability and reliability of the WebSocket client under concurrent operations.
Vasily.onl
2025-06-02 23:14:04 +08:00
parent aaebd9a308
commit 01cea1d5e5
4 changed files with 414 additions and 163 deletions


@@ -634,93 +634,56 @@ OKX requires specific ping/pong format:
# Ping interval must be < 30 seconds to avoid disconnection
```
## Error Handling & Resilience

The OKX collector includes error handling and automatic recovery mechanisms:

### Connection Management

- **Automatic Reconnection**: Handles network disconnections with exponential backoff
- **Task Synchronization**: Prevents race conditions during reconnection using `asyncio` locks
- **Graceful Shutdown**: Properly cancels background tasks and closes connections
- **Connection State Tracking**: Monitors connection health and validity

### Enhanced WebSocket Handling (v2.1+)

- **Race Condition Prevention**: Uses synchronization locks so that only one coroutine calls `recv()` at a time
- **Task Lifecycle Management**: Properly manages background task startup and shutdown
- **Reconnection Locking**: Prevents concurrent reconnection attempts
- **Subscription Persistence**: Automatically re-subscribes to channels after reconnection

### Common Issues and Solutions

#### 1. Connection Failures
```python
# Check connection status (assumes an existing `collector` instance)
status = collector.get_status()
if not status['websocket_connected']:
    print("WebSocket not connected")

# Check WebSocket state
ws_state = status.get('websocket_state', 'unknown')
if ws_state == 'error':
    print("WebSocket in error state - will auto-restart")
elif ws_state == 'reconnecting':
    print("WebSocket is reconnecting...")

# Manual restart if needed (the await must run inside a coroutine)
await collector.restart()

# The collector handles these scenarios automatically:
# - Network interruptions
# - WebSocket connection drops
# - OKX server maintenance
# - Rate limiting responses
# - Malformed data packets

# Enhanced error logging for diagnostics
collector = OKXCollector('BTC-USDT', [DataType.TRADE])
stats = collector.get_status()
print(f"Connection state: {stats['connection_state']}")
print(f"Reconnection attempts: {stats['reconnect_attempts']}")
print(f"Error count: {stats['error_count']}")
```
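The **Automatic Reconnection** and **Reconnection Locking** behavior listed under Connection Management can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code. `ReconnectingClient` and the `connect_fn` coroutine it wraps are hypothetical names, and the delay values are assumptions. An `asyncio.Lock` serializes reconnects, so several coroutines that notice the same failure do not each open a new connection. The delay doubles after every failed attempt, up to a cap, with a little random jitter.

```python
import asyncio
import random


class ReconnectingClient:
    """Hypothetical helper: exponential backoff plus a reconnection lock."""

    def __init__(self, connect_fn, base_delay=1.0, max_delay=60.0, max_attempts=10):
        self._connect_fn = connect_fn  # assumed coroutine; raises OSError on failure
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._max_attempts = max_attempts
        self._reconnect_lock = asyncio.Lock()
        self.connection = None
        self.reconnect_attempts = 0

    def backoff_delay(self, attempt):
        # base_delay * 2**attempt, capped at max_delay, plus up to 10% jitter
        delay = min(self._base_delay * (2 ** attempt), self._max_delay)
        return delay + random.uniform(0, delay * 0.1)

    async def reconnect(self):
        seen = self.connection  # the connection this caller saw fail
        async with self._reconnect_lock:
            # Another coroutine reconnected while we waited: reuse its result
            if self.connection is not seen:
                return self.connection
            for attempt in range(self._max_attempts):
                try:
                    self.connection = await self._connect_fn()
                    self.reconnect_attempts = 0
                    return self.connection
                except OSError:  # ConnectionError is a subclass of OSError
                    self.reconnect_attempts += 1
                    await asyncio.sleep(self.backoff_delay(attempt))
            raise ConnectionError("giving up after max_attempts reconnection attempts")
```

Every coroutine that detects a dropped connection can call `await client.reconnect()`. Only the first one actually reconnects, and the others receive the connection it established.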
#### 2. Ping/Pong Issues

```python
# Monitor ping/pong status
if 'websocket_stats' in status:
    ws_stats = status['websocket_stats']
    pings_sent = ws_stats.get('pings_sent', 0)
    pongs_received = ws_stats.get('pongs_received', 0)
    if pings_sent > pongs_received + 3:  # Allow some tolerance
        print("Ping/pong issue detected - connection may be stale")
        # Auto-restart will handle this
```

#### 3. Data Validation Errors

```python
# Monitor for validation errors
errors = status.get('errors', 0)
if errors > 0:
    print(f"Data validation errors detected: {errors}")
    # Check logs for details:
    # - Malformed messages
    # - Missing required fields
    # - Invalid data types
```

#### 4. Performance Issues

```python
# Monitor message processing rate
messages = status.get('messages_processed', 0)
uptime = status.get('uptime_seconds') or 1  # guard against zero uptime at startup
rate = messages / uptime
if rate < 1.0:  # Less than 1 message per second
    print("Low message rate - check:")
    print("- Network connectivity")
    print("- OKX API status")
    print("- Symbol activity")
```

### Common Error Patterns

#### Connection Recovery

The collector recovers on its own from network interruptions, WebSocket connection drops, and OKX server maintenance. It reconnects with exponential backoff and re-subscribes to its channels afterwards. If the connection remains unhealthy, force a restart with `await collector.restart()`.

#### WebSocket Concurrency Errors (Fixed in v2.1)

```
ERROR: cannot call recv while another coroutine is already running recv or recv_streaming
```

**Solution**: Updated the WebSocket client with proper task synchronization, so that only one coroutine calls `recv()` at a time, and with reconnection locking, which prevents concurrent reconnection attempts.
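The error above occurs when two coroutines call `recv()` on the same connection at the same time. One common way to rule it out is to let a single background task own `recv()` and hand messages to consumers through a queue. The sketch below assumes only that the connection object exposes an async `recv()`, as connections from the `websockets` library do. `SingleReader` is a hypothetical helper, not the collector's implementation. The `start()` guard and the lock around `recv()` stand in for the task synchronization described above, and `stop()` shows cancellation-based graceful shutdown.

```python
import asyncio


class SingleReader:
    """Hypothetical helper: one task owns recv(), consumers read from a queue."""

    def __init__(self, ws):
        self._ws = ws                     # any object with an async recv()
        self._recv_lock = asyncio.Lock()  # at most one recv() in flight
        self.messages = asyncio.Queue()
        self._task = None

    async def _read_loop(self):
        while True:
            async with self._recv_lock:
                msg = await self._ws.recv()
            await self.messages.put(msg)

    def start(self):
        # Spawn the reader only if none is running, so repeated start()
        # calls (e.g. after a reconnect) never create a second reader.
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._read_loop())

    async def stop(self):
        # Graceful shutdown: cancel the reader and wait until it has exited.
        if self._task is not None:
            self._task.cancel()
            await asyncio.gather(self._task, return_exceptions=True)
            self._task = None
```

Consumers then `await reader.messages.get()` instead of calling `recv()` directly, so no code path can issue a second concurrent `recv()`.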
### Debug Mode
Enable debug logging for detailed information:
```python
import asyncio
import os

# Set the log level before creating the collector
os.environ['LOG_LEVEL'] = 'DEBUG'

# Create collector with verbose logging
collector = create_okx_collector(
    symbol='BTC-USDT',
    data_types=[DataType.TRADE, DataType.ORDERBOOK]
)
await collector.start()

# Check logs in ./logs/ directory:
# - okx_collector_btc_usdt_debug.log
# - okx_collector_btc_usdt_info.log
# - okx_collector_btc_usdt_error.log

# Monitor connection health
async def monitor_connection():
    while True:
        if collector.is_connected():
            print("✅ Connected and receiving data")
        else:
            print("❌ Connection issue - auto-recovery in progress")
        await asyncio.sleep(30)

# Run it alongside the collector, e.g. asyncio.create_task(monitor_connection())
```
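The individual troubleshooting checks can be folded into one pure function that is easy to log or unit-test. These are the ping/pong balance, data validation errors, and message rate. This sketch reuses the status keys from the troubleshooting examples. The helper name `diagnose` and its default thresholds are assumptions, and the keys that `get_status()` actually returns may differ between versions.

```python
from typing import Any, List, Mapping


def diagnose(status: Mapping[str, Any], pong_tolerance: int = 3,
             min_rate: float = 1.0) -> List[str]:
    """Return problems found in a collector status dict (empty list if healthy)."""
    problems = []

    if not status.get('websocket_connected', False):
        problems.append("websocket not connected")

    # Ping/pong balance: too many unanswered pings suggests a stale connection
    ws_stats = status.get('websocket_stats', {})
    pings = ws_stats.get('pings_sent', 0)
    pongs = ws_stats.get('pongs_received', 0)
    if pings > pongs + pong_tolerance:
        problems.append(f"stale connection: {pings} pings, {pongs} pongs")

    # Data validation errors
    errors = status.get('errors', 0)
    if errors > 0:
        problems.append(f"{errors} data validation errors")

    # Message throughput; `or 1` guards against a zero uptime at startup
    uptime = status.get('uptime_seconds') or 1
    rate = status.get('messages_processed', 0) / uptime
    if rate < min_rate:
        problems.append(f"low message rate: {rate:.2f} msg/s")

    return problems
```

For example, each iteration of a monitoring loop could log `diagnose(collector.get_status())` instead of a single connected/disconnected flag.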
## Testing