Vasily.onl 4936e5cd73 Implement enhanced data collection system with health monitoring and management

- Introduced `BaseDataCollector` and `CollectorManager` classes for standardized data collection and centralized management.
- Added health monitoring features, including auto-restart capabilities and detailed status reporting for collectors.
- Updated `env.template` to include new logging and health check configurations.
- Enhanced documentation in `docs/data_collectors.md` to provide comprehensive guidance on the new data collection system.
- Added unit tests for `BaseDataCollector` and `CollectorManager` to ensure reliability and functionality.

2025-05-30 20:33:56 +08:00

Data Collector System Documentation

Overview

The Data Collector System provides a robust, scalable framework for collecting real-time market data from cryptocurrency exchanges. It features comprehensive health monitoring, automatic recovery, and centralized management capabilities designed for production trading environments.

Key Features

🔄 Auto-Recovery & Health Monitoring

Heartbeat System: Continuous health monitoring with configurable intervals
Auto-Restart: Automatic restart on failures with exponential backoff
Connection Recovery: Robust reconnection logic for network interruptions
Data Freshness Monitoring: Detects stale data and triggers recovery
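
The exponential backoff mentioned above can be sketched as follows. This is a minimal illustration, not the library's actual implementation: the helper name `backoff_delays` and its base delay, growth factor, and cap are assumptions chosen for the example.

```python
from typing import List


def backoff_delays(base: float = 5.0, factor: float = 2.0,
                   max_delay: float = 60.0, attempts: int = 5) -> List[float]:
    """Return the wait in seconds before each retry attempt.

    Hypothetical helper: all default values are illustrative assumptions.
    """
    delays = []
    for attempt in range(attempts):
        # The delay grows geometrically but never exceeds max_delay.
        delays.append(min(base * factor ** attempt, max_delay))
    return delays


print(backoff_delays())
```

Capping the delay keeps a collector that has been failing for a long time from waiting arbitrarily long once the exchange becomes reachable again.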

🎛️ Centralized Management

CollectorManager: Supervises multiple collectors with coordinated lifecycle
Dynamic Control: Enable/disable collectors at runtime without system restart
Global Health Checks: System-wide monitoring and alerting
Graceful Shutdown: Proper cleanup and resource management

📊 Comprehensive Monitoring

Real-time Status: Detailed status reporting for all collectors
Performance Metrics: Message counts, uptime, error rates, restart counts
Health Analytics: Connection state, data freshness, error tracking
Logging Integration: Enhanced logging with configurable verbosity

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   CollectorManager                          │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Global Health Monitor                  │    │
│  │  • System-wide health checks                       │    │
│  │  • Auto-restart coordination                       │    │
│  │  • Performance analytics                           │    │
│  └─────────────────────────────────────────────────────┘    │
│                           │                                 │
│  ┌─────────────────┐  ┌─────────────────┐  ┌──────────────┐ │
│  │  OKX Collector  │  │Binance Collector│  │   Custom     │ │
│  │                 │  │                 │  │  Collector   │ │
│  │ • Health Monitor│  │ • Health Monitor│  │ • Health Mon │ │
│  │ • Auto-restart  │  │ • Auto-restart  │  │ • Auto-resta │ │
│  │ • Data Validate │  │ • Data Validate │  │ • Data Valid │ │
│  └─────────────────┘  └─────────────────┘  └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
                              │
                    ┌─────────────────┐
                    │   Data Output   │
                    │                 │
                    │ • Callbacks     │
                    │ • Redis Pub/Sub │
                    │ • Database      │
                    └─────────────────┘
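
The callback path in the diagram can be pictured as a per-data-type registry that fans each data point out to its subscribers. The sketch below is a standalone illustration under assumptions: `CallbackRegistry` is a hypothetical stand-in, not the class used by `BaseDataCollector`, and it accepts both plain functions and coroutine functions.

```python
import asyncio
from collections import defaultdict
from typing import Any, Callable, Dict, List


class CallbackRegistry:
    """Hypothetical stand-in for a collector's callback handling."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable[[Any], Any]]] = defaultdict(list)

    def add(self, data_type: str, callback: Callable[[Any], Any]) -> None:
        self._callbacks[data_type].append(callback)

    async def notify(self, data_type: str, payload: Any) -> None:
        # Iterate over a copy so callbacks may register others safely.
        for callback in list(self._callbacks[data_type]):
            result = callback(payload)
            # Coroutine callbacks must be awaited; plain functions already ran.
            if asyncio.iscoroutine(result):
                await result


async def demo() -> list:
    received = []
    registry = CallbackRegistry()

    async def async_callback(payload):
        received.append(("async", payload))

    registry.add("ticker", lambda payload: received.append(("sync", payload)))
    registry.add("ticker", async_callback)
    await registry.notify("ticker", {"last": 100})
    await registry.notify("trade", {"price": 1})  # no subscribers, nothing happens
    return received


received = asyncio.run(demo())
print(received)
```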

Quick Start

1. Basic Collector Usage

import asyncio
from data import BaseDataCollector, DataType, MarketDataPoint

class MyExchangeCollector(BaseDataCollector):
    """Custom collector implementation."""
    
    def __init__(self, symbols: list):
        super().__init__("my_exchange", symbols, [DataType.TICKER])
        self.websocket = None
    
    async def connect(self) -> bool:
        """Connect to exchange WebSocket."""
        try:
            # Connect to your exchange WebSocket
            self.websocket = await connect_to_exchange()
            return True
        except Exception:
            return False
    
    async def disconnect(self) -> None:
        """Disconnect from exchange."""
        if self.websocket:
            await self.websocket.close()
    
    async def subscribe_to_data(self, symbols: list, data_types: list) -> bool:
        """Subscribe to data streams."""
        try:
            await self.websocket.subscribe(symbols, data_types)
            return True
        except Exception:
            return False
    
    async def unsubscribe_from_data(self, symbols: list, data_types: list) -> bool:
        """Unsubscribe from data streams."""
        try:
            await self.websocket.unsubscribe(symbols, data_types)
            return True
        except Exception:
            return False
    
    async def _process_message(self, message) -> MarketDataPoint:
        """Process incoming message."""
        return MarketDataPoint(
            exchange=self.exchange_name,
            symbol=message['symbol'],
            timestamp=message['timestamp'],
            data_type=DataType.TICKER,
            data=message['data']
        )
    
    async def _handle_messages(self) -> None:
        """Handle incoming messages."""
        try:
            message = await self.websocket.receive()
            data_point = await self._process_message(message)
            await self._notify_callbacks(data_point)
        except Exception:
            # Re-raise unchanged so the base class can trigger its reconnection logic
            raise

# Usage
async def main():
    # Create collector
    collector = MyExchangeCollector(["BTC-USDT", "ETH-USDT"])
    
    # Add data callback
    def on_data(data_point: MarketDataPoint):
        print(f"Received: {data_point.symbol} - {data_point.data}")
    
    collector.add_data_callback(DataType.TICKER, on_data)
    
    # Start collector (with auto-restart enabled by default)
    await collector.start()
    
    # Let it run
    await asyncio.sleep(60)
    
    # Stop collector
    await collector.stop()

asyncio.run(main())

2. Using CollectorManager

import asyncio
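from data import CollectorManager, CollectorConfig
# OKXCollector and BinanceCollector are assumed to be BaseDataCollector
# subclasses defined elsewhere in your project; import them here.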

async def main():
    # Create manager
    manager = CollectorManager(
        "trading_system_manager",
        global_health_check_interval=30.0  # Check every 30 seconds
    )
    
    # Create collectors
    okx_collector = OKXCollector(["BTC-USDT", "ETH-USDT"])
    binance_collector = BinanceCollector(["BTC-USDT", "ETH-USDT"])
    
    # Add collectors with custom configs
    manager.add_collector(okx_collector, CollectorConfig(
        name="okx_main",
        exchange="okx",
        symbols=["BTC-USDT", "ETH-USDT"],
        data_types=["ticker", "trade"],
        auto_restart=True,
        health_check_interval=15.0,
        enabled=True
    ))
    
    manager.add_collector(binance_collector, CollectorConfig(
        name="binance_backup", 
        exchange="binance",
        symbols=["BTC-USDT", "ETH-USDT"],
        data_types=["ticker"],
        auto_restart=True,
        enabled=False  # Start disabled
    ))
    
    # Start manager
    await manager.start()
    
    # Monitor status
    while True:
        status = manager.get_status()
        print(f"Running: {len(manager.get_running_collectors())}")
        print(f"Failed: {len(manager.get_failed_collectors())}")
        print(f"Restarts: {status['statistics']['restarts_performed']}")
        
        await asyncio.sleep(10)

asyncio.run(main())

API Reference

BaseDataCollector

The abstract base class that all data collectors must inherit from.

Constructor

def __init__(self, 
             exchange_name: str,
             symbols: List[str],
             data_types: Optional[List[DataType]] = None,
             component_name: Optional[str] = None,
             auto_restart: bool = True,
             health_check_interval: float = 30.0)

Parameters:

exchange_name: Name of the exchange (e.g., 'okx', 'binance')
symbols: List of trading symbols to collect data for
data_types: Types of data to collect (default: [DataType.CANDLE])
component_name: Name for logging (default: based on exchange_name)
auto_restart: Enable automatic restart on failures (default: True)
health_check_interval: Seconds between health checks (default: 30.0)

Abstract Methods

Must be implemented by subclasses:

async def connect(self) -> bool
async def disconnect(self) -> None
async def subscribe_to_data(self, symbols: List[str], data_types: List[DataType]) -> bool
async def unsubscribe_from_data(self, symbols: List[str], data_types: List[DataType]) -> bool
async def _process_message(self, message: Any) -> Optional[MarketDataPoint]
async def _handle_messages(self) -> None

Public Methods

async def start(self) -> bool                      # Start the collector
async def stop(self, force: bool = False) -> None  # Stop the collector
async def restart(self) -> bool                    # Restart the collector

# Callback management
def add_data_callback(self, data_type: DataType, callback: Callable) -> None
def remove_data_callback(self, data_type: DataType, callback: Callable) -> None

# Symbol management
def add_symbol(self, symbol: str) -> None
def remove_symbol(self, symbol: str) -> None

# Status and monitoring
def get_status(self) -> Dict[str, Any]
def get_health_status(self) -> Dict[str, Any]

# Data validation
def validate_ohlcv_data(self, data: Dict[str, Any], symbol: str, timeframe: str) -> OHLCVData

Status Information

The get_status() method returns comprehensive status information:

{
    'exchange': 'okx',
    'status': 'running',                    # Current status
    'should_be_running': True,              # Desired state
    'symbols': ['BTC-USDT', 'ETH-USDT'],   # Configured symbols
    'data_types': ['ticker'],               # Data types being collected
    'auto_restart': True,                   # Auto-restart enabled
    'health': {
        'time_since_heartbeat': 5.2,       # Seconds since last heartbeat
        'time_since_data': 2.1,            # Seconds since last data
        'max_silence_duration': 300.0      # Max allowed silence
    },
    'statistics': {
        'messages_received': 1250,          # Total messages received
        'messages_processed': 1248,         # Successfully processed
        'errors': 2,                        # Error count
        'restarts': 1,                      # Restart count
        'uptime_seconds': 3600.5,          # Current uptime
        'reconnect_attempts': 0,            # Current reconnect attempts
        'last_message_time': '2023-...',    # ISO timestamp
        'connection_uptime': '2023-...',    # Connection start time
        'last_error': 'Connection failed',  # Last error message
        'last_restart_time': '2023-...'     # Last restart time
    }
}
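
As an illustration of how this dictionary might be consumed, the sketch below derives two simple ratios from the `statistics` section. The helper name `summarize_statistics` is hypothetical; only the keys shown above are assumed to exist.

```python
from typing import Any, Dict


def summarize_statistics(status: Dict[str, Any]) -> Dict[str, float]:
    """Derive simple ratios from a get_status()-style dictionary."""
    stats = status["statistics"]
    received = stats["messages_received"]
    processed = stats["messages_processed"]
    uptime = stats["uptime_seconds"]
    return {
        # Share of received messages that were processed successfully.
        "processed_ratio": processed / received if received else 0.0,
        # Average throughput over the current uptime.
        "messages_per_second": received / uptime if uptime else 0.0,
    }


sample = {
    "statistics": {
        "messages_received": 1250,
        "messages_processed": 1248,
        "uptime_seconds": 3600.5,
    }
}
summary = summarize_statistics(sample)
print(f"processed ratio: {summary['processed_ratio']:.4f}")
```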

Health Status

The get_health_status() method provides detailed health information:

{
    'is_healthy': True,                     # Overall health status
    'issues': [],                          # List of current issues
    'status': 'running',                   # Current collector status
    'last_heartbeat': '2023-...',         # Last heartbeat timestamp
    'last_data_received': '2023-...',     # Last data timestamp
    'should_be_running': True,             # Expected state
    'is_running': True                     # Actual running state
}
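
One plausible way to derive `is_healthy` and `issues` from the two timestamps is sketched below. This is an assumption about the logic, not the actual implementation: `evaluate_health` and the 60-second heartbeat timeout are hypothetical, while the 300-second silence limit mirrors `max_silence_duration` from the status example.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def evaluate_health(now: datetime,
                    last_heartbeat: Optional[datetime],
                    last_data: Optional[datetime],
                    heartbeat_timeout: float = 60.0,   # assumed value
                    max_silence: float = 300.0) -> Dict[str, Any]:
    """Return a small health report based on timestamp freshness."""
    issues: List[str] = []
    if last_heartbeat is None or (now - last_heartbeat).total_seconds() > heartbeat_timeout:
        issues.append("no recent heartbeat")
    # A collector that has not received anything yet is not flagged as stale.
    if last_data is not None and (now - last_data).total_seconds() > max_silence:
        issues.append("no data within max_silence")
    return {"is_healthy": not issues, "issues": issues}


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
report = evaluate_health(now, now - timedelta(seconds=5), now - timedelta(seconds=301))
print(report)
```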

CollectorManager

Manages multiple data collectors with coordinated lifecycle and health monitoring.

Constructor

def __init__(self,
             manager_name: str = "collector_manager",
             global_health_check_interval: float = 60.0,
             restart_delay: float = 5.0)

Public Methods

# Collector management
def add_collector(self, collector: BaseDataCollector, config: Optional[CollectorConfig] = None) -> None
def remove_collector(self, collector_name: str) -> bool
def enable_collector(self, collector_name: str) -> bool
def disable_collector(self, collector_name: str) -> bool

# Lifecycle management
async def start(self) -> bool
async def stop(self) -> None
async def restart_collector(self, collector_name: str) -> bool
async def restart_all_collectors(self) -> Dict[str, bool]

# Status and monitoring
def get_status(self) -> Dict[str, Any]
def get_collector_status(self, collector_name: str) -> Optional[Dict[str, Any]]
def list_collectors(self) -> List[str]
def get_running_collectors(self) -> List[str]
def get_failed_collectors(self) -> List[str]

CollectorConfig

Configuration dataclass for collectors:

@dataclass
class CollectorConfig:
    name: str                               # Unique collector name
    exchange: str                           # Exchange name
    symbols: List[str]                      # Trading symbols
    data_types: List[str]                   # Data types to collect
    auto_restart: bool = True               # Enable auto-restart
    health_check_interval: float = 30.0    # Health check interval
    enabled: bool = True                    # Initially enabled

Data Types

DataType Enum

class DataType(Enum):
    TICKER = "ticker"        # Price and volume updates
    TRADE = "trade"          # Individual trade executions
    ORDERBOOK = "orderbook"  # Order book snapshots
    CANDLE = "candle"        # OHLCV candle data
    BALANCE = "balance"      # Account balance updates
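
`CollectorConfig.data_types` holds plain strings, whereas `BaseDataCollector` works with `DataType` members. A lookup by enum value bridges the two. The sketch below repeats the enum so it runs standalone; the helper `parse_data_types` is hypothetical and not part of the documented API.

```python
from enum import Enum
from typing import Iterable, List


class DataType(Enum):
    # Local copy of the enum above so the example runs on its own.
    TICKER = "ticker"
    TRADE = "trade"
    ORDERBOOK = "orderbook"
    CANDLE = "candle"
    BALANCE = "balance"


def parse_data_types(names: Iterable[str]) -> List[DataType]:
    """Convert strings such as "ticker" into DataType members."""
    parsed = []
    for name in names:
        try:
            # Enum lookup by value; normalise case and whitespace first.
            parsed.append(DataType(name.strip().lower()))
        except ValueError:
            valid = ", ".join(member.value for member in DataType)
            raise ValueError(
                f"Unknown data type {name!r}; expected one of: {valid}"
            ) from None
    return parsed


print(parse_data_types(["ticker", "trade"]))
```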

MarketDataPoint

Standardized market data structure:

@dataclass
class MarketDataPoint:
    exchange: str            # Exchange name
    symbol: str             # Trading symbol
    timestamp: datetime     # Data timestamp (UTC)
    data_type: DataType     # Type of data
    data: Dict[str, Any]    # Raw data payload

OHLCVData

OHLCV (candlestick) data structure with validation:

@dataclass  
class OHLCVData:
    symbol: str                          # Trading symbol
    timeframe: str                       # Timeframe (1m, 5m, 1h, etc.)
    timestamp: datetime                  # Candle timestamp
    open: Decimal                        # Opening price
    high: Decimal                        # Highest price
    low: Decimal                         # Lowest price
    close: Decimal                       # Closing price
    volume: Decimal                      # Trading volume
    trades_count: Optional[int] = None   # Number of trades
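
The exact checks performed by `validate_ohlcv_data` are not listed here. The sketch below shows the kind of consistency rules a candle validator typically applies; the function `find_ohlcv_problems` and its rule set are assumptions made for illustration.

```python
from decimal import Decimal
from typing import List


def find_ohlcv_problems(open_: Decimal, high: Decimal, low: Decimal,
                        close: Decimal, volume: Decimal) -> List[str]:
    """Return a list of consistency problems; an empty list means valid."""
    problems = []
    if min(open_, high, low, close) <= 0:
        problems.append("prices must be positive")
    # The high must bound every other price from above, the low from below.
    if high < max(open_, close, low):
        problems.append("high is below another price")
    if low > min(open_, close, high):
        problems.append("low is above another price")
    if volume < 0:
        problems.append("volume must be non-negative")
    return problems


good = find_ohlcv_problems(Decimal("100"), Decimal("105"), Decimal("99"),
                           Decimal("102"), Decimal("10"))
bad = find_ohlcv_problems(Decimal("100"), Decimal("101"), Decimal("99"),
                          Decimal("102"), Decimal("10"))
print(good, bad)
```

Using `Decimal` rather than `float` keeps comparisons exact, which matters when prices arrive as strings from an exchange.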

Health Monitoring

Monitoring Levels

The system provides multi-level health monitoring:

Individual Collector Health:
- Heartbeat monitoring (message loop activity)
- Data freshness (time since last data received)
- Connection state monitoring
- Error rate tracking

Manager-Level Health:
- Global health checks across all collectors
- Coordinated restart management
- System-wide performance metrics
- Resource utilization monitoring

Health Check Intervals

Individual Collector: Configurable per collector (default: 30s)
Global Manager: Configurable for manager (default: 60s)
Heartbeat Updates: Updated with each message loop iteration
Data Freshness: Updated when data is received

Auto-Restart Triggers

Collectors are automatically restarted when:

No Heartbeat: Message loop becomes unresponsive
Stale Data: No data received within configured timeout
Connection Failures: WebSocket or API connection lost
Error Status: Collector enters ERROR or UNHEALTHY state
Manual Trigger: Explicit restart request
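
The triggers above can be condensed into a single decision function. This is a sketch under assumptions: `should_restart` is hypothetical, it takes dictionaries shaped like the `get_status()` and `get_health_status()` examples, and the lowercase status strings `'error'` and `'unhealthy'` are assumed spellings of the ERROR and UNHEALTHY states.

```python
from typing import Any, Dict


def should_restart(status: Dict[str, Any], health: Dict[str, Any]) -> bool:
    """Decide whether an automatic restart is warranted."""
    # Never restart a collector that opted out or was stopped on purpose.
    if not status.get("auto_restart", False):
        return False
    if not status.get("should_be_running", False):
        return False
    # Assumed string values for the ERROR / UNHEALTHY states.
    if status.get("status") in {"error", "unhealthy"}:
        return True
    # Otherwise fall back to the health report (heartbeat, data freshness).
    return not health.get("is_healthy", True)


running = {"auto_restart": True, "should_be_running": True, "status": "running"}
print(should_restart(running, {"is_healthy": False, "issues": ["stale data"]}))
```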

Failure Handling

# Configure failure handling
collector = MyCollector(
    symbols=["BTC-USDT"],
    auto_restart=True,                    # Enable auto-restart
    health_check_interval=30.0            # Check every 30 seconds
)

# The collector will automatically:
# 1. Run a health check every 30 seconds, so failures surface at the next check
# 2. Attempt reconnection with exponential backoff
# 3. Restart up to 5 times (configurable)
# 4. Log all recovery attempts
# 5. Report status to manager

Configuration

Environment Variables

The system respects these environment variables:

# Logging configuration
LOG_LEVEL=INFO                    # Logging level (DEBUG, INFO, WARN, ERROR)
LOG_CLEANUP=true                  # Enable automatic log cleanup
LOG_MAX_FILES=30                  # Maximum log files to retain

# Health monitoring
DEFAULT_HEALTH_CHECK_INTERVAL=30  # Default health check interval (seconds)
MAX_SILENCE_DURATION=300          # Max time without data (seconds)
MAX_RECONNECT_ATTEMPTS=5          # Maximum reconnection attempts
RECONNECT_DELAY=5                 # Delay between reconnect attempts (seconds)
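
How the system reads these variables internally is not shown here. As a rough sketch, the health-monitoring values could be loaded with fallbacks equal to the values listed above. The `HealthSettings` dataclass and `load_health_settings` function are hypothetical names introduced for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class HealthSettings:
    health_check_interval: float
    max_silence_duration: float
    max_reconnect_attempts: int
    reconnect_delay: float


def load_health_settings(env: Mapping[str, str] = os.environ) -> HealthSettings:
    """Read health settings from a mapping, defaulting to the listed values."""
    return HealthSettings(
        health_check_interval=float(env.get("DEFAULT_HEALTH_CHECK_INTERVAL", "30")),
        max_silence_duration=float(env.get("MAX_SILENCE_DURATION", "300")),
        max_reconnect_attempts=int(env.get("MAX_RECONNECT_ATTEMPTS", "5")),
        reconnect_delay=float(env.get("RECONNECT_DELAY", "5")),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_health_settings({"MAX_RECONNECT_ATTEMPTS": "3"})
print(settings)
```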

Programmatic Configuration

# Configure individual collector
collector = MyCollector(
    exchange_name="custom_exchange",
    symbols=["BTC-USDT", "ETH-USDT"],
    data_types=[DataType.TICKER, DataType.TRADE],
    auto_restart=True,
    health_check_interval=15.0        # Check every 15 seconds
)

# Configure manager
manager = CollectorManager(
    manager_name="production_manager",
    global_health_check_interval=30.0,   # Global checks every 30s
    restart_delay=10.0                    # 10s delay between restarts
)

# Configure specific collector in manager
config = CollectorConfig(
    name="primary_okx",
    exchange="okx",
    symbols=["BTC-USDT", "ETH-USDT", "SOL-USDT"],
    data_types=["ticker", "trade", "orderbook"],
    auto_restart=True,
    health_check_interval=20.0,
    enabled=True
)

manager.add_collector(collector, config)

Best Practices

1. Collector Implementation

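from typing import Optional

# RateLimiter, DataValidator, MetricsCollector and create_connection_pool are
# placeholders for helpers that are not shown in this document.
class ProductionCollector(BaseDataCollector):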
    def __init__(self, exchange_name: str, symbols: list):
        super().__init__(
            exchange_name=exchange_name,
            symbols=symbols,
            data_types=[DataType.TICKER, DataType.TRADE],
            auto_restart=True,               # Always enable auto-restart
            health_check_interval=30.0       # Reasonable interval
        )
        
        # Connection management
        self.connection_pool = None
        self.rate_limiter = RateLimiter(100, 60)  # 100 requests per minute
        
        # Data validation
        self.data_validator = DataValidator()
        
        # Performance monitoring
        self.metrics = MetricsCollector()
    
    async def connect(self) -> bool:
        """Implement robust connection logic."""
        try:
            # Use connection pooling for reliability
            self.connection_pool = await create_connection_pool(
                self.exchange_name,
                max_connections=5,
                retry_attempts=3
            )
            
            # Test connection
            await self.connection_pool.ping()
            return True
            
        except Exception as e:
            self.logger.error(f"Connection failed: {e}")
            return False
    
    async def _process_message(self, message) -> Optional[MarketDataPoint]:
        """Implement thorough data processing."""
        try:
            # Rate limiting
            await self.rate_limiter.acquire()
            
            # Data validation
            if not self.data_validator.validate(message):
                self.logger.warning(f"Invalid message: {message}")
                return None
            
            # Metrics collection
            self.metrics.increment('messages_processed')
            
            # Create standardized data point
            return MarketDataPoint(
                exchange=self.exchange_name,
                symbol=message['symbol'],
                timestamp=self._parse_timestamp(message['timestamp']),
                data_type=DataType.TICKER,
                data=self._normalize_data(message)
            )
            
        except Exception as e:
            self.metrics.increment('processing_errors')
            self.logger.error(f"Message processing failed: {e}")
            raise  # Let health monitor handle it

2. Error Handling

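import asyncio

# Implement proper error handling
# (WebSocketError and ValidationError stand for the exception types raised by
# your WebSocket client and validation layer; import them accordingly.)
class RobustCollector(BaseDataCollector):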
    async def _handle_messages(self) -> None:
        """Handle messages with proper error management."""
        try:
            # Check connection health
            if not await self._check_connection_health():
                raise ConnectionError("Connection health check failed")
            
            # Receive message with timeout
            message = await asyncio.wait_for(
                self.websocket.receive(),
                timeout=30.0  # 30 second timeout
            )
            
            # Process message
            data_point = await self._process_message(message)
            if data_point:
                await self._notify_callbacks(data_point)
                
        except asyncio.TimeoutError:
            # No data received - let health monitor handle
            raise ConnectionError("Message receive timeout")
            
        except WebSocketError as e:
            # WebSocket specific errors
            self.logger.error(f"WebSocket error: {e}")
            raise ConnectionError(f"WebSocket failed: {e}")
            
        except ValidationError as e:
            # Data validation errors - don't restart for these
            self.logger.warning(f"Data validation failed: {e}")
            # Continue without raising - these are data issues, not connection issues
            
        except Exception as e:
            # Unexpected errors - trigger restart
            self.logger.error(f"Unexpected error: {e}")
            raise

3. Manager Setup

async def setup_production_system():
    """Setup production collector system."""
    
    # Create manager with appropriate settings
    manager = CollectorManager(
        manager_name="crypto_trading_system",
        global_health_check_interval=60.0,    # Check every minute
        restart_delay=30.0                     # 30s between restarts
    )
    
    # Add primary data sources
    exchanges = ['okx', 'binance', 'coinbase']
    symbols = ['BTC-USDT', 'ETH-USDT', 'SOL-USDT', 'AVAX-USDT']
    
    for exchange in exchanges:
        collector = create_collector(exchange, symbols)
        
        # Configure for production
        config = CollectorConfig(
            name=f"{exchange}_primary",
            exchange=exchange,
            symbols=symbols,
            data_types=["ticker", "trade"],
            auto_restart=True,
            health_check_interval=30.0,
            enabled=True
        )
        
        # Add callbacks for data processing
        collector.add_data_callback(DataType.TICKER, process_ticker_data)
        collector.add_data_callback(DataType.TRADE, process_trade_data)
        
        manager.add_collector(collector, config)
    
    # Start system
    success = await manager.start()
    if not success:
        raise RuntimeError("Failed to start collector system")
    
    return manager

# Usage
async def main():
    manager = await setup_production_system()
    
    # Monitor system health
    while True:
        status = manager.get_status()
        
        if status['statistics']['failed_collectors'] > 0:
            # Alert on failures
            await send_alert(f"Collectors failed: {manager.get_failed_collectors()}")
        
        # Re-check system health every 5 minutes
        await asyncio.sleep(300)

4. Monitoring Integration

# Integrate with monitoring systems
import prometheus_client
from utils.logger import get_logger

class MonitoredCollector(BaseDataCollector):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        
        # Prometheus metrics
        self.messages_counter = prometheus_client.Counter(
            'collector_messages_total',
            'Total messages processed',
            ['exchange', 'symbol', 'type']
        )
        
        self.errors_counter = prometheus_client.Counter(
            'collector_errors_total', 
            'Total errors',
            ['exchange', 'error_type']
        )
        
        self.uptime_gauge = prometheus_client.Gauge(
            'collector_uptime_seconds',
            'Collector uptime',
            ['exchange']
        )
    
    async def _notify_callbacks(self, data_point: MarketDataPoint):
        """Override to add metrics."""
        # Update metrics
        self.messages_counter.labels(
            exchange=data_point.exchange,
            symbol=data_point.symbol,
            type=data_point.data_type.value
        ).inc()
        
        # Update uptime
        status = self.get_status()
        if status['statistics']['uptime_seconds']:
            self.uptime_gauge.labels(
                exchange=self.exchange_name
            ).set(status['statistics']['uptime_seconds'])
        
        # Call parent
        await super()._notify_callbacks(data_point)
    
    async def _handle_connection_error(self) -> bool:
        """Override to add error metrics."""
        self.errors_counter.labels(
            exchange=self.exchange_name,
            error_type='connection'
        ).inc()
        
        return await super()._handle_connection_error()

Troubleshooting

Common Issues

1. Collector Won't Start

Symptoms: start() returns False, status shows ERROR

Solutions:

# Check connection details
collector = MyCollector(symbols=["BTC-USDT"])
success = await collector.start()
if not success:
    status = collector.get_status()
    print(f"Error: {status['statistics']['last_error']}")
    
# Common fixes:
# - Verify API credentials
# - Check network connectivity  
# - Validate symbol names
# - Review exchange-specific requirements

2. Frequent Restarts

Symptoms: High restart count, intermittent data

Solutions:

# Adjust health check intervals
collector = MyCollector(
    symbols=["BTC-USDT"],
    health_check_interval=60.0,  # Increase interval
    auto_restart=True
)

# Check for:
# - Network instability
# - Exchange rate limiting
# - Invalid message formats
# - Resource constraints

3. No Data Received

Symptoms: Collector running but no callbacks triggered

Solutions:

# Check data flow
collector = MyCollector(symbols=["BTC-USDT"])

def debug_callback(data_point):
    print(f"Received: {data_point}")

collector.add_data_callback(DataType.TICKER, debug_callback)

# Verify:
# - Callback registration
# - Symbol subscription
# - Message parsing logic
# - Exchange data availability

4. Memory Leaks

Symptoms: Increasing memory usage over time

Solutions:

# Implement proper cleanup
class CleanCollector(BaseDataCollector):
    async def disconnect(self):
        """Ensure proper cleanup."""
        # Clear buffers
        self.message_buffer.clear()
        
        # Close connections
        if self.websocket:
            await self.websocket.close()
            self.websocket = None
        
        # Clear callbacks
        for callback_list in self._data_callbacks.values():
            callback_list.clear()
        
        await super().disconnect()

Performance Optimization

1. Batch Processing

import time

class BatchingCollector(BaseDataCollector):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.message_batch = []
        self.batch_size = 100
        self.batch_timeout = 1.0
        self.last_batch_time = time.time()  # Set here so the first timeout check works
    
    async def _handle_messages(self):
        """Batch process messages for efficiency."""
        message = await self.websocket.receive()
        self.message_batch.append(message)
        
        # Process batch when full or timeout
        if (len(self.message_batch) >= self.batch_size or 
            time.time() - self.last_batch_time > self.batch_timeout):
            await self._process_batch()
    
    async def _process_batch(self):
        """Process messages in batch."""
        batch = self.message_batch.copy()
        self.message_batch.clear()
        self.last_batch_time = time.time()
        
        for message in batch:
            data_point = await self._process_message(message)
            if data_point:
                await self._notify_callbacks(data_point)
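
The size-or-timeout rule used above can be isolated from the collector and tested on its own. The sketch below is a standalone illustration: `MessageBatcher` is a hypothetical helper, and the injectable `clock` parameter is an assumption added so the timeout can be exercised without sleeping.

```python
import time
from typing import Any, Callable, List, Optional


class MessageBatcher:
    """Buffer messages and release them when the batch is full or overdue."""

    def __init__(self, batch_size: int = 100, batch_timeout: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.batch_timeout = batch_timeout
        self._clock = clock
        self._batch: List[Any] = []
        self._last_flush = clock()

    def add(self, message: Any) -> Optional[List[Any]]:
        """Buffer a message; return the batch if it is due, else None."""
        self._batch.append(message)
        full = len(self._batch) >= self.batch_size
        timed_out = self._clock() - self._last_flush > self.batch_timeout
        return self.flush() if full or timed_out else None

    def flush(self) -> List[Any]:
        """Hand over the buffered messages and restart the timeout window."""
        batch, self._batch = self._batch, []
        self._last_flush = self._clock()
        return batch


batcher = MessageBatcher(batch_size=2)
print(batcher.add("first"), batcher.add("second"))
```

Like the collector version, this only evaluates the timeout when a message arrives, so a quiet stream can hold a partial batch longer than `batch_timeout`. A periodic call to `flush()` would be one way to avoid that.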

2. Connection Pooling

import aiohttp

class PooledCollector(BaseDataCollector):
    async def connect(self) -> bool:
        """Use connection pooling for better performance."""
        try:
            # Create the session; ClientSession is constructed directly, not awaited
            self.connection_pool = aiohttp.ClientSession(
                connector=aiohttp.TCPConnector(
                    limit=10,              # Pool size
                    limit_per_host=5,      # Per-host limit
                    keepalive_timeout=300, # Keep connections alive
                    enable_cleanup_closed=True
                )
            )
            return True
        except Exception:
            return False

Logging and Debugging

Enable Debug Logging

import os
os.environ['LOG_LEVEL'] = 'DEBUG'

# Collector will now log detailed information
collector = MyCollector(symbols=["BTC-USDT"])
await collector.start()

# Check logs in ./logs/ directory
# - collector_debug.log: Debug information
# - collector_info.log: General information  
# - collector_error.log: Error messages

Custom Logging

import time

from utils.logger import get_logger

class CustomCollector(BaseDataCollector):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        
        # Add custom logger
        self.performance_logger = get_logger(
            f"{self.exchange_name}_performance",
            verbose=False
        )
    
    async def _process_message(self, message):
        start_time = time.time()
        
        try:
            result = await super()._process_message(message)
            
            # Log performance
            processing_time = time.time() - start_time
            self.performance_logger.info(
                f"Message processed in {processing_time:.3f}s"
            )
            
            return result
        except Exception as e:
            self.performance_logger.error(
                f"Processing failed after {time.time() - start_time:.3f}s: {e}"
            )
            raise

Integration Examples

Django Integration

# Django management command
from django.core.management.base import BaseCommand
from data import CollectorManager
import asyncio

class Command(BaseCommand):
    help = 'Start crypto data collectors'
    
    def handle(self, *args, **options):
        async def run_collectors():
            manager = CollectorManager("django_collectors")
            
            # Add collectors
            from myapp.collectors import OKXCollector, BinanceCollector
            manager.add_collector(OKXCollector(['BTC-USDT']))
            manager.add_collector(BinanceCollector(['ETH-USDT']))
            
            # Start system
            await manager.start()
            
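            # Keep running until interrupted (e.g. Ctrl+C), then shut down cleanly.
            # asyncio.run() cancels this coroutine on interrupt, so use finally
            # rather than catching KeyboardInterrupt here.
            try:
                while True:
                    await asyncio.sleep(60)
                    status = manager.get_status()
                    self.stdout.write(f"Status: {status['statistics']}")
            finally:
                await manager.stop()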
        
        asyncio.run(run_collectors())

FastAPI Integration

# FastAPI application
from fastapi import FastAPI
from data import CollectorManager
import asyncio

app = FastAPI()
manager = None

@app.on_event("startup")
async def startup_event():
    global manager
    manager = CollectorManager("fastapi_collectors")
    
    # Add collectors
    from collectors import OKXCollector
    collector = OKXCollector(['BTC-USDT', 'ETH-USDT'])
    manager.add_collector(collector)
    
    # Start in background
    await manager.start()

@app.on_event("shutdown")
async def shutdown_event():
    global manager
    if manager:
        await manager.stop()

@app.get("/collector/status")
async def get_collector_status():
    return manager.get_status()

@app.post("/collector/{name}/restart")
async def restart_collector(name: str):
    success = await manager.restart_collector(name)
    return {"success": success}

Celery Integration

# Celery task
from celery import Celery
from data import CollectorManager
import asyncio

app = Celery('crypto_collectors')

@app.task
def start_data_collection():
    """Start data collection as Celery task."""
    
    async def run():
        manager = CollectorManager("celery_collectors")
        
        # Setup collectors
        from collectors import OKXCollector, BinanceCollector
        manager.add_collector(OKXCollector(['BTC-USDT']))
        manager.add_collector(BinanceCollector(['ETH-USDT']))
        
        # Start and monitor
        await manager.start()
        
        # Run until stopped
        try:
            while True:
                await asyncio.sleep(300)  # 5 minute intervals
                
                # Check health and restart if needed
                failed = manager.get_failed_collectors()
                if failed:
                    print(f"Restarting failed collectors: {failed}")
                    await manager.restart_all_collectors()
                    
        except Exception as e:
            print(f"Collection error: {e}")
        finally:
            await manager.stop()
    
    # Run async task
    asyncio.run(run())
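
The monitoring loop above calls restart_all_collectors() as soon as any collector fails, which also interrupts the healthy ones. A gentler variant restarts only the failed collectors. The sketch below assumes that get_failed_collectors() returns collector names and that restart_collector(name) returns a success flag, as the FastAPI example suggests. Neither return type is confirmed here. StubManager and restart_failed are hypothetical names used only for illustration.

```python
import asyncio
from typing import List


class StubManager:
    """Stand-in for CollectorManager, for illustration only."""

    def __init__(self, failed: List[str]) -> None:
        self._failed = list(failed)
        self.restarted: List[str] = []

    def get_failed_collectors(self) -> List[str]:
        # Assumption: the real method returns collector names.
        return list(self._failed)

    async def restart_collector(self, name: str) -> bool:
        # Pretend every restart succeeds and clears the failure.
        self.restarted.append(name)
        self._failed.remove(name)
        return True


async def restart_failed(manager) -> List[str]:
    """Restart only failed collectors; return the names that did not recover."""
    still_failed = []
    for name in manager.get_failed_collectors():
        if not await manager.restart_collector(name):
            still_failed.append(name)
    return still_failed
```

Inside the Celery task, `await restart_failed(manager)` could stand in for the restart_all_collectors() call, provided the assumptions above hold for the real manager.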

Migration Guide

From Manual Connection Management

Before (manual management):

import asyncio

class OldCollector:
    def __init__(self):
        self.websocket = None
        self.running = False
    
    async def start(self):
        self.running = True  # Without this, the loop below never executes
        while self.running:
            try:
                self.websocket = await connect()
                await self.listen()
            except Exception as e:
                print(f"Error: {e}")
                await asyncio.sleep(5)  # Manual retry with a fixed delay

After (with BaseDataCollector):

class NewCollector(BaseDataCollector):
    def __init__(self):
        super().__init__("exchange", ["BTC-USDT"])
        # Auto-restart and health monitoring included
    
    async def connect(self) -> bool:
        self.websocket = await connect()
        return True
    
    async def _handle_messages(self):
        message = await self.websocket.receive()
        # Error handling and restart logic automatic
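
The manual version retries after a fixed 5-second sleep, whereas the auto-restart feature is described as using exponential backoff. The following is a minimal sketch of such a delay schedule, not the scheduling code that BaseDataCollector actually uses. The function name and all parameter values are hypothetical.

```python
from typing import Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 5) -> Iterator[float]:
    """Yield retry delays that grow geometrically and never exceed `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


# Example: inspect the schedule for five attempts capped at 10 seconds
schedule = list(backoff_delays(base=1.0, factor=2.0, cap=10.0, attempts=5))
```

Capping the delay keeps a long outage from pushing retries arbitrarily far apart.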

From Basic Monitoring

Before (basic monitoring):

# Manual status tracking
import time

status = {
    'connected': False,
    'last_message': None,
    'error_count': 0
}

# Manual health checks
async def health_check():
    last = status['last_message']
    # Guard against None: no message has been received yet
    if last is not None and time.time() - last > 300:
        print("No data for 5 minutes!")

After (comprehensive monitoring):

# Automatic health monitoring
collector = MyCollector(["BTC-USDT"])

# Rich status information
status = collector.get_status()
health = collector.get_health_status()

# Automatic alerts and recovery
if not health['is_healthy']:
    print(f"Issues: {health['issues']}")
    # Auto-restart already triggered

Support and Contributing

Getting Help

Check Logs: Review logs in ./logs/ directory
Status Information: Use get_status() and get_health_status() methods
Debug Mode: Set LOG_LEVEL=DEBUG for detailed logging
Test with Demo: Run examples/collector_demo.py to verify setup
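
For the Debug Mode tip above, here is a minimal sketch of how a standalone script might honor LOG_LEVEL using only the standard library. The helper name configure_logging, the INFO fallback, and the log format are assumptions. The collector system's own logging setup may read the variable differently.

```python
import logging
import os

# Level names accepted in LOG_LEVEL (case-insensitive)
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(default: int = logging.INFO) -> int:
    """Set the root logger level from LOG_LEVEL and return the level applied.

    Hypothetical helper: falls back to `default` when LOG_LEVEL is unset
    or is not a recognized level name.
    """
    name = os.environ.get("LOG_LEVEL", "").strip().upper()
    level = _LEVELS.get(name, default)
    # basicConfig installs a handler only if the root logger has none
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")
    logging.getLogger().setLevel(level)
    return level
```

Running a script as `LOG_LEVEL=DEBUG python your_script.py` and calling `configure_logging()` at startup would then enable debug output for that process.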

Contributing

The data collector system is designed to be extensible. Contributions are welcome for:

New exchange implementations
Enhanced monitoring features
Performance optimizations
Additional data types
Integration examples

License

This documentation and the associated code are part of the Crypto Trading Bot Platform project.

For more information, see the main project documentation in /docs/.
