# Data Collection Service

**Service for collecting and storing real-time market data from multiple exchanges.**

## Architecture Overview

The data collection service has been refactored into a **modular, component-based architecture** to collect data for multiple trading pairs concurrently with improved maintainability, scalability, and testability.

- **`DataCollectionService`**: The primary orchestration layer, responsible for initializing and coordinating core service components. It delegates specific functionalities to dedicated managers and factories.
- **`CollectorManager`**: Now acts as an orchestrator for individual data collectors, utilizing its own set of internal components (e.g., `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, `ManagerLogger`).
- **Dedicated Components**: Specific concerns like configuration, collector creation, and asynchronous task management are handled by new, specialized classes (`ServiceConfig`, `CollectorFactory`, `AsyncTaskManager`).
- **`OKXCollector`**: A dedicated worker responsible for collecting data for a single trading pair from the OKX exchange, now built upon a more robust `BaseDataCollector` and its internal components (`ConnectionManager`, `CollectorStateAndTelemetry`, `CallbackDispatcher`).

This modular architecture allows for high scalability, fault tolerance, and clear separation of concerns.

## Key Components

### `DataCollectionService`

- **Location**: `data/collection_service.py`
- **Responsibilities**:
    - Orchestrates the overall data collection process.
    - Initializes and coordinates `ServiceConfig`, `CollectorFactory`, `CollectorManager`, and `AsyncTaskManager`.
    - Manages the main service loop and graceful shutdown.
    - Provides a high-level API for running and monitoring the service.

### `ServiceConfig`

- **Location**: `config/service_config.py`
- **Responsibilities**:
    - Handles loading, creating, and validating service configurations.
    - Ensures configuration file integrity, including file permission validation.
    - Manages default configuration generation and runtime updates.

### `CollectorFactory`

- **Location**: `data/collector_factory.py`
- **Responsibilities**:
    - Encapsulates the logic for creating individual data collector instances (e.g., `OKXCollector`).
    - Decouples collector instantiation from the `DataCollectionService`.
    - Ensures collectors are created with correct configurations and dependencies.

### `AsyncTaskManager`

- **Location**: `utils/async_task_manager.py`
- **Responsibilities**:
    - Manages and tracks `asyncio.Task` instances throughout the application.
    - Prevents potential memory leaks by ensuring proper task lifecycle management.
    - Facilitates robust asynchronous operations for both `DataCollectionService` and `CollectorManager`.

### `CollectorManager`

- **Location**: `data/collector_manager.py`
- **Responsibilities**:
    - Acts as an orchestrator for all active data collectors.
    - Delegates specific responsibilities to its new internal components:
        - `CollectorLifecycleManager`: Manages adding, removing, starting, and stopping collectors.
        - `ManagerHealthMonitor`: Encapsulates global health monitoring and auto-restart logic.
        - `ManagerStatsTracker`: Handles performance statistics collection and caching.
        - `ManagerLogger`: Centralizes logging operations for the manager and its collectors.
    - Provides a unified interface for controlling and monitoring managed collectors.

### `OKXCollector`

- **Location**: `data/exchanges/okx/collector.py`
- **Responsibilities**:
    - Inherits from `BaseDataCollector` and implements exchange-specific data collection logic.
    - Utilizes `ConnectionManager` for robust WebSocket connection management.
    - Leverages `CollectorStateAndTelemetry` for internal status, health, and logging.
    - Uses `CallbackDispatcher` to notify registered consumers of processed data.
    - Subscribes to real-time data channels specific to OKX.
    - Processes and standardizes incoming OKX data before dispatching.
    - Stores processed data in the database.

## Configuration

The service is configured through `config/bot_configs/data_collector_config.json`:

```json
{
  "service_name": "data_collection_service",
  "enabled": true,
  "manager_config": {
    "component_name": "collector_manager",
    "health_check_interval": 60,
    "log_level": "INFO",
    "verbose": true
  },
  "collectors": [
    {
      "exchange": "okx",
      "symbol": "BTC-USDT",
      "data_types": ["trade", "orderbook"],
      "enabled": true
    },
    {
      "exchange": "okx",
      "symbol": "ETH-USDT",
      "data_types": ["trade"],
      "enabled": true
    }
  ]
}
```

## Usage

The `DataCollectionService` is the main entry point for running the data collection system.

Start the service from a script (e.g., `scripts/start_data_collection.py`):

```python
# scripts/start_data_collection.py
import asyncio
from data.collection_service import DataCollectionService
from utils.logger import setup_logging # Assuming this exists or is created

async def main():
    setup_logging() # Initialize logging
    service = DataCollectionService(config_path="config/data_collection.json")
    await service.run() # Or run with a duration: await service.run(duration_hours=24)

if __name__ == "__main__":
    asyncio.run(main())
```

## Health & Monitoring

The `DataCollectionService` and `CollectorManager` provide comprehensive health and monitoring capabilities through their dedicated components.

## Features

- **Service Lifecycle Management**: Start, stop, and monitor data collection operations
- **JSON Configuration**: File-based configuration with automatic defaults
- **Clean Production Logging**: Only essential operational information
- **Health Monitoring**: Service-level health checks and auto-recovery
- **Graceful Shutdown**: Proper signal handling and cleanup
- **Multi-Exchange Orchestration**: Coordinate collectors across multiple exchanges
- **Production Ready**: Designed for 24/7 operation with monitoring

## Quick Start

### Basic Usage

```bash
# Start with default configuration (indefinite run)
python scripts/start_data_collection.py

# Run for 8 hours
python scripts/start_data_collection.py --hours 8

# Use custom configuration
python scripts/start_data_collection.py --config config/my_config.json
```

### Monitoring

```bash
# Check status once
python scripts/monitor_clean.py

# Monitor continuously every 60 seconds
python scripts/monitor_clean.py --interval 60
```

## Configuration

The service uses JSON configuration files with automatic default creation if none exists.

### Default Configuration Location

`config/data_collection.json`

### Configuration Structure

```json
{
  "exchanges": {
    "okx": {
      "enabled": true,
      "trading_pairs": [
        {
          "symbol": "BTC-USDT",
          "enabled": true,
          "data_types": ["trade"],
          "timeframes": ["1m", "5m", "15m", "1h"]
        },
        {
          "symbol": "ETH-USDT",
          "enabled": true,
          "data_types": ["trade"],
          "timeframes": ["1m", "5m", "15m", "1h"]
        }
      ]
    }
  },
  "collection_settings": {
    "health_check_interval": 120,
    "store_raw_data": true,
    "auto_restart": true,
    "max_restart_attempts": 3
  },
  "logging": {
    "level": "INFO",
    "log_errors_only": true,
    "verbose_data_logging": false
  }
}
```

### Configuration Options

#### Exchange Settings

- **enabled**: Whether to enable this exchange
- **trading_pairs**: Array of trading pair configurations

#### Trading Pair Settings

- **symbol**: Trading pair symbol (e.g., "BTC-USDT")
- **enabled**: Whether to collect data for this pair
- **data_types**: Types of data to collect (["trade"], ["ticker"], etc.)
- **timeframes**: Candle timeframes to generate (["1m", "5m", "15m", "1h", "4h", "1d"])

#### Collection Settings

- **health_check_interval**: Health check frequency in seconds
- **store_raw_data**: Whether to store raw trade data
- **auto_restart**: Enable automatic restart on failures
- **max_restart_attempts**: Maximum restart attempts before giving up

#### Logging Settings

- **level**: Log level ("DEBUG", "INFO", "WARNING", "ERROR")
- **log_errors_only**: Only log errors and essential events
- **verbose_data_logging**: Enable verbose logging of individual trades/candles

## Service Architecture

### Service Layer Components

```mermaid
graph TD
    subgraph DataCollectionService
        SC[ServiceConfig] -- Manages --> Conf(Configuration)
        SCF[CollectorFactory] -- Creates --> Collectors(Data Collectors)
        ATM[AsyncTaskManager] -- Manages --> Tasks(Async Tasks)
        DCS[DataCollectionService] -- Uses --> SC
        DCS -- Uses --> SCF
        DCS -- Uses --> ATM
        DCS -- Orchestrates --> CM(CollectorManager)
    end

    subgraph CollectorManager
        CM --> CLM(CollectorLifecycleManager)
        CM --> MHM(ManagerHealthMonitor)
        CM --> MST(ManagerStatsTracker)
        CM --> ML(ManagerLogger)
        CLM -- Manages --> BC[BaseDataCollector]
        MHM -- Monitors --> BC
        MST -- Tracks --> BC
        ML -- Logs For --> BC
    end

    subgraph BaseDataCollector (Core Data Collector)
        BC --> ConM(ConnectionManager)
        BC --> CST(CollectorStateAndTelemetry)
        BC --> CD(CallbackDispatcher)
    end

    Conf -- Provides --> DCS
    Collectors -- Created By --> SCF
    Tasks -- Managed By --> ATM
    CM -- Manages --> BaseDataCollector
    BaseDataCollector -- Collects Data --> Database
    BaseDataCollector -- Publishes Data --> Redis(Redis Pub/Sub)

    style DCS fill:#f9f,stroke:#333,stroke-width:2px
    style CM fill:#bbf,stroke:#333,stroke-width:2px
    style BC fill:#cfc,stroke:#333,stroke-width:2px
    style SC fill:#FFD700,stroke:#333,stroke-width:1px
    style SCF fill:#90EE90,stroke:#333,stroke-width:1px
    style ATM fill:#ADD8E6,stroke:#333,stroke-width:1px
    style CLM fill:#FFC0CB,stroke:#333,stroke-width:1px
    style MHM fill:#C0C0C0,stroke:#333,stroke-width:1px
    style MST fill:#DA70D6,stroke:#333,stroke-width:1px
    style ML fill:#DDA0DD,stroke:#333,stroke-width:1px
    style ConM fill:#F0F8FF,stroke:#333,stroke-width:1px
    style CST fill:#FFE4E1,stroke:#333,stroke-width:1px
    style CD fill:#FAFAD2,stroke:#333,stroke-width:1px
    style DB fill:#A9A9A9,stroke:#333,stroke-width:1px
    style Redis fill:#FF6347,stroke:#333,stroke-width:1px
```

### Data Flow

```mermaid
graph LR
    Config(Configuration) --> ServiceConfig
    ServiceConfig --> DataCollectionService
    DataCollectionService -- Initializes --> CollectorManager
    DataCollectionService -- Initializes --> CollectorFactory
    DataCollectionService -- Initializes --> AsyncTaskManager
    CollectorFactory -- Creates --> BaseDataCollector
    CollectorManager -- Manages --> BaseDataCollector
    BaseDataCollector -- Collects Data --> Database
    BaseDataCollector -- Publishes Data --> RedisPubSub(Redis Pub/Sub)
    HealthMonitor(Health Monitoring) --> DataCollectionService
    HealthMonitor --> CollectorManager
    HealthMonitor --> BaseDataCollector
    ErrorHandling(Error Handling) --> DataCollectionService
    ErrorHandling --> CollectorManager
    ErrorHandling --> BaseDataCollector
```

### Storage Integration

- **Raw Data**: PostgreSQL `raw_trades` table via repository pattern
- **Candles**: PostgreSQL `market_data` table with multiple timeframes
- **Real-time**: Redis pub/sub for live data distribution
- **Service Metrics**: Service uptime, error counts, collector statistics

## Logging Philosophy

The service implements **clean production logging** focused on operational needs:

### What Gets Logged

✅ **Service Lifecycle**
- Service start/stop events
- Configuration loading
- Service initialization

✅ **Collector Orchestration**
- Collector creation and destruction
- Service-level health summaries
- Recovery operations

✅ **Configuration Events**
- Config file changes
- Runtime configuration updates
- Validation errors

✅ **Service Statistics**
- Periodic uptime reports
- Collection summary statistics
- Performance metrics

### What Doesn't Get Logged

❌ **Individual Data Points**
- Every trade received
- Every candle generated
- Raw market data

❌ **Internal Operations**
- Individual collector heartbeats
- Routine database operations
- Internal processing steps

## API Reference

### DataCollectionService

The main service class for managing data collection operations, now orchestrating through specialized components.

#### Constructor

```python
DataCollectionService(
    config_path: str = "config/data_collection.json",
    service_config: Optional[ServiceConfig] = None,
    collector_factory: Optional[CollectorFactory] = None,
    collector_manager: Optional[CollectorManager] = None,
    async_task_manager: Optional[AsyncTaskManager] = None
)
```

**Parameters:**
- `config_path`: Path to JSON configuration file. Used if `service_config` is not provided.
- `service_config`: An instance of `ServiceConfig`. If None, one will be created.
- `collector_factory`: An instance of `CollectorFactory`. If None, one will be created.
- `collector_manager`: An instance of `CollectorManager`. If None, one will be created.
- `async_task_manager`: An instance of `AsyncTaskManager`. If None, one will be created.

#### Methods

##### `async run(duration_hours: Optional[float] = None) -> None`

Runs the service for a specified duration or indefinitely. This method now coordinates the main event loop and lifecycle of all internal components.

**Parameters:**
- `duration_hours`: Optional duration in hours (None = indefinite).

**Returns:**
- `None`

**Example:**
```python
from data.collection_service import DataCollectionService
import asyncio

async def run_service():
    service = DataCollectionService()
    await service.run(duration_hours=24) # Run for 24 hours

if __name__ == "__main__":
    asyncio.run(run_service())
```

##### `async start() -> None`

Initializes and starts the data collection service and all configured collectors. This method delegates to internal components for their respective startup procedures.

**Returns:**
- `None`

##### `async stop() -> None`

Stops the service gracefully, including all collectors and internal cleanup. Ensures all asynchronous tasks are properly cancelled and resources released.

**Returns:**
- `None`

##### `get_status() -> Dict[str, Any]`

Gets current service status, including uptime, collector counts, and errors, aggregated from underlying components.

**Returns:**
```python
{
    'service_running': True,
    'uptime_hours': 12.5,
    'collectors_total': 6,
    'collectors_running': 5,
    'collectors_failed': 1,
    'errors_count': 2,
    'last_error': 'Connection timeout for ETH-USDT',
    'configuration': {
        'config_file': 'config/data_collection.json',
        'exchanges_enabled': ['okx'],
        'total_trading_pairs': 6
    },
    'detailed_collector_statuses': { # New field for detailed statuses
        'okx_BTC-USDT': {'status': 'RUNNING', 'health_score': 95},
        'okx_ETH-USDT': {'status': 'ERROR', 'last_error': 'Connection refused'}
    }
}
```

##### `_run_main_loop(duration_hours: Optional[float])`

Internal method extracted from `run()` to manage the core asynchronous loop.

**Parameters:**
- `duration_hours`: Optional duration in hours for the loop.

**Returns:**
- `None`

### Standalone Function

#### `run_data_collection_service(config_path, duration_hours)`

```python
async def run_data_collection_service(
    config_path: str = "config/data_collection.json",
    duration_hours: Optional[float] = None
) -> None
```

Convenience function to run the service with minimal setup, internally creating a `DataCollectionService` instance.

**Parameters:**
- `config_path`: Path to configuration file.
- `duration_hours`: Optional duration in hours.

**Returns:**
- `None`

## Integration Examples

### Basic Service Integration

```python
import asyncio
from data.collection_service import DataCollectionService
from utils.logger import setup_logging # Assuming this exists or is created

async def main():
    setup_logging()
    service = DataCollectionService("config/my_config.json")

    # Run for 24 hours
    await service.run(duration_hours=24)

    print("Service run finished.")

if __name__ == "__main__":
    asyncio.run(main())
```

### Custom Status Monitoring

```python
import asyncio
from data.collection_service import DataCollectionService
from utils.logger import setup_logging

async def monitor_service():
    setup_logging()
    service = DataCollectionService()

    # Start service in background
    start_task = asyncio.create_task(service.run())

    # Monitor status every 60 seconds
    try:
        while True:
            status = service.get_status()
            print(f"Service Uptime: {status['uptime_hours']:.1f}h")
            print(f"Collectors: {status['collectors_running']}/{status['collectors_total']}")
            print(f"Errors: {status['errors_count']}")
            if status['errors_count'] > 0:
                print(f"Last error: {status['last_error']}")
            print("Detailed Collector Statuses:")
            for name, details in status.get('detailed_collector_statuses', {}).items():
                print(f"  - {name}: Status={details.get('status')}, Health Score={details.get('health_score')}")

            await asyncio.sleep(60)
    except asyncio.CancelledError:
        print("Monitoring cancelled.")
    finally:
        await service.stop()
        await start_task # Ensure the main service task is awaited

asyncio.run(monitor_service())
```

### Programmatic Control

```python
import asyncio
from data.collection_service import DataCollectionService
from utils.logger import setup_logging

async def controlled_collection():
    setup_logging()
    service = DataCollectionService()

    try:
        # Start the service
        await service.start()
        print("Data collection service started.")

        # Monitor and control
        while True:
            status = service.get_status()
            print(f"Current Service Status: {status['service_running']}, Collectors Running: {status['collectors_running']}")

            # Example: Stop if certain condition met (e.g., specific error, or after a duration)
            if status['collectors_failed'] > 0:
                print("Some collectors failed, service is recovering...")
                # The service's internal health monitor and task manager will handle restarts
            # For demonstration, stop after 5 minutes
            await asyncio.sleep(300)
            print("Stopping service after 5 minutes of operation.")
            break

    except KeyboardInterrupt:
        print("Manual shutdown requested.")
    finally:
        print("Shutting down service gracefully...")
        await service.stop()
        print("Service stopped.")

if __name__ == "__main__":
    asyncio.run(controlled_collection())
```

### Configuration Management

```python
import asyncio
import json
from data.collection_service import DataCollectionService
from utils.logger import setup_logging
from config.service_config import ServiceConfig # Import the new ServiceConfig

async def dynamic_configuration():
    setup_logging()
    # Instantiate ServiceConfig directly or let DataCollectionService create it
    service_config_instance = ServiceConfig(config_path="config/data_collection.json")
    service = DataCollectionService(service_config=service_config_instance)

    print("Initial configuration loaded:")
    print(json.dumps(service_config_instance.get_config(), indent=2))

    # Load and modify configuration
    config = service_config_instance.get_config()

    # Add new trading pair if not already present
    new_pair = {
        'symbol': 'SOL-USDT',
        'enabled': True,
        'data_types': ['trade'],
        'timeframes': ['1m', '5m']
    }
    if new_pair not in config['exchanges']['okx']['trading_pairs']:
        config['exchanges']['okx']['trading_pairs'].append(new_pair)
        print("Added SOL-USDT to configuration.")
    else:
        print("SOL-USDT already in configuration.")

    # Save updated configuration
    service_config_instance.save_config(config) # Use ServiceConfig to save

    print("Updated configuration saved. Restarting service with new config...")
    await service.stop()
    await service.start()
    print("Service restarted with updated configuration.")

    # Verify new pair is active (logic would be in get_status or similar)
    status = service.get_status()
    print(f"Current active collectors count: {status['collectors_total']}")

if __name__ == "__main__":
    asyncio.run(dynamic_configuration())
```

## Error Handling

The service implements robust error handling at multiple layers, leveraging the new component structure for more precise error management and recovery.

### Service Level Errors

- **Configuration Errors**: Invalid JSON, missing required fields, file permission issues (handled by `ServiceConfig`).
- **Initialization Errors**: Failed collector creation (handled by `CollectorFactory`), database connectivity.
- **Runtime Errors**: Service-level exceptions, resource exhaustion, unhandled exceptions in asynchronous tasks (managed by `AsyncTaskManager`).

### Error Recovery Strategies

1. **Graceful Degradation**: Continue with healthy collectors while attempting to recover failed ones.
2. **Configuration Validation**: `ServiceConfig` validates configurations before application, preventing common startup issues.
3. **Automated Restarts**: `ManagerHealthMonitor` and `AsyncTaskManager` coordinate automatic restarts for failed collectors/tasks.
4. **Error Aggregation**: `ManagerStatsTracker` collects and reports errors across all collectors, providing a unified view.
5. **Sanitized Error Messages**: `ManagerLogger` ensures sensitive internal details are not leaked in logs or public interfaces.

### Error Reporting

```python
# Service status includes aggregated error information
status = service.get_status()

if status['errors_count'] > 0:
    print(f"Service has {status['errors_count']} errors.")
    print(f"Last service error: {status['last_error']}")

    # Get detailed error information from individual collectors if available
    if 'detailed_collector_statuses' in status:
        for collector_name, details in status['detailed_collector_statuses'].items():
            if details.get('status') == 'ERROR' and 'last_error' in details:
                print(f"Collector {collector_name} error: {details['last_error']}")
```

## Testing

The testing approach now emphasizes unit tests for individual components and integration tests for component interactions, ensuring thorough coverage of the modular architecture.

### Running Service Tests

```bash
# Run all data collection service tests
uv run pytest tests/data/collection_service -v # Assuming tests are in a 'collection_service' subdir

# Run specific component tests, e.g., for ServiceConfig
uv run pytest tests/config/test_service_config.py -v

# Run with coverage for the entire data collection module
uv run pytest --cov=data --cov=config --cov=utils tests/
```

### Test Coverage

The expanded test suite now covers:
- **Component Unit Tests**: Individual tests for `ServiceConfig`, `CollectorFactory`, `AsyncTaskManager`, `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, `ManagerLogger`.
- **Service Integration Tests**: Testing `DataCollectionService`'s orchestration of its components.
- Service initialization and configuration loading/validation.
- Collector orchestration and management via `CollectorManager` and `CollectorLifecycleManager`.
- Asynchronous task management and error recovery.
- Service lifecycle (start/stop/restart) and signal handling.
- Status reporting and monitoring, including detailed collector statuses.
- Error aggregation and recovery strategies.

### Mock Testing

```python
import pytest
from unittest.mock import AsyncMock, patch
from data.collection_service import DataCollectionService
from config.service_config import ServiceConfig # Ensure new components are imported for mocking

@pytest.mark.asyncio
async def test_service_with_mock_components():
    with patch('data.collection_service.ServiceConfig') as MockServiceConfig, \
         patch('data.collection_service.CollectorFactory') as MockCollectorFactory, \
         patch('data.collection_service.CollectorManager') as MockCollectorManager, \
         patch('data.collection_service.AsyncTaskManager') as MockAsyncTaskManager:

        # Configure mocks for successful operation
        MockServiceConfig.return_value.load_config.return_value = {"collectors": []}
        MockServiceConfig.return_value.get_config.return_value = {"collectors": []}
        MockCollectorManager.return_value.start_all.return_value = None
        MockCollectorManager.return_value.stop_all.return_value = None
        MockAsyncTaskManager.return_value.start.return_value = None
        MockAsyncTaskManager.return_value.stop.return_value = None

        service = DataCollectionService(
            service_config=MockServiceConfig.return_value,
            collector_factory=MockCollectorFactory.return_value,
            collector_manager=MockCollectorManager.return_value,
            async_task_manager=MockAsyncTaskManager.return_value
        )
        await service.start()

        # Assertions to ensure components were called correctly
        MockServiceConfig.return_value.load_config.assert_called_once()
        MockCollectorManager.return_value.start_all.assert_called_once()
        MockAsyncTaskManager.return_value.start.assert_called_once()

        await service.stop()
        MockCollectorManager.return_value.stop_all.assert_called_once()
        MockAsyncTaskManager.return_value.stop.assert_called_once()
```

## Production Deployment

### Docker Deployment

```dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY . .

# Install dependencies
RUN pip install uv
RUN uv pip install -r requirements.txt

# Create logs and config directories
RUN mkdir -p logs config

# Copy production configuration
COPY config/production.json config/data_collection.json

# Health check
HEALTHCHECK --interval=60s --timeout=10s --start-period=30s --retries=3 \
  CMD python scripts/health_check.py || exit 1

# Run service
CMD ["python", "scripts/start_data_collection.py", "--config", "config/data_collection.json"]
```

### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: data-collection-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: data-collection-service
  template:
    metadata:
      labels:
        app: data-collection-service
    spec:
      containers:
      - name: data-collector
        image: crypto-dashboard/data-collector:latest
        ports:
        - containerPort: 8080
        env:
        - name: POSTGRES_HOST
          value: "postgres-service"
        - name: REDIS_HOST
          value: "redis-service"
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: logs-volume
          mountPath: /app/logs
        livenessProbe:
          exec:
            command:
            - python
            - scripts/health_check.py
          initialDelaySeconds: 30
          periodSeconds: 60
      volumes:
      - name: config-volume
        configMap:
          name: data-collection-config
      - name: logs-volume
        emptyDir: {}
```

### Systemd Service

```ini
[Unit]
Description=Cryptocurrency Data Collection Service
After=network.target postgres.service redis.service
Requires=postgres.service redis.service

[Service]
Type=simple
User=crypto-collector
Group=crypto-collector
WorkingDirectory=/opt/crypto-dashboard
ExecStart=/usr/bin/python scripts/start_data_collection.py --config config/production.json
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=10
KillMode=mixed
TimeoutStopSec=30

# Environment
Environment=PYTHONPATH=/opt/crypto-dashboard
Environment=LOG_LEVEL=INFO

# Security
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/opt/crypto-dashboard/logs

[Install]
WantedBy=multi-user.target
```

### Environment Configuration

```bash
# Production environment variables
export ENVIRONMENT=production
export POSTGRES_HOST=postgres.internal
export POSTGRES_PORT=5432
export POSTGRES_DB=crypto_dashboard
export POSTGRES_USER=dashboard_user
export POSTGRES_PASSWORD=secure_password
export REDIS_HOST=redis.internal
export REDIS_PORT=6379

# Service configuration
export DATA_COLLECTION_CONFIG=/etc/crypto-dashboard/data_collection.json
export LOG_LEVEL=INFO
export HEALTH_CHECK_INTERVAL=120
```

## Monitoring and Alerting

### Metrics Collection

The service exposes metrics for monitoring systems:

```python
# Service metrics
service_uptime_hours = 24.5
collectors_running = 5
collectors_total = 6
errors_per_hour = 0.2
data_points_processed = 15000
```

### Health Checks

```python
# External health check endpoint
async def health_check():
    service = DataCollectionService()
    status = service.get_status()
    
    if not status['service_running']:
        return {'status': 'unhealthy', 'reason': 'service_stopped'}
    
    if status['collectors_failed'] > status['collectors_total'] * 0.5:
        return {'status': 'degraded', 'reason': 'too_many_failed_collectors'}
    
    return {'status': 'healthy'}
```

### Alerting Rules

```yaml
# Prometheus alerting rules
groups:
- name: data_collection_service
  rules:
  - alert: DataCollectionServiceDown
    expr: up{job="data-collection-service"} == 0
    for: 5m
    annotations:
      summary: "Data collection service is down"
      
  - alert: TooManyFailedCollectors
    expr: collectors_failed / collectors_total > 0.5
    for: 10m
    annotations:
      summary: "More than 50% of collectors have failed"
      
  - alert: HighErrorRate
    expr: rate(errors_total[5m]) > 0.1
    for: 15m
    annotations:
      summary: "High error rate in data collection service"
```

## Performance Considerations

### Resource Usage

- **Memory**: ~150MB base + ~15MB per trading pair (including service overhead)
- **CPU**: Low (async I/O bound, service orchestration)
- **Network**: ~1KB/s per trading pair
- **Storage**: Service logs ~10MB/day

### Scaling Strategies

1. **Horizontal Scaling**: Multiple service instances with different configurations
2. **Configuration Partitioning**: Separate services by exchange or asset class
3. **Load Balancing**: Distribute trading pairs across service instances
4. **Regional Deployment**: Deploy closer to exchange data centers

### Optimization Tips

1. **Configuration Tuning**: Optimize health check intervals and timeframes
2. **Resource Limits**: Set appropriate memory and CPU limits
3. **Batch Operations**: Use efficient database operations
4. **Monitoring Overhead**: Balance monitoring frequency with performance

## Troubleshooting

### Common Service Issues

#### Service Won't Start

```
❌ Failed to start data collection service
```

**Solutions:**
1. Check configuration file validity
2. Verify database connectivity
3. Ensure no port conflicts
4. Check file permissions

#### Configuration Loading Failed

```
❌ Failed to load config from config/data_collection.json: Invalid JSON
```

**Solutions:**
1. Validate JSON syntax
2. Check required fields
3. Verify file encoding (UTF-8)
4. Recreate default configuration

#### No Collectors Created

```
❌ No collectors were successfully initialized
```

**Solutions:**
1. Check exchange configuration
2. Verify trading pair symbols
3. Check network connectivity
4. Review collector creation logs

### Debug Mode

Enable verbose service debugging:

```json
{
  "logging": {
    "level": "DEBUG",
    "log_errors_only": false,
    "verbose_data_logging": true
  }
}
```

### Service Diagnostics

```python
# Run diagnostic check
from data.collection_service import DataCollectionService

service = DataCollectionService()
status = service.get_status()

print(f"Service Running: {status['service_running']}")
print(f"Configuration File: {status['configuration']['config_file']}")
print(f"Collectors: {status['collectors_running']}/{status['collectors_total']}")

# Check individual collector health
for collector_name in service.manager.list_collectors():
    collector_status = service.manager.get_collector_status(collector_name)
    print(f"{collector_name}: {collector_status['status']}")
```

## Related Documentation

- [Data Collectors System](../data_collectors.md) - Comprehensive documentation on core collector components and their modular internal structure.
- [Logging System](../logging.md) - Details on logging configuration and philosophy.
- [Database Operations](../../database/operations.md) - Information on database integration and persistence.
- [Monitoring Guide](../../monitoring/README.md) - Setup for system monitoring and alerting.
- [ADR-004: Modular Data Collector System Refactoring](../../decisions/ADR-004-modular-data-collector-system.md) - Rationale and implications of the modular architecture.