Refactor data collection architecture for modularity and maintainability
- Updated `pyproject.toml` to include the new `data` package in the build configuration, ensuring all components are properly included.
- Introduced `ADR-004` documentation outlining the rationale for refactoring the data collection system into a modular architecture, addressing complexity and maintainability issues.
- Enhanced `data_collectors.md` to reflect the new component structure, detailing responsibilities of `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, and `ManagerLogger`.
- Refactored `DataCollectionService` to utilize the new modular components, improving orchestration and error handling.
- Removed the obsolete `collector-service-tasks-optimization.md` and `refactor-common-package.md` files, streamlining the tasks documentation.

These changes significantly improve the architecture and maintainability of the data collection service, aligning with project standards for modularity, performance, and documentation clarity.
52
docs/decisions/ADR-004-modular-data-collector-system.md
Normal file
@@ -0,0 +1,52 @@
# ADR-004: Modular Data Collector System Refactoring

## Status
Accepted

## Context
Previously, the data collection system, primarily `CollectorManager` and `DataCollectionService`, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:

- Large file sizes for `collector_manager.py` (563 lines) and `collection_service.py` (451 lines).
- Functions exceeding the 50-line limit (e.g., `__init__`, `_global_health_monitor`, `get_status` in `CollectorManager`; `_create_default_config`, `run` in `DataCollectionService`).
- `CollectorManager` handling too many responsibilities (lifecycle, health monitoring, statistics, logging).
- Scattered configuration logic.
- Potential memory leaks due to untracked asynchronous tasks.
- Inefficient statistics collection.
- Gaps in test coverage for state transitions, health monitoring, and concurrent operations.
- Lack of comprehensive API and configuration schema documentation.

## Decision
We decided to refactor the data collector system into a modular, component-based architecture. This involves:

1. **Breaking Down `CollectorManager`**: Extracting specific responsibilities into dedicated component classes:
    * `CollectorLifecycleManager`: For collector lifecycle operations.
    * `ManagerHealthMonitor`: For global health monitoring.
    * `ManagerStatsTracker`: For managing and caching performance statistics.
    * `ManagerLogger`: For centralizing logging.
2. **Modularizing `DataCollectionService`**: Creating specialized components for its concerns:
    * `ServiceConfig`: For loading, creating, and validating configurations.
    * `CollectorFactory`: For encapsulating collector creation logic.
3. **Introducing `AsyncTaskManager`**: A centralized utility for managing and tracking `asyncio.Task` instances to prevent resource leaks and improve robustness.
4. **Enhancing Error Handling and Security**: Implementing a `_sanitize_error` method, adding file permission validation for configurations, and ensuring specific exception handling.
5. **Optimizing Performance**: Utilizing `CachedStatsManager` for efficient statistics updates.
6. **Improving Documentation and Testing**: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.

## Consequences

**Positive:**
- **Improved Maintainability**: Clear separation of concerns reduces complexity and makes code easier to understand and modify.
- **Enhanced Testability**: Individual components can be unit-tested in isolation, leading to more robust and reliable code.
- **Increased Scalability**: The modular design allows for easier extension and adaptation to new exchanges or data types.
- **Better Readability**: Smaller files and functions improve code comprehension.
- **Robust Error Handling**: Dedicated components for error handling and task management lead to more resilient operations.
- **Optimized Performance**: Cached statistics and managed asynchronous tasks contribute to better resource utilization.
- **Comprehensive Documentation**: Clearer architecture facilitates better documentation and onboarding.

**Negative:**
- **Increased File Count**: More files are introduced due to the breakdown of responsibilities.
- **Initial Development Overhead**: The refactoring required a significant upfront investment in time and effort.
- **Increased Indirection**: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.

## Alternatives Considered
- **Minor Refactoring**: Only addressing critical violations (e.g., function size) without a full modular redesign. *Rejected* because it would not fully address the underlying architectural issues or the long-term maintainability concerns.
- **External Libraries for Each Concern**: Using separate, heavy-duty external libraries for logging, health monitoring, etc. *Rejected* to avoid introducing unnecessary dependencies and to keep control over custom logic specific to our system.
@@ -100,17 +100,49 @@ Manages and dispatches real-time data to registered callbacks.
|
||||
- Notifies all subscribed listeners when new data points are received.
|
||||
- Ensures efficient and reliable distribution of processed market data.
|
||||
|
||||
### 5. `CollectorManager`
|
||||
### 5. `CollectorLifecycleManager`
|
||||
|
||||
A singleton class that manages all active data collectors in the system.
|
||||
Manages the lifecycle of individual data collectors, including adding, removing, starting, stopping, and restarting them.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Centralized `start` and `stop` for all collectors
|
||||
- System-wide status aggregation
|
||||
- Global health monitoring
|
||||
- Coordination of restart policies
|
||||
- Handles `add_collector`, `remove_collector`, `enable_collector`, `disable_collector`.
|
||||
- Manages `_start_collector`, `restart_collector`, `restart_all_collectors` operations.
|
||||
|
||||
### 6. Exchange-Specific Collectors
|
||||
### 6. `ManagerHealthMonitor`
|
||||
|
||||
Encapsulates the logic for global system health monitoring.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Implements the `_global_health_monitor` logic.
|
||||
- Provides system-wide health checks and auto-restart coordination.
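
The health-check and auto-restart coordination described for `ManagerHealthMonitor` could look roughly like the sketch below. It is not the project's implementation: the class name `HealthMonitorSketch`, the `is_healthy` and `restart` callables, and the `check_interval` parameter are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


class HealthMonitorSketch:
    """Hypothetical monitor: checks collectors and restarts unhealthy ones."""

    def __init__(
        self,
        is_healthy: Callable[[str], bool],
        restart: Callable[[str], Awaitable[bool]],
        check_interval: float = 60.0,
    ) -> None:
        self._is_healthy = is_healthy
        self._restart = restart
        self._check_interval = check_interval

    async def check_once(self, collector_names: List[str]) -> Dict[str, str]:
        """Run a single health pass and report the outcome per collector."""
        results: Dict[str, str] = {}
        for name in collector_names:
            if self._is_healthy(name):
                results[name] = "healthy"
            elif await self._restart(name):
                results[name] = "restarted"
            else:
                results[name] = "failed"
        return results

    async def run(self, collector_names: List[str], stop_event: asyncio.Event) -> None:
        """Repeat health passes until stop_event is set."""
        while not stop_event.is_set():
            await self.check_once(collector_names)
            try:
                # Sleep for one interval, but wake up early on shutdown.
                await asyncio.wait_for(stop_event.wait(), timeout=self._check_interval)
            except asyncio.TimeoutError:
                pass
```

Passing the health check and restart operations in as callables keeps the monitor independent of the lifecycle component, which is one way to make it unit-testable in isolation.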
|
||||
|
||||
### 7. `ManagerStatsTracker`
|
||||
|
||||
Manages the collection and retrieval of performance statistics for the `CollectorManager`.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Updates and provides statistics via `get_status`.
|
||||
- Utilizes `CachedStatsManager` for optimized, periodic updates of statistics.
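
The idea of cached, periodic statistics updates can be illustrated with a small time-to-live cache. This is only a sketch of the general technique: `CachedStatsSketch`, its `ttl_seconds` and `clock` parameters, and the `invalidate` method are hypothetical and do not describe the actual `CachedStatsManager` API.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedStatsSketch:
    """Hypothetical TTL cache that recomputes statistics only when stale."""

    def __init__(
        self,
        compute: Callable[[], Dict[str, Any]],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached: Optional[Dict[str, Any]] = None
        self._computed_at = 0.0

    def get_status(self) -> Dict[str, Any]:
        """Return cached statistics, refreshing them once the TTL has elapsed."""
        now = self._clock()
        if self._cached is None or now - self._computed_at >= self._ttl:
            self._cached = self._compute()
            self._computed_at = now
        # Hand out a copy so callers cannot mutate the cached snapshot.
        return dict(self._cached)

    def invalidate(self) -> None:
        """Force the next get_status() call to recompute."""
        self._cached = None
```

Frequent `get_status()` callers then pay the aggregation cost at most once per TTL window.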
|
||||
|
||||
### 8. `ManagerLogger`
|
||||
|
||||
Centralizes all logging operations for the `CollectorManager`.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Provides wrapper methods for logging at different levels (`_log_debug`, `_log_info`, `_log_warning`, `_log_error`, `_log_critical`).
|
||||
- Ensures consistent log formatting and includes `exc_info=True` for error logs.
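
A minimal sketch of such logging wrappers, built on the standard `logging` module, is shown below. The method names and `exc_info` defaults follow the API listed in this document; the class name `ManagerLoggerSketch`, the `prefix` argument, and the behavior when no logger is supplied are assumptions.

```python
import logging
from typing import Optional


class ManagerLoggerSketch:
    """Hypothetical wrapper that applies one prefix and one exc_info policy."""

    def __init__(
        self,
        logger: Optional[logging.Logger] = None,
        prefix: str = "CollectorManager",
    ) -> None:
        self._logger = logger
        self._prefix = prefix

    def _log(self, level: int, message: str, exc_info: bool) -> None:
        # Assumption: logging is silently skipped when no logger was supplied.
        if self._logger is not None:
            self._logger.log(level, "[%s] %s", self._prefix, message, exc_info=exc_info)

    def _log_debug(self, message: str, exc_info: bool = False) -> None:
        self._log(logging.DEBUG, message, exc_info)

    def _log_info(self, message: str, exc_info: bool = False) -> None:
        self._log(logging.INFO, message, exc_info)

    def _log_warning(self, message: str, exc_info: bool = False) -> None:
        self._log(logging.WARNING, message, exc_info)

    def _log_error(self, message: str, exc_info: bool = True) -> None:
        # Errors include the active traceback by default.
        self._log(logging.ERROR, message, exc_info)

    def _log_critical(self, message: str, exc_info: bool = True) -> None:
        self._log(logging.CRITICAL, message, exc_info)
```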
|
||||
|
||||
### 9. `CollectorManager`
|
||||
|
||||
A singleton class that orchestrates all active data collectors and their associated components. It now delegates specific responsibilities to dedicated component classes.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Centralized control and coordination of `CollectorLifecycleManager`.
|
||||
- Aggregation of system-wide status from `ManagerHealthMonitor` and `ManagerStatsTracker`.
|
||||
- Unified logging through `ManagerLogger`.
|
||||
- Overall system-wide status aggregation.
|
||||
|
||||
### 10. Exchange-Specific Collectors
|
||||
|
||||
Concrete implementations of `BaseDataCollector` for each exchange (e.g., `OKXCollector`).
|
||||
|
||||
@@ -122,6 +154,29 @@ Concrete implementations of `BaseDataCollector` for each exchange (e.g., `OKXCol
|
||||
|
||||
For more details, see [OKX Collector Documentation (`./exchanges/okx.md`)](./exchanges/okx.md).
|
||||
|
||||
### 11. `ServiceConfig`
|
||||
|
||||
Handles the loading, creation, and validation of service configurations.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Manages `_load_config` and `_create_default_config` logic.
|
||||
- Implements schema validation for configuration files and file permission validation.
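
File permission validation might be approached as in the following sketch. It assumes a POSIX system and an illustrative policy (regular file, readable by the current user, not writable by group or others); the actual rules enforced by `ServiceConfig` are not specified in this document.

```python
import os
import stat


def validate_permissions(file_path: str) -> bool:
    """Return True if file_path looks safe to load as a configuration file.

    Assumed policy (POSIX only): the path is a regular file, readable by the
    current user, and not writable by group or others.
    """
    try:
        mode = os.stat(file_path).st_mode
    except OSError:
        # Missing file or no permission to stat it.
        return False
    if not stat.S_ISREG(mode):
        return False
    if mode & (stat.S_IWGRP | stat.S_IWOTH):
        return False
    return os.access(file_path, os.R_OK)
```

Under this policy, `chmod 600 config/data_collection.json` would pass the check, while a world-writable file would be rejected.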
|
||||
|
||||
### 12. `CollectorFactory`
|
||||
|
||||
Encapsulates the logic for creating individual data collector instances.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Manages the `_create_collector` logic, decoupling collector creation from the `DataCollectionService`.
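
One common way to decouple creation logic is a registry that maps exchange names to builder callables. The sketch below illustrates that pattern only; `CollectorFactorySketch`, `register`, and the `"exchange"` config key are hypothetical and may differ from the real `CollectorFactory`.

```python
from typing import Any, Callable, Dict

Builder = Callable[[Dict[str, Any]], Any]


class CollectorFactorySketch:
    """Hypothetical registry-based factory for collector instances."""

    def __init__(self) -> None:
        self._builders: Dict[str, Builder] = {}

    def register(self, exchange: str, builder: Builder) -> None:
        """Associate an exchange name with a callable that builds a collector."""
        self._builders[exchange.lower()] = builder

    def create_collector(self, config: Dict[str, Any]) -> Any:
        """Build a collector for config['exchange'] or raise ValueError."""
        exchange = str(config.get("exchange", "")).lower()
        builder = self._builders.get(exchange)
        if builder is None:
            raise ValueError(f"Unsupported exchange: {exchange!r}")
        return builder(config)
```

With this shape, the service only needs the factory and a config dictionary, and adding an exchange means registering one more builder.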
|
||||
|
||||
### 13. `AsyncTaskManager`
|
||||
|
||||
Provides a comprehensive utility for managing and tracking asynchronous tasks.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Manages `asyncio.Task` instances, preventing potential memory leaks and ensuring proper task lifecycle.
|
||||
- Used by `CollectorManager` and `DataCollectionService` for robust asynchronous operations.
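
The task-tracking idea can be sketched as follows. The method names mirror the `AsyncTaskManager` API listed in this document, but the bodies, the use of the task name as its ID, and the `pending_count` property are assumptions for illustration.

```python
import asyncio
from typing import Dict


class AsyncTaskManagerSketch:
    """Hypothetical tracker that holds references to running asyncio tasks."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}

    @property
    def pending_count(self) -> int:
        return len(self._tasks)

    def add_task(self, task: asyncio.Task) -> str:
        """Track a task under its name and forget it once it finishes."""
        task_id = task.get_name()
        self._tasks[task_id] = task
        task.add_done_callback(lambda t: self._forget(task_id, t))
        return task_id

    def _forget(self, task_id: str, task: asyncio.Task) -> None:
        # Only drop the entry if it still refers to this exact task.
        if self._tasks.get(task_id) is task:
            del self._tasks[task_id]

    def remove_task(self, task_id: str) -> bool:
        """Stop tracking a task without cancelling it."""
        return self._tasks.pop(task_id, None) is not None

    async def cancel_all_tasks(self) -> int:
        """Cancel every tracked task and wait until all have finished."""
        tasks = list(self._tasks.values())
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
        for task in tasks:
            self._forget(task.get_name(), task)
        return len(tasks)

    async def wait_for_all_tasks(self) -> None:
        """Wait for all tracked tasks, collecting exceptions instead of raising."""
        await asyncio.gather(*list(self._tasks.values()), return_exceptions=True)
```

Keeping a strong reference to each task until it completes prevents it from being garbage-collected mid-flight, and the done callback keeps the registry from growing without bound.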
|
||||
|
||||
## Exchange Factory
|
||||
|
||||
The `ExchangeFactory` provides a standardized way to create data collectors, decoupling the client code from specific implementations.
|
||||
@@ -162,7 +217,7 @@ collectors = ExchangeFactory.create_multiple_collectors(configs)
|
||||
|
||||
## Health Monitoring
|
||||
|
||||
The system includes a robust, two-level health monitoring system.
|
||||
The system includes a robust, two-level health monitoring system, now enhanced with cached statistics management.
|
||||
|
||||
### 1. Collector-Level Monitoring
|
||||
|
||||
@@ -176,13 +231,13 @@ Each `BaseDataCollector` instance has its own health monitoring.
|
||||
|
||||
### 2. Manager-Level Monitoring
|
||||
|
||||
The `CollectorManager` provides a global view of system health.
|
||||
The `CollectorManager` provides a global view of system health, leveraging `ManagerHealthMonitor` and `ManagerStatsTracker`.
|
||||
|
||||
**Key Metrics:**
|
||||
- **Aggregate Status**: Overview of all collectors (running, stopped, failed)
|
||||
- **System Uptime**: Total uptime for the collector system
|
||||
- **Failed Collectors**: List of collectors that failed to restart
|
||||
- **Resource Usage**: (Future) System-level CPU and memory monitoring
|
||||
- **Resource Usage**: System-level CPU and memory monitoring
|
||||
|
||||
### Health Status API
|
||||
|
||||
@@ -208,16 +263,61 @@ For detailed status schemas, refer to the [Reference Documentation (`../../refer
|
||||
|
||||
### `CollectorManager`
|
||||
- `add_collector(collector)`
|
||||
- `remove_collector(collector_id)`
|
||||
- `enable_collector(collector_id)`
|
||||
- `disable_collector(collector_id)`
|
||||
- `restart_collector(collector_id)`
|
||||
- `async start_all()`
|
||||
- `async stop_all()`
|
||||
- `get_status() -> dict`
|
||||
- `list_collectors() -> list`
|
||||
|
||||
### `DataCollectionService`
|
||||
- `async run()`
|
||||
- `async stop()`
|
||||
|
||||
### `ExchangeFactory`
|
||||
- `create_collector(config) -> BaseDataCollector`
|
||||
- `create_multiple_collectors(configs) -> list`
|
||||
- `get_supported_exchanges() -> list`
|
||||
|
||||
### `CollectorLifecycleManager`
|
||||
- `add_collector(collector)`
|
||||
- `remove_collector(collector_id)`
|
||||
- `enable_collector(collector_id)`
|
||||
- `disable_collector(collector_id)`
|
||||
- `_start_collector(collector)`
|
||||
- `restart_collector(collector_id)`
|
||||
- `restart_all_collectors()`
|
||||
|
||||
### `ManagerHealthMonitor`
|
||||
- `_global_health_monitor()`
|
||||
|
||||
### `ManagerStatsTracker`
|
||||
- `get_status() -> dict`
|
||||
- `_update_stats()`
|
||||
|
||||
### `ManagerLogger`
|
||||
- `_log_debug(message, exc_info=False)`
|
||||
- `_log_info(message, exc_info=False)`
|
||||
- `_log_warning(message, exc_info=False)`
|
||||
- `_log_error(message, exc_info=True)`
|
||||
- `_log_critical(message, exc_info=True)`
|
||||
|
||||
### `ServiceConfig`
|
||||
- `_load_config(config_path)`
|
||||
- `_create_default_config()`
|
||||
- `validate_permissions(file_path)`
|
||||
|
||||
### `CollectorFactory`
|
||||
- `_create_collector(config)`
|
||||
|
||||
### `AsyncTaskManager`
|
||||
- `add_task(task)`
|
||||
- `remove_task(task_id)`
|
||||
- `cancel_all_tasks()`
|
||||
- `wait_for_all_tasks()`
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
@@ -234,12 +334,21 @@ For detailed status schemas, refer to the [Reference Documentation (`../../refer
|
||||
- **Cause**: Trying to create a collector for an exchange not registered in the factory.
|
||||
- **Solution**: Implement the collector and register it in `data/exchanges/__init__.py`.
|
||||
|
||||
4. **Error information leakage in logs/responses**
|
||||
- **Cause**: Raw exception details being exposed.
|
||||
- **Solution**: Ensure error messages are sanitized using `_sanitize_error` before logging or returning to external calls.
|
||||
|
||||
5. **Configuration file permission issues**
|
||||
- **Cause**: Improper file permissions preventing the service from reading configuration.
|
||||
- **Solution**: Verify file permissions for configuration files. The `ServiceConfig` now includes validation for this.
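
For issue 4 above, error sanitization could be approached along these lines. This is a hedged sketch, not the project's `_sanitize_error`: the function name `sanitize_error`, the regular expressions, and the `max_length` parameter are assumptions, and real redaction rules would need to match the secrets the service actually handles.

```python
import re

# Hypothetical patterns: key/value style secrets and absolute file paths.
_SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|secret|token|password)\b\s*[=:]\s*\S+")
_PATH_PATTERN = re.compile(r"(?:/[\w.\-]+){2,}")


def sanitize_error(error: Exception, max_length: int = 200) -> str:
    """Return a log-safe description of error without secrets or file paths."""
    message = str(error)
    # Keep the key name so the log stays useful, but drop the value.
    message = _SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=<redacted>", message)
    message = _PATH_PATTERN.sub("<path>", message)
    if len(message) > max_length:
        message = message[:max_length] + "..."
    return f"{type(error).__name__}: {message}"
```

For example, an error whose text contains `password=hunter2` and an absolute path would be reported with `password=<redacted>` and `<path>` in their place.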
|
||||
|
||||
### Best Practices
|
||||
|
||||
- Use the `CollectorManager` for lifecycle management.
|
||||
- Always validate configurations before creating collectors.
|
||||
- Use the `CollectorManager` for lifecycle management; it delegates the work to its internal components.
|
||||
- Always validate configurations before creating collectors, leveraging `ServiceConfig`.
|
||||
- Monitor system status regularly using `manager.get_status()`.
|
||||
- Refer to logs for detailed error analysis.
|
||||
- Refer to logs for detailed error analysis; error and critical entries are logged with `exc_info=True`, so they include the full traceback.
|
||||
- Ensure `AsyncTaskManager` is used for all long-running asynchronous operations to prevent resource leaks.
|
||||
|
||||
---
|
||||
*Back to [Modules Documentation](../README.md)*
|
||||
@@ -4,23 +4,61 @@
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
The data collection service uses a **manager-worker architecture** to collect data for multiple trading pairs concurrently.
|
||||
The data collection service has been refactored into a **modular, component-based architecture** to collect data for multiple trading pairs concurrently with improved maintainability, scalability, and testability.
|
||||
|
||||
- **`CollectorManager`**: The central manager responsible for creating, starting, stopping, and monitoring individual data collectors.
|
||||
- **`OKXCollector`**: A dedicated worker responsible for collecting data for a single trading pair from the OKX exchange.
|
||||
- **`DataCollectionService`**: The primary orchestration layer, responsible for initializing and coordinating core service components. It delegates specific functionalities to dedicated managers and factories.
|
||||
- **`CollectorManager`**: Now acts as an orchestrator for individual data collectors, utilizing its own set of internal components (e.g., `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, `ManagerLogger`).
|
||||
- **Dedicated Components**: Specific concerns like configuration, collector creation, and asynchronous task management are handled by new, specialized classes (`ServiceConfig`, `CollectorFactory`, `AsyncTaskManager`).
|
||||
- **`OKXCollector`**: A dedicated worker responsible for collecting data for a single trading pair from the OKX exchange, now built upon a more robust `BaseDataCollector` and its internal components (`ConnectionManager`, `CollectorStateAndTelemetry`, `CallbackDispatcher`).
|
||||
|
||||
This architecture allows for high scalability and fault tolerance.
|
||||
This modular architecture allows for high scalability, fault tolerance, and clear separation of concerns.
|
||||
|
||||
## Key Components
|
||||
|
||||
### `DataCollectionService`
|
||||
|
||||
- **Location**: `data/collection_service.py`
|
||||
- **Responsibilities**:
|
||||
- Orchestrates the overall data collection process.
|
||||
- Initializes and coordinates `ServiceConfig`, `CollectorFactory`, `CollectorManager`, and `AsyncTaskManager`.
|
||||
- Manages the main service loop and graceful shutdown.
|
||||
- Provides a high-level API for running and monitoring the service.
|
||||
|
||||
### `ServiceConfig`
|
||||
|
||||
- **Location**: `config/service_config.py`
|
||||
- **Responsibilities**:
|
||||
- Handles loading, creating, and validating service configurations.
|
||||
- Ensures configuration file integrity, including file permission validation.
|
||||
- Manages default configuration generation and runtime updates.
|
||||
|
||||
### `CollectorFactory`
|
||||
|
||||
- **Location**: `data/collector_factory.py`
|
||||
- **Responsibilities**:
|
||||
- Encapsulates the logic for creating individual data collector instances (e.g., `OKXCollector`).
|
||||
- Decouples collector instantiation from the `DataCollectionService`.
|
||||
- Ensures collectors are created with correct configurations and dependencies.
|
||||
|
||||
### `AsyncTaskManager`
|
||||
|
||||
- **Location**: `utils/async_task_manager.py`
|
||||
- **Responsibilities**:
|
||||
- Manages and tracks `asyncio.Task` instances throughout the application.
|
||||
- Prevents potential memory leaks by ensuring proper task lifecycle management.
|
||||
- Facilitates robust asynchronous operations for both `DataCollectionService` and `CollectorManager`.
|
||||
|
||||
### `CollectorManager`
|
||||
|
||||
- **Location**: `tasks/collector_manager.py`
|
||||
- **Location**: `data/collector_manager.py`
|
||||
- **Responsibilities**:
|
||||
- Manages the lifecycle of multiple collectors
|
||||
- Provides a unified API for controlling all collectors
|
||||
- Monitors the health of each collector
|
||||
- Distributes tasks and aggregates results
|
||||
- Acts as an orchestrator for all active data collectors.
|
||||
- Delegates specific responsibilities to its new internal components:
|
||||
- `CollectorLifecycleManager`: Manages adding, removing, starting, and stopping collectors.
|
||||
- `ManagerHealthMonitor`: Encapsulates global health monitoring and auto-restart logic.
|
||||
- `ManagerStatsTracker`: Handles performance statistics collection and caching.
|
||||
- `ManagerLogger`: Centralizes logging operations for the manager and its collectors.
|
||||
- Provides a unified interface for controlling and monitoring managed collectors.
|
||||
|
||||
### `OKXCollector`
|
||||
|
||||
@@ -67,15 +105,20 @@ The service is configured through `config/bot_configs/data_collector_config.json
|
||||
|
||||
## Usage
|
||||
|
||||
Start the service from the main application entry point:
|
||||
The `DataCollectionService` is the main entry point for running the data collection system.
|
||||
|
||||
Start the service from a script (e.g., `scripts/start_data_collection.py`):
|
||||
|
||||
```python
|
||||
# main.py
|
||||
from tasks.collector_manager import CollectorManager
|
||||
# scripts/start_data_collection.py
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging # Assuming this exists or is created
|
||||
|
||||
async def main():
|
||||
manager = CollectorManager()
|
||||
await manager.start_all_collectors()
|
||||
setup_logging() # Initialize logging
|
||||
service = DataCollectionService(config_path="config/data_collection.json")
|
||||
await service.run() # Or run with a duration: await service.run(duration_hours=24)
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -83,7 +126,7 @@ if __name__ == "__main__":
|
||||
|
||||
## Health & Monitoring
|
||||
|
||||
The `CollectorManager` provides a `get_status()` method to monitor the health of all collectors.
|
||||
The `DataCollectionService` and `CollectorManager` provide comprehensive health and monitoring capabilities through their dedicated components.
|
||||
|
||||
## Features
|
||||
|
||||
@@ -196,42 +239,78 @@ The service uses JSON configuration files with automatic default creation if non
|
||||
|
||||
### Service Layer Components
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────┐
|
||||
│ DataCollectionService │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Configuration Manager │ │
|
||||
│ │ • JSON config loading/validation │ │
|
||||
│ │ • Default config generation │ │
|
||||
│ │ • Runtime config updates │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Service Monitor │ │
|
||||
│ │ • Service-level health checks │ │
|
||||
│ │ • Uptime tracking │ │
|
||||
│ │ • Error aggregation │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ CollectorManager │ │
|
||||
│ │ • Individual collector management │ │
|
||||
│ │ • Health monitoring │ │
|
||||
│ │ • Auto-restart coordination │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────┘
|
||||
│
|
||||
┌─────────────────────────────┐
|
||||
│ Core Data Collectors │
|
||||
│ (See data_collectors.md) │
|
||||
└─────────────────────────────┘
|
||||
```mermaid
|
||||
graph TD
|
||||
subgraph DataCollectionService
|
||||
SC[ServiceConfig] -- Manages --> Conf(Configuration)
|
||||
SCF[CollectorFactory] -- Creates --> Collectors(Data Collectors)
|
||||
ATM[AsyncTaskManager] -- Manages --> Tasks(Async Tasks)
|
||||
DCS[DataCollectionService] -- Uses --> SC
|
||||
DCS -- Uses --> SCF
|
||||
DCS -- Uses --> ATM
|
||||
DCS -- Orchestrates --> CM(CollectorManager)
|
||||
end
|
||||
|
||||
subgraph CollectorManager
|
||||
CM --> CLM(CollectorLifecycleManager)
|
||||
CM --> MHM(ManagerHealthMonitor)
|
||||
CM --> MST(ManagerStatsTracker)
|
||||
CM --> ML(ManagerLogger)
|
||||
CLM -- Manages --> BC[BaseDataCollector]
|
||||
MHM -- Monitors --> BC
|
||||
MST -- Tracks --> BC
|
||||
ML -- Logs For --> BC
|
||||
end
|
||||
|
||||
subgraph BaseDataCollector ["BaseDataCollector (Core Data Collector)"]
|
||||
BC --> ConM(ConnectionManager)
|
||||
BC --> CST(CollectorStateAndTelemetry)
|
||||
BC --> CD(CallbackDispatcher)
|
||||
end
|
||||
|
||||
Conf -- Provides --> DCS
|
||||
Collectors -- Created By --> SCF
|
||||
Tasks -- Managed By --> ATM
|
||||
CM -- Manages --> BaseDataCollector
|
||||
BaseDataCollector -- Collects Data --> Database
|
||||
BaseDataCollector -- Publishes Data --> Redis(Redis Pub/Sub)
|
||||
|
||||
style DCS fill:#f9f,stroke:#333,stroke-width:2px
|
||||
style CM fill:#bbf,stroke:#333,stroke-width:2px
|
||||
style BC fill:#cfc,stroke:#333,stroke-width:2px
|
||||
style SC fill:#FFD700,stroke:#333,stroke-width:1px
|
||||
style SCF fill:#90EE90,stroke:#333,stroke-width:1px
|
||||
style ATM fill:#ADD8E6,stroke:#333,stroke-width:1px
|
||||
style CLM fill:#FFC0CB,stroke:#333,stroke-width:1px
|
||||
style MHM fill:#C0C0C0,stroke:#333,stroke-width:1px
|
||||
style MST fill:#DA70D6,stroke:#333,stroke-width:1px
|
||||
style ML fill:#DDA0DD,stroke:#333,stroke-width:1px
|
||||
style ConM fill:#F0F8FF,stroke:#333,stroke-width:1px
|
||||
style CST fill:#FFE4E1,stroke:#333,stroke-width:1px
|
||||
style CD fill:#FAFAD2,stroke:#333,stroke-width:1px
|
||||
style Database fill:#A9A9A9,stroke:#333,stroke-width:1px
|
||||
style Redis fill:#FF6347,stroke:#333,stroke-width:1px
|
||||
```
|
||||
|
||||
### Data Flow
|
||||
|
||||
```
|
||||
Configuration → Service → CollectorManager → Data Collectors → Database
|
||||
↓ ↓
|
||||
Service Monitor Health Monitor
|
||||
```mermaid
|
||||
graph LR
|
||||
Config(Configuration) --> ServiceConfig
|
||||
ServiceConfig --> DataCollectionService
|
||||
DataCollectionService -- Initializes --> CollectorManager
|
||||
DataCollectionService -- Initializes --> CollectorFactory
|
||||
DataCollectionService -- Initializes --> AsyncTaskManager
|
||||
CollectorFactory -- Creates --> BaseDataCollector
|
||||
CollectorManager -- Manages --> BaseDataCollector
|
||||
BaseDataCollector -- Collects Data --> Database
|
||||
BaseDataCollector -- Publishes Data --> RedisPubSub(Redis Pub/Sub)
|
||||
HealthMonitor(Health Monitoring) --> DataCollectionService
|
||||
HealthMonitor --> CollectorManager
|
||||
HealthMonitor --> BaseDataCollector
|
||||
ErrorHandling(Error Handling) --> DataCollectionService
|
||||
ErrorHandling --> CollectorManager
|
||||
ErrorHandling --> BaseDataCollector
|
||||
```
|
||||
|
||||
### Storage Integration
|
||||
@@ -283,49 +362,69 @@ The service implements **clean production logging** focused on operational needs
|
||||
|
||||
### DataCollectionService
|
||||
|
||||
The main service class for managing data collection operations.
|
||||
The main service class for managing data collection operations, now orchestrating through specialized components.
|
||||
|
||||
#### Constructor
|
||||
|
||||
```python
|
||||
DataCollectionService(config_path: str = "config/data_collection.json")
|
||||
DataCollectionService(
|
||||
config_path: str = "config/data_collection.json",
|
||||
service_config: Optional[ServiceConfig] = None,
|
||||
collector_factory: Optional[CollectorFactory] = None,
|
||||
collector_manager: Optional[CollectorManager] = None,
|
||||
async_task_manager: Optional[AsyncTaskManager] = None
|
||||
)
|
||||
```
|
||||
|
||||
**Parameters:**
|
||||
- `config_path`: Path to JSON configuration file
|
||||
- `config_path`: Path to JSON configuration file. Used if `service_config` is not provided.
|
||||
- `service_config`: An instance of `ServiceConfig`. If None, one will be created.
|
||||
- `collector_factory`: An instance of `CollectorFactory`. If None, one will be created.
|
||||
- `collector_manager`: An instance of `CollectorManager`. If None, one will be created.
|
||||
- `async_task_manager`: An instance of `AsyncTaskManager`. If None, one will be created.
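
The "use the injected instance, otherwise create one" rule described by these parameters is a common dependency-injection pattern. The sketch below shows it for a single component; `ServiceSketch`, `DefaultConfigSketch`, `InMemoryConfigSketch`, and their `load()` method are hypothetical stand-ins, not the real classes.

```python
from typing import Any, Dict, Optional


class DefaultConfigSketch:
    """Hypothetical stand-in for ServiceConfig; only remembers the path."""

    def __init__(self, config_path: str) -> None:
        self.config_path = config_path

    def load(self) -> Dict[str, Any]:
        # A real implementation would read and validate the JSON file here.
        return {"source": self.config_path}


class InMemoryConfigSketch:
    """Hypothetical test double that never touches the file system."""

    def __init__(self, values: Dict[str, Any]) -> None:
        self._values = values

    def load(self) -> Dict[str, Any]:
        return dict(self._values)


class ServiceSketch:
    """Illustrates the 'use the injected component, else build one' rule."""

    def __init__(
        self,
        config_path: str = "config/data_collection.json",
        service_config: Optional[Any] = None,
    ) -> None:
        # Compare against None explicitly so a falsy-but-valid object is kept.
        if service_config is None:
            service_config = DefaultConfigSketch(config_path)
        self.service_config = service_config
        self.config = service_config.load()
```

In a unit test, `ServiceSketch(service_config=InMemoryConfigSketch({"exchanges": ["okx"]}))` would run without any configuration file on disk.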
|
||||
|
||||
#### Methods
|
||||
|
||||
##### `async run(duration_hours: Optional[float] = None) -> bool`
|
||||
##### `async run(duration_hours: Optional[float] = None) -> None`
|
||||
|
||||
Run the service for a specified duration or indefinitely.
|
||||
Runs the service for a specified duration or indefinitely. This method now coordinates the main event loop and lifecycle of all internal components.
|
||||
|
||||
**Parameters:**
|
||||
- `duration_hours`: Optional duration in hours (None = indefinite)
|
||||
- `duration_hours`: Optional duration in hours (None = indefinite).
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if successful, False if error occurred
|
||||
- `None`
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
service = DataCollectionService()
|
||||
await service.run(duration_hours=24) # Run for 24 hours
|
||||
from data.collection_service import DataCollectionService
|
||||
import asyncio
|
||||
|
||||
async def run_service():
|
||||
service = DataCollectionService()
|
||||
await service.run(duration_hours=24) # Run for 24 hours
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(run_service())
|
||||
```
|
||||
|
||||
##### `async start() -> bool`
|
||||
##### `async start() -> None`
|
||||
|
||||
Start the data collection service and all configured collectors.
|
||||
Initializes and starts the data collection service and all configured collectors. This method delegates to internal components for their respective startup procedures.
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if started successfully
|
||||
- `None`
|
||||
|
||||
##### `async stop() -> None`
|
||||
|
||||
Stop the service gracefully, including all collectors and cleanup.
|
||||
Stops the service gracefully, including all collectors and internal cleanup. Ensures all asynchronous tasks are properly cancelled and resources released.
|
||||
|
||||
**Returns:**
|
||||
- `None`
|
||||
|
||||
##### `get_status() -> Dict[str, Any]`
|
||||
|
||||
Get current service status including uptime, collector counts, and errors.
|
||||
Gets current service status, including uptime, collector counts, and errors, aggregated from underlying components.
|
||||
|
||||
**Returns:**
|
||||
```python
|
||||
@@ -341,23 +440,23 @@ Get current service status including uptime, collector counts, and errors.
|
||||
'config_file': 'config/data_collection.json',
|
||||
'exchanges_enabled': ['okx'],
|
||||
'total_trading_pairs': 6
|
||||
},
|
||||
'detailed_collector_statuses': { # New field for detailed statuses
|
||||
'okx_BTC-USDT': {'status': 'RUNNING', 'health_score': 95},
|
||||
'okx_ETH-USDT': {'status': 'ERROR', 'last_error': 'Connection refused'}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
##### `async initialize_collectors() -> bool`
|
||||
##### `_run_main_loop(duration_hours: Optional[float])`
|
||||
|
||||
Initialize all collectors based on configuration.
|
||||
Internal method extracted from `run()` to manage the core asynchronous loop.
|
||||
|
||||
**Parameters:**
|
||||
- `duration_hours`: Optional duration in hours for the loop.
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if all collectors initialized successfully
|
||||
|
||||
##### `load_configuration() -> Dict[str, Any]`
|
||||
|
||||
Load and validate configuration from file.
|
||||
|
||||
**Returns:**
|
||||
- `dict`: Loaded configuration
|
||||
- `None`
|
||||
|
||||
### Standalone Function
|
||||
|
||||
@@ -367,17 +466,17 @@ Load and validate configuration from file.
|
||||
async def run_data_collection_service(
|
||||
config_path: str = "config/data_collection.json",
|
||||
duration_hours: Optional[float] = None
|
||||
) -> bool
|
||||
) -> None
|
||||
```
|
||||
|
||||
Convenience function to run the service with minimal setup.
|
||||
Convenience function to run the service with minimal setup, internally creating a `DataCollectionService` instance.
|
||||
|
||||
**Parameters:**
|
||||
- `config_path`: Path to configuration file
|
||||
- `duration_hours`: Optional duration in hours
|
||||
- `config_path`: Path to configuration file.
|
||||
- `duration_hours`: Optional duration in hours.
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if successful
|
||||
- `None`
|
||||
|
||||
## Integration Examples
|
||||
|
||||
@@ -386,15 +485,16 @@ Convenience function to run the service with minimal setup.
|
||||
```python
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging # Assuming this exists or is created
|
||||
|
||||
async def main():
|
||||
setup_logging()
|
||||
service = DataCollectionService("config/my_config.json")
|
||||
|
||||
|
||||
# Run for 24 hours
|
||||
success = await service.run(duration_hours=24)
|
||||
|
||||
if not success:
|
||||
print("Service encountered errors")
|
||||
await service.run(duration_hours=24)
|
||||
|
||||
print("Service run finished.")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@@ -405,23 +505,34 @@ if __name__ == "__main__":
|
||||
```python
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging
|
||||
|
||||
async def monitor_service():
|
||||
setup_logging()
|
||||
service = DataCollectionService()
|
||||
|
||||
|
||||
# Start service in background
|
||||
start_task = asyncio.create_task(service.run())
|
||||
|
||||
# Monitor status every 5 minutes
|
||||
while service.running:
|
||||
status = service.get_status()
|
||||
print(f"Service Uptime: {status['uptime_hours']:.1f}h")
|
||||
print(f"Collectors: {status['collectors_running']}/{status['collectors_total']}")
|
||||
print(f"Errors: {status['errors_count']}")
|
||||
|
||||
await asyncio.sleep(300) # 5 minutes
|
||||
|
||||
await start_task
|
||||
|
||||
# Monitor status every 60 seconds
|
||||
try:
|
||||
while not start_task.done():  # stop monitoring once the service task finishes
|
||||
status = service.get_status()
|
||||
print(f"Service Uptime: {status['uptime_hours']:.1f}h")
|
||||
print(f"Collectors: {status['collectors_running']}/{status['collectors_total']}")
|
||||
print(f"Errors: {status['errors_count']}")
|
||||
if status['errors_count'] > 0:
|
||||
print(f"Last error: {status['last_error']}")
|
||||
print("Detailed Collector Statuses:")
|
||||
for name, details in status.get('detailed_collector_statuses', {}).items():
|
||||
print(f" - {name}: Status={details.get('status')}, Health Score={details.get('health_score')}")
|
||||
|
||||
await asyncio.sleep(60)
|
||||
except asyncio.CancelledError:
|
||||
print("Monitoring cancelled.")
|
||||
finally:
|
||||
await service.stop()
|
||||
await start_task # Ensure the main service task is awaited
|
||||
|
||||
asyncio.run(monitor_service())
|
||||
```
|
||||
@@ -431,32 +542,40 @@ asyncio.run(monitor_service())
|
||||
```python
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging
|
||||
|
||||
async def controlled_collection():
|
||||
setup_logging()
|
||||
service = DataCollectionService()
|
||||
|
||||
|
||||
try:
|
||||
# Initialize and start
|
||||
await service.initialize_collectors()
|
||||
# Start the service
|
||||
await service.start()
|
||||
|
||||
print("Data collection service started.")
|
||||
|
||||
# Monitor and control
|
||||
while True:
|
||||
status = service.get_status()
|
||||
|
||||
# Check if any collectors failed
|
||||
if status['collectors_failed'] > 0:
|
||||
print("Some collectors failed, checking health...")
|
||||
# Service auto-restart will handle this
|
||||
|
||||
await asyncio.sleep(60) # Check every minute
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("Shutting down service...")
|
||||
finally:
|
||||
await service.stop()
|
||||
print(f"Current Service Status: {status['service_running']}, Collectors Running: {status['collectors_running']}")
|
||||
|
||||
asyncio.run(controlled_collection())
|
||||
# Example: Stop if certain condition met (e.g., specific error, or after a duration)
|
||||
if status['collectors_failed'] > 0:
|
||||
print("Some collectors failed, service is recovering...")
|
||||
# The service's internal health monitor and task manager will handle restarts
|
||||
# For demonstration, stop after 5 minutes
|
||||
await asyncio.sleep(300)
|
||||
print("Stopping service after 5 minutes of operation.")
|
||||
break
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("Manual shutdown requested.")
|
||||
finally:
|
||||
print("Shutting down service gracefully...")
|
||||
await service.stop()
|
||||
print("Service stopped.")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(controlled_collection())
|
||||
```
|
||||
|
||||
### Configuration Management
|
||||
@@ -465,91 +584,113 @@ asyncio.run(controlled_collection())
|
||||
import asyncio
|
||||
import json
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging
|
||||
from config.service_config import ServiceConfig # Import the new ServiceConfig
|
||||
|
||||
async def dynamic_configuration():
|
||||
service = DataCollectionService()
|
||||
|
||||
setup_logging()
|
||||
# Instantiate ServiceConfig directly or let DataCollectionService create it
|
||||
service_config_instance = ServiceConfig(config_path="config/data_collection.json")
|
||||
service = DataCollectionService(service_config=service_config_instance)
|
||||
|
||||
print("Initial configuration loaded:")
|
||||
print(json.dumps(service_config_instance.get_config(), indent=2))
|
||||
|
||||
# Load and modify configuration
|
||||
config = service.load_configuration()
|
||||
|
||||
# Add new trading pair
|
||||
config['exchanges']['okx']['trading_pairs'].append({
|
||||
config = service_config_instance.get_config()
|
||||
|
||||
# Add new trading pair if not already present
|
||||
new_pair = {
|
||||
'symbol': 'SOL-USDT',
|
||||
'enabled': True,
|
||||
'data_types': ['trade'],
|
||||
'timeframes': ['1m', '5m']
|
||||
})
|
||||
|
||||
}
|
||||
if new_pair not in config['exchanges']['okx']['trading_pairs']:
|
||||
config['exchanges']['okx']['trading_pairs'].append(new_pair)
|
||||
print("Added SOL-USDT to configuration.")
|
||||
else:
|
||||
print("SOL-USDT already in configuration.")
|
||||
|
||||
# Save updated configuration
|
||||
with open('config/data_collection.json', 'w') as f:
|
||||
json.dump(config, f, indent=2)
|
||||
|
||||
# Restart service with new config
|
||||
service_config_instance.save_config(config) # Use ServiceConfig to save
|
||||
|
||||
print("Updated configuration saved. Restarting service with new config...")
|
||||
await service.stop()
|
||||
await service.start()
|
||||
print("Service restarted with updated configuration.")
|
||||
|
||||
asyncio.run(dynamic_configuration())
|
||||
# Check the collector count after the restart (a rough proxy for whether the new pair is active)
|
||||
status = service.get_status()
|
||||
print(f"Current active collectors count: {status['collectors_total']}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(dynamic_configuration())
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The service implements robust error handling at the service orchestration level:
|
||||
The service handles errors at multiple layers, and the new component structure assigns each class of error to a specific component for handling and recovery.
|
||||
|
||||
### Service Level Errors
|
||||
|
||||
- **Configuration Errors**: Invalid JSON, missing required fields
|
||||
- **Initialization Errors**: Failed collector creation, database connectivity
|
||||
- **Runtime Errors**: Service-level exceptions, resource exhaustion
|
||||
- **Configuration Errors**: Invalid JSON, missing required fields, file permission issues (handled by `ServiceConfig`).
|
||||
- **Initialization Errors**: Failed collector creation (handled by `CollectorFactory`), database connectivity.
|
||||
- **Runtime Errors**: Service-level exceptions, resource exhaustion, unhandled exceptions in asynchronous tasks (managed by `AsyncTaskManager`).
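
The configuration error cases listed above (invalid JSON, missing required fields, file permission issues) could be detected along the following lines. This is a sketch only, not the actual `ServiceConfig` implementation: `ConfigError`, `REQUIRED_KEYS`, and `load_and_validate` are hypothetical names, and the only required key assumed here is `exchanges`, taken from the configuration example in this document.

```python
import json
from pathlib import Path
from typing import Any, Dict


class ConfigError(Exception):
    """Raised when the configuration file cannot be used (hypothetical type)."""


# Assumed required top-level keys; the real schema may differ.
REQUIRED_KEYS = ("exchanges",)


def load_and_validate(config_path: str) -> Dict[str, Any]:
    """Read a JSON config file and fail early with a descriptive error."""
    path = Path(config_path)
    try:
        raw = path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise ConfigError(f"Configuration file not found: {path}") from exc
    except PermissionError as exc:
        raise ConfigError(f"Cannot read configuration file: {path}") from exc

    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ConfigError(
            f"Invalid JSON in {path}: {exc.msg} (line {exc.lineno})"
        ) from exc

    if not isinstance(config, dict):
        raise ConfigError("Top-level configuration must be a JSON object")

    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ConfigError(f"Missing required fields: {', '.join(missing)}")
    return config
```

Wrapping every failure in one exception type lets the caller report configuration problems in a single place before any collector is created.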
|
||||
|
||||
### Error Recovery Strategies
|
||||
|
||||
1. **Graceful Degradation**: Continue with healthy collectors
|
||||
2. **Configuration Validation**: Validate before applying changes
|
||||
3. **Service Restart**: Full service restart on critical errors
|
||||
4. **Error Aggregation**: Collect and report errors across all collectors
|
||||
1. **Graceful Degradation**: Continue with healthy collectors while attempting to recover failed ones.
|
||||
2. **Configuration Validation**: `ServiceConfig` validates configurations before application, preventing common startup issues.
|
||||
3. **Automated Restarts**: `ManagerHealthMonitor` and `AsyncTaskManager` coordinate automatic restarts for failed collectors/tasks.
|
||||
4. **Error Aggregation**: `ManagerStatsTracker` collects and reports errors across all collectors, providing a unified view.
|
||||
5. **Sanitized Error Messages**: `ManagerLogger` ensures sensitive internal details are not leaked in logs or public interfaces.
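
The first, third, and fourth strategies above can be illustrated with a generic supervision loop. This is a sketch of the general pattern under stated assumptions, not the code of `ManagerHealthMonitor`, `AsyncTaskManager`, or `ManagerStatsTracker`: `supervise`, `run_all`, and their parameters are hypothetical, and exponential backoff is one possible restart policy among several.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Collector = Callable[[], Awaitable[None]]


async def supervise(
    name: str,
    run_collector: Collector,
    failures: Dict[str, int],
    max_restarts: int = 3,
    base_delay: float = 1.0,
) -> None:
    """Run one collector, restarting it with exponential backoff when it fails."""
    attempt = 0
    while True:
        try:
            await run_collector()
            return  # clean exit: nothing to restart
        except asyncio.CancelledError:
            raise  # let shutdown propagate instead of counting it as a failure
        except Exception:
            # Aggregate error counts per collector in one shared mapping.
            failures[name] = failures.get(name, 0) + 1
            attempt += 1
            if attempt > max_restarts:
                return  # give up on this collector; the others keep running
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


async def run_all(
    collectors: Dict[str, Collector],
    max_restarts: int = 3,
    base_delay: float = 1.0,
) -> Dict[str, int]:
    """Supervise all collectors concurrently and return the failure counts."""
    failures: Dict[str, int] = {}
    await asyncio.gather(
        *(
            supervise(name, run, failures, max_restarts, base_delay)
            for name, run in collectors.items()
        )
    )
    return failures
```

Each collector is supervised independently, so one collector exhausting its restart budget does not stop the healthy ones.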
|
||||
|
||||
### Error Reporting
|
||||
|
||||
```python
|
||||
# Service status includes error information
|
||||
# Service status includes aggregated error information
|
||||
status = service.get_status()
|
||||
|
||||
if status['errors_count'] > 0:
|
||||
print(f"Service has {status['errors_count']} errors")
|
||||
print(f"Last error: {status['last_error']}")
|
||||
|
||||
# Get detailed error information from collectors
|
||||
for collector_name in service.manager.list_collectors():
|
||||
collector_status = service.manager.get_collector_status(collector_name)
|
||||
if collector_status['status'] == 'error':
|
||||
print(f"Collector {collector_name}: {collector_status['statistics']['last_error']}")
|
||||
print(f"Service has {status['errors_count']} errors.")
|
||||
print(f"Last service error: {status['last_error']}")
|
||||
|
||||
# Get detailed error information from individual collectors if available
|
||||
if 'detailed_collector_statuses' in status:
|
||||
for collector_name, details in status['detailed_collector_statuses'].items():
|
||||
if details.get('status') == 'ERROR' and 'last_error' in details:
|
||||
print(f"Collector {collector_name} error: {details['last_error']}")
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
Testing now combines unit tests for individual components with integration tests for their interactions, so that each part of the modular architecture is exercised both in isolation and in combination.
|
||||
|
||||
### Running Service Tests
|
||||
|
||||
```bash
|
||||
# Run all data collection service tests
|
||||
uv run pytest tests/test_data_collection_service.py -v
|
||||
uv run pytest tests/data/collection_service -v # Assuming tests are in a 'collection_service' subdir
|
||||
|
||||
# Run specific test categories
|
||||
uv run pytest tests/test_data_collection_service.py::TestDataCollectionService -v
|
||||
# Run specific component tests, e.g., for ServiceConfig
|
||||
uv run pytest tests/config/test_service_config.py -v
|
||||
|
||||
# Run with coverage
|
||||
uv run pytest tests/test_data_collection_service.py --cov=data.collection_service
|
||||
# Run with coverage for the data, config, and utils packages
|
||||
uv run pytest --cov=data --cov=config --cov=utils tests/
|
||||
```
|
||||
|
||||
### Test Coverage
|
||||
|
||||
The service test suite covers:
|
||||
- Service initialization and configuration loading
|
||||
- Collector orchestration and management
|
||||
- Service lifecycle (start/stop/restart)
|
||||
- Configuration validation and error handling
|
||||
- Signal handling and graceful shutdown
|
||||
- Status reporting and monitoring
|
||||
- Error aggregation and recovery
|
||||
The expanded test suite now covers:
|
||||
- **Component Unit Tests**: Individual tests for `ServiceConfig`, `CollectorFactory`, `AsyncTaskManager`, `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, `ManagerLogger`.
|
||||
- **Service Integration Tests**: Testing `DataCollectionService`'s orchestration of its components.
|
||||
- Service initialization and configuration loading/validation.
|
||||
- Collector orchestration and management via `CollectorManager` and `CollectorLifecycleManager`.
|
||||
- Asynchronous task management and error recovery.
|
||||
- Service lifecycle (start/stop/restart) and signal handling.
|
||||
- Status reporting and monitoring, including detailed collector statuses.
|
||||
- Error aggregation and recovery strategies.
|
||||
|
||||
### Mock Testing
|
||||
|
||||
@@ -557,18 +698,39 @@ The service test suite covers:
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, patch
|
||||
from data.collection_service import DataCollectionService
|
||||
from config.service_config import ServiceConfig # Ensure new components are imported for mocking
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_service_with_mock_collectors():
|
||||
with patch('data.collection_service.CollectorManager') as mock_manager:
|
||||
# Mock successful initialization
|
||||
mock_manager.return_value.start.return_value = True
|
||||
|
||||
service = DataCollectionService()
|
||||
result = await service.start()
|
||||
|
||||
assert result is True
|
||||
mock_manager.return_value.start.assert_called_once()
|
||||
async def test_service_with_mock_components():
|
||||
with patch('data.collection_service.ServiceConfig') as MockServiceConfig, \
|
||||
patch('data.collection_service.CollectorFactory') as MockCollectorFactory, \
|
||||
patch('data.collection_service.CollectorManager') as MockCollectorManager, \
|
||||
patch('data.collection_service.AsyncTaskManager') as MockAsyncTaskManager:
|
||||
|
||||
# Configure mocks for successful operation
|
||||
MockServiceConfig.return_value.load_config.return_value = {"collectors": []}
|
||||
MockServiceConfig.return_value.get_config.return_value = {"collectors": []}
|
||||
MockCollectorManager.return_value.start_all.return_value = None
|
||||
MockCollectorManager.return_value.stop_all.return_value = None
|
||||
MockAsyncTaskManager.return_value.start.return_value = None
|
||||
MockAsyncTaskManager.return_value.stop.return_value = None
|
||||
|
||||
service = DataCollectionService(
|
||||
service_config=MockServiceConfig.return_value,
|
||||
collector_factory=MockCollectorFactory.return_value,
|
||||
collector_manager=MockCollectorManager.return_value,
|
||||
async_task_manager=MockAsyncTaskManager.return_value
|
||||
)
|
||||
await service.start()
|
||||
|
||||
# Assertions to ensure components were called correctly
|
||||
MockServiceConfig.return_value.load_config.assert_called_once()
|
||||
MockCollectorManager.return_value.start_all.assert_called_once()
|
||||
MockAsyncTaskManager.return_value.start.assert_called_once()
|
||||
|
||||
await service.stop()
|
||||
MockCollectorManager.return_value.stop_all.assert_called_once()
|
||||
MockAsyncTaskManager.return_value.stop.assert_called_once()
|
||||
```
|
||||
|
||||
## Production Deployment
|
||||
@@ -855,7 +1017,8 @@ for collector_name in service.manager.list_collectors():
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Data Collectors System](../components/data_collectors.md) - Core collector components
|
||||
- [Logging System](../components/logging.md) - Logging configuration
|
||||
- [Database Operations](../database/operations.md) - Database integration
|
||||
- [Monitoring Guide](../monitoring/README.md) - System monitoring setup
|
||||
- [Data Collectors System](../data_collectors.md) - Comprehensive documentation on core collector components and their modular internal structure.
|
||||
- [Logging System](../logging.md) - Details on logging configuration and philosophy.
|
||||
- [Database Operations](../../database/operations.md) - Information on database integration and persistence.
|
||||
- [Monitoring Guide](../../monitoring/README.md) - Setup for system monitoring and alerting.
|
||||
- [ADR-004: Modular Data Collector System Refactoring](../../decisions/ADR-004-modular-data-collector-system.md) - Rationale and implications of the modular architecture.
|
||||