Refactor data collection architecture for modularity and maintainability
- Updated `pyproject.toml` to include the new `data` package in the build configuration, ensuring all components are properly included.
- Introduced `ADR-004` documentation outlining the rationale for refactoring the data collection system into a modular architecture, addressing complexity and maintainability issues.
- Enhanced `data_collectors.md` to reflect the new component structure, detailing responsibilities of `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, and `ManagerLogger`.
- Refactored `DataCollectionService` to utilize the new modular components, improving orchestration and error handling.
- Removed the obsolete `collector-service-tasks-optimization.md` and `refactor-common-package.md` files, streamlining the tasks documentation.

These changes significantly improve the architecture and maintainability of the data collection service, aligning with project standards for modularity, performance, and documentation clarity.
parent
f6cb1485b1
commit
0a7e444206
52
docs/decisions/ADR-004-modular-data-collector-system.md
Normal file
@ -0,0 +1,52 @@
# ADR-004: Modular Data Collector System Refactoring

## Status
Accepted

## Context
Previously, the data collection system, primarily `CollectorManager` and `DataCollectionService`, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:

- Large file sizes for `collector_manager.py` (563 lines) and `collection_service.py` (451 lines).
- Functions exceeding the 50-line limit (e.g., `__init__`, `_global_health_monitor`, `get_status` in `CollectorManager`; `_create_default_config`, `run` in `DataCollectionService`).
- `CollectorManager` handling too many responsibilities (lifecycle, health monitoring, statistics, logging).
- Scattered configuration logic.
- Potential memory leaks due to untracked asynchronous tasks.
- Inefficient statistics collection.
- Gaps in test coverage for state transitions, health monitoring, and concurrent operations.
- Lack of comprehensive API and configuration schema documentation.

## Decision
We decided to refactor the data collector system into a modular, component-based architecture. This involves:

1. **Breaking Down `CollectorManager`**: Extracting specific responsibilities into dedicated component classes:
    * `CollectorLifecycleManager`: For collector lifecycle operations.
    * `ManagerHealthMonitor`: For global health monitoring.
    * `ManagerStatsTracker`: For managing and caching performance statistics.
    * `ManagerLogger`: For centralizing logging.
2. **Modularizing `DataCollectionService`**: Creating specialized components for its concerns:
    * `ServiceConfig`: For loading, creating, and validating configurations.
    * `CollectorFactory`: For encapsulating collector creation logic.
3. **Introducing `AsyncTaskManager`**: A centralized utility for managing and tracking `asyncio.Task` instances to prevent resource leaks and improve robustness.
4. **Enhancing Error Handling and Security**: Implementing a `_sanitize_error` method, adding file permission validation for configurations, and ensuring specific exception handling.
5. **Optimizing Performance**: Utilizing `CachedStatsManager` for efficient statistics updates.
6. **Improving Documentation and Testing**: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.

## Consequences

**Positive:**
- **Improved Maintainability**: Clear separation of concerns reduces complexity and makes code easier to understand and modify.
- **Enhanced Testability**: Individual components can be unit-tested in isolation, leading to more robust and reliable code.
- **Increased Scalability**: The modular design allows for easier extension and adaptation to new exchanges or data types.
- **Better Readability**: Smaller files and functions improve code comprehension.
- **Robust Error Handling**: Dedicated components for error handling and task management lead to more resilient operations.
- **Optimized Performance**: Cached statistics and managed asynchronous tasks contribute to better resource utilization.
- **Comprehensive Documentation**: Clearer architecture facilitates better documentation and onboarding.

**Negative:**
- **Increased File Count**: More files are introduced due to the breakdown of responsibilities.
- **Initial Development Overhead**: The refactoring required a significant upfront investment in time and effort.
- **Increased Indirection**: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.

## Alternatives Considered
- **Minor Refactoring**: Only addressing critical violations (e.g., function size) without a full modular redesign. *Rejected* due to not fully addressing underlying architectural issues and long-term maintainability concerns.
- **External Libraries for Each Concern**: Using separate, heavy-duty external libraries for logging, health monitoring, etc. *Rejected* to avoid introducing unnecessary dependencies and to maintain more control over custom logic specific to our system.
@ -100,17 +100,49 @@ Manages and dispatches real-time data to registered callbacks.
|
||||
- Notifies all subscribed listeners when new data points are received.
|
||||
- Ensures efficient and reliable distribution of processed market data.
|
||||
|
||||
### 5. `CollectorManager`
|
||||
### 5. `CollectorLifecycleManager`
|
||||
|
||||
A singleton class that manages all active data collectors in the system.
|
||||
Manages the lifecycle of individual data collectors, including adding, removing, starting, stopping, and restarting them.
|
||||
|
||||
**Key Responsibilities:**
|
||||
- Centralized `start` and `stop` for all collectors
|
||||
- System-wide status aggregation
|
||||
- Global health monitoring
|
||||
- Coordination of restart policies
|
||||
- Handles `add_collector`, `remove_collector`, `enable_collector`, `disable_collector`.
|
||||
- Manages `_start_collector`, `restart_collector`, `restart_all_collectors` operations.
|
||||
|
||||
### 6. Exchange-Specific Collectors
### 6. `ManagerHealthMonitor`

Encapsulates the logic for global system health monitoring.

**Key Responsibilities:**
- Implements the `_global_health_monitor` logic.
- Provides system-wide health checks and auto-restart coordination.

### 7. `ManagerStatsTracker`

Manages the collection and retrieval of performance statistics for the `CollectorManager`.

**Key Responsibilities:**
- Updates and provides statistics via `get_status`.
- Utilizes `CachedStatsManager` for optimized, periodic updates of statistics.
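
The caching behaviour described for `ManagerStatsTracker` might look roughly like the sketch below. It is an illustration only: `TtlStatsCache`, `ttl_seconds`, `compute`, and `clock` are names assumed for this example and are not taken from the project's `CachedStatsManager`, whose actual interface is not shown in this document.

```python
import time
from typing import Any, Callable, Dict, Optional


class TtlStatsCache:
    """Hypothetical stand-in for a cached statistics manager (time-to-live cache)."""

    def __init__(
        self,
        compute: Callable[[], Dict[str, Any]],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute      # expensive function that gathers fresh statistics
        self._ttl = ttl_seconds      # how long a snapshot stays valid
        self._clock = clock          # injectable clock, which makes the cache easy to test
        self._cached: Optional[Dict[str, Any]] = None
        self._expires_at = 0.0

    def get_status(self) -> Dict[str, Any]:
        """Return the cached snapshot, recomputing it only after the TTL has elapsed."""
        now = self._clock()
        if self._cached is None or now >= self._expires_at:
            self._cached = self._compute()
            self._expires_at = now + self._ttl
        return dict(self._cached)    # shallow copy so callers cannot mutate the cache
```

With a time-to-live of a few seconds, frequent `get_status` calls reuse one snapshot instead of recomputing statistics for every collector on each call.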

### 8. `ManagerLogger`

Centralizes all logging operations for the `CollectorManager`.

**Key Responsibilities:**
- Provides wrapper methods for logging at different levels (`_log_debug`, `_log_info`, `_log_warning`, `_log_error`, `_log_critical`).
- Ensures consistent log formatting and includes `exc_info=True` for error logs.
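
As a rough sketch of these wrapper methods, assuming the underlying logger is an ordinary `logging.Logger` and that a `component` prefix is added to every message (both are assumptions; the project's `ManagerLogger` may be wired differently):

```python
import logging
from typing import Optional


class ManagerLogger:
    """Illustrative logging wrapper; not the project's actual implementation."""

    def __init__(self, logger: Optional[logging.Logger] = None,
                 component: str = "CollectorManager") -> None:
        self._logger = logger        # None disables logging (assumed behaviour)
        self._component = component  # hypothetical prefix for consistent formatting

    def _log(self, level: int, message: str, exc_info: bool) -> None:
        if self._logger is not None:
            self._logger.log(level, "[%s] %s", self._component, message, exc_info=exc_info)

    def _log_debug(self, message: str, exc_info: bool = False) -> None:
        self._log(logging.DEBUG, message, exc_info)

    def _log_info(self, message: str, exc_info: bool = False) -> None:
        self._log(logging.INFO, message, exc_info)

    def _log_warning(self, message: str, exc_info: bool = False) -> None:
        self._log(logging.WARNING, message, exc_info)

    def _log_error(self, message: str, exc_info: bool = True) -> None:
        # Error and critical logs attach the active exception's traceback by default.
        self._log(logging.ERROR, message, exc_info)

    def _log_critical(self, message: str, exc_info: bool = True) -> None:
        self._log(logging.CRITICAL, message, exc_info)
```

Routing every level through one private `_log` helper keeps the message format in a single place, which is one way to get the consistent formatting mentioned above.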

### 9. `CollectorManager`

A singleton class that orchestrates all active data collectors and their associated components. It now delegates specific responsibilities to dedicated component classes.

**Key Responsibilities:**
- Centralized control and coordination of `CollectorLifecycleManager`.
- Aggregation of system-wide status from `ManagerHealthMonitor` and `ManagerStatsTracker`.
- Unified logging through `ManagerLogger`.
- Overall system-wide status aggregation.

### 10. Exchange-Specific Collectors
|
||||
|
||||
Concrete implementations of `BaseDataCollector` for each exchange (e.g., `OKXCollector`).
|
||||
|
||||
@ -122,6 +154,29 @@ Concrete implementations of `BaseDataCollector` for each exchange (e.g., `OKXCol
|
||||
|
||||
For more details, see [OKX Collector Documentation (`./exchanges/okx.md`)](./exchanges/okx.md).

### 11. `ServiceConfig`

Handles the loading, creation, and validation of service configurations.

**Key Responsibilities:**
- Manages `_load_config` and `_create_default_config` logic.
- Implements schema validation for configuration files and file permission validation.
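
File permission validation could be approached along these lines on POSIX systems. The policy below (regular file, readable by its owner, not writable by group or others) is an assumption chosen for illustration; the rules actually enforced by `ServiceConfig` are not spelled out in this document.

```python
import os
import stat


def validate_permissions(file_path: str) -> bool:
    """Return True if file_path passes the (assumed) permission policy."""
    try:
        mode = os.stat(file_path).st_mode
    except OSError:
        # Missing or inaccessible file: treat as a failed validation.
        return False
    if not stat.S_ISREG(mode):
        return False  # directories, sockets, etc. are rejected
    owner_can_read = bool(mode & stat.S_IRUSR)
    writable_by_others = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owner_can_read and not writable_by_others
```

Under this policy a file with mode `0o600` or `0o644` passes, while `0o666` is rejected because any user could modify the configuration.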

### 12. `CollectorFactory`

Encapsulates the logic for creating individual data collector instances.

**Key Responsibilities:**
- Manages the `_create_collector` logic, decoupling collector creation from the `DataCollectionService`.
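
One common way to decouple creation from the service is a small registry keyed by exchange name. The sketch below assumes that approach; `ExampleCollector`, the `register` method, and the `exchange`/`symbol` config keys are hypothetical and only `_create_collector` is a name taken from this document.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ExampleCollector:
    """Hypothetical collector type used only for this illustration."""
    exchange: str
    symbol: str


class CollectorFactory:
    """Illustrative factory: maps an exchange name to a builder callable."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[dict], ExampleCollector]] = {}

    def register(self, exchange: str, builder: Callable[[dict], ExampleCollector]) -> None:
        self._builders[exchange] = builder

    def _create_collector(self, config: dict) -> ExampleCollector:
        exchange = config.get("exchange")
        builder = self._builders.get(exchange)
        if builder is None:
            raise ValueError(f"Unsupported exchange: {exchange!r}")
        return builder(config)


factory = CollectorFactory()
factory.register("okx", lambda cfg: ExampleCollector("okx", cfg["symbol"]))
collector = factory._create_collector({"exchange": "okx", "symbol": "BTC-USDT"})
```

The service then only needs a factory instance and a config dictionary; it never imports a concrete collector class.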

### 13. `AsyncTaskManager`

Provides a comprehensive utility for managing and tracking asynchronous tasks.

**Key Responsibilities:**
- Manages `asyncio.Task` instances, preventing potential memory leaks and ensuring proper task lifecycle.
- Used by `CollectorManager` and `DataCollectionService` for robust asynchronous operations.
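
A minimal sketch of task tracking, using method names that appear in this document (`add_task`, `cancel_all_tasks`, `wait_for_all_tasks`). The internals, the `pending_count` property, and the choice to make the cancel and wait methods coroutines are assumptions; `remove_task` is omitted because its task-id scheme is not described here. Holding strong references matters because the asyncio event loop itself keeps only weak references to tasks.

```python
import asyncio
from typing import Set


class AsyncTaskManager:
    """Illustrative tracker for asyncio tasks; the real class may differ."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()

    def add_task(self, task: asyncio.Task) -> asyncio.Task:
        # Keep a strong reference until the task finishes, then drop it automatically.
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    @property
    def pending_count(self) -> int:
        return len(self._tasks)

    async def cancel_all_tasks(self) -> None:
        tasks = list(self._tasks)
        for task in tasks:
            task.cancel()
        if tasks:
            # return_exceptions=True collects CancelledError instead of raising it here.
            await asyncio.gather(*tasks, return_exceptions=True)

    async def wait_for_all_tasks(self) -> None:
        tasks = list(self._tasks)
        if tasks:
            await asyncio.gather(*tasks, return_exceptions=True)
```

A caller would typically wrap each long-running coroutine with `asyncio.create_task(...)`, pass the result to `add_task`, and call `cancel_all_tasks()` during shutdown.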
|
||||
|
||||
## Exchange Factory
|
||||
|
||||
The `ExchangeFactory` provides a standardized way to create data collectors, decoupling the client code from specific implementations.
|
||||
@ -162,7 +217,7 @@ collectors = ExchangeFactory.create_multiple_collectors(configs)
|
||||
|
||||
## Health Monitoring
|
||||
|
||||
The system includes a robust, two-level health monitoring system.
|
||||
The system includes a robust, two-level health monitoring system, now enhanced with cached statistics management.
|
||||
|
||||
### 1. Collector-Level Monitoring
|
||||
|
||||
@ -176,13 +231,13 @@ Each `BaseDataCollector` instance has its own health monitoring.
|
||||
|
||||
### 2. Manager-Level Monitoring
|
||||
|
||||
The `CollectorManager` provides a global view of system health.
|
||||
The `CollectorManager` provides a global view of system health, leveraging `ManagerHealthMonitor` and `ManagerStatsTracker`.
|
||||
|
||||
**Key Metrics:**
|
||||
- **Aggregate Status**: Overview of all collectors (running, stopped, failed)
|
||||
- **System Uptime**: Total uptime for the collector system
|
||||
- **Failed Collectors**: List of collectors that failed to restart
|
||||
- **Resource Usage**: (Future) System-level CPU and memory monitoring
|
||||
- **Resource Usage**: System-level CPU and memory monitoring
|
||||
|
||||
### Health Status API
|
||||
|
||||
@ -208,16 +263,61 @@ For detailed status schemas, refer to the [Reference Documentation (`../../refer
|
||||
|
||||
### `CollectorManager`
|
||||
- `add_collector(collector)`
|
||||
- `remove_collector(collector_id)`
|
||||
- `enable_collector(collector_id)`
|
||||
- `disable_collector(collector_id)`
|
||||
- `restart_collector(collector_id)`
|
||||
- `async start_all()`
|
||||
- `async stop_all()`
|
||||
- `get_status() -> dict`
|
||||
- `list_collectors() -> list`
|
||||
|
||||
### `DataCollectionService`
|
||||
- `async run()`
|
||||
- `async stop()`
|
||||
|
||||
### `ExchangeFactory`
|
||||
- `create_collector(config) -> BaseDataCollector`
|
||||
- `create_multiple_collectors(configs) -> list`
|
||||
- `get_supported_exchanges() -> list`

### `CollectorLifecycleManager`
- `add_collector(collector)`
- `remove_collector(collector_id)`
- `enable_collector(collector_id)`
- `disable_collector(collector_id)`
- `_start_collector(collector)`
- `restart_collector(collector_id)`
- `restart_all_collectors()`

### `ManagerHealthMonitor`
- `_global_health_monitor()`

### `ManagerStatsTracker`
- `get_status() -> dict`
- `_update_stats()`

### `ManagerLogger`
- `_log_debug(message, exc_info=False)`
- `_log_info(message, exc_info=False)`
- `_log_warning(message, exc_info=False)`
- `_log_error(message, exc_info=True)`
- `_log_critical(message, exc_info=True)`

### `ServiceConfig`
- `_load_config(config_path)`
- `_create_default_config()`
- `validate_permissions(file_path)`

### `CollectorFactory`
- `_create_collector(config)`

### `AsyncTaskManager`
- `add_task(task)`
- `remove_task(task_id)`
- `cancel_all_tasks()`
- `wait_for_all_tasks()`
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
@ -234,12 +334,21 @@ For detailed status schemas, refer to the [Reference Documentation (`../../refer
|
||||
- **Cause**: Trying to create a collector for an exchange not registered in the factory.
|
||||
- **Solution**: Implement the collector and register it in `data/exchanges/__init__.py`.
|
||||
|
||||
4. **Error information leakage in logs/responses**
|
||||
- **Cause**: Raw exception details being exposed.
|
||||
- **Solution**: Ensure error messages are sanitized using `_sanitize_error` before logging or returning to external calls.
|
||||
|
||||
5. **Configuration file permission issues**
|
||||
- **Cause**: Improper file permissions preventing the service from reading configuration.
|
||||
- **Solution**: Verify file permissions for configuration files. The `ServiceConfig` now includes validation for this.
|
||||
|
||||
### Best Practices
|
||||
|
||||
- Use the `CollectorManager` for lifecycle management.
|
||||
- Always validate configurations before creating collectors.
|
||||
- Use the `CollectorManager` for lifecycle management, delegating to its components.
|
||||
- Always validate configurations before creating collectors, leveraging `ServiceConfig`.
|
||||
- Monitor system status regularly using `manager.get_status()`.
|
||||
- Refer to logs for detailed error analysis.
|
||||
- Refer to logs for detailed error analysis, paying attention to `exc_info=True` for critical errors.
|
||||
- Ensure `AsyncTaskManager` is used for all long-running asynchronous operations to prevent resource leaks.
|
||||
|
||||
---
|
||||
*Back to [Modules Documentation (`../README.md`)]*
|
||||
@ -4,23 +4,61 @@
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
The data collection service uses a **manager-worker architecture** to collect data for multiple trading pairs concurrently.
|
||||
The data collection service has been refactored into a **modular, component-based architecture** to collect data for multiple trading pairs concurrently with improved maintainability, scalability, and testability.
|
||||
|
||||
- **`CollectorManager`**: The central manager responsible for creating, starting, stopping, and monitoring individual data collectors.
|
||||
- **`OKXCollector`**: A dedicated worker responsible for collecting data for a single trading pair from the OKX exchange.
|
||||
- **`DataCollectionService`**: The primary orchestration layer, responsible for initializing and coordinating core service components. It delegates specific functionalities to dedicated managers and factories.
|
||||
- **`CollectorManager`**: Now acts as an orchestrator for individual data collectors, utilizing its own set of internal components (e.g., `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, `ManagerLogger`).
|
||||
- **Dedicated Components**: Specific concerns like configuration, collector creation, and asynchronous task management are handled by new, specialized classes (`ServiceConfig`, `CollectorFactory`, `AsyncTaskManager`).
|
||||
- **`OKXCollector`**: A dedicated worker responsible for collecting data for a single trading pair from the OKX exchange, now built upon a more robust `BaseDataCollector` and its internal components (`ConnectionManager`, `CollectorStateAndTelemetry`, `CallbackDispatcher`).
|
||||
|
||||
This architecture allows for high scalability and fault tolerance.
|
||||
This modular architecture allows for high scalability, fault tolerance, and clear separation of concerns.
|
||||
|
||||
## Key Components
|
||||
|
||||
### `DataCollectionService`
|
||||
|
||||
- **Location**: `data/collection_service.py`
|
||||
- **Responsibilities**:
|
||||
- Orchestrates the overall data collection process.
|
||||
- Initializes and coordinates `ServiceConfig`, `CollectorFactory`, `CollectorManager`, and `AsyncTaskManager`.
|
||||
- Manages the main service loop and graceful shutdown.
|
||||
- Provides a high-level API for running and monitoring the service.
|
||||
|
||||
### `ServiceConfig`
|
||||
|
||||
- **Location**: `config/service_config.py`
|
||||
- **Responsibilities**:
|
||||
- Handles loading, creating, and validating service configurations.
|
||||
- Ensures configuration file integrity, including file permission validation.
|
||||
- Manages default configuration generation and runtime updates.
|
||||
|
||||
### `CollectorFactory`
|
||||
|
||||
- **Location**: `data/collector_factory.py`
|
||||
- **Responsibilities**:
|
||||
- Encapsulates the logic for creating individual data collector instances (e.g., `OKXCollector`).
|
||||
- Decouples collector instantiation from the `DataCollectionService`.
|
||||
- Ensures collectors are created with correct configurations and dependencies.
|
||||
|
||||
### `AsyncTaskManager`
|
||||
|
||||
- **Location**: `utils/async_task_manager.py`
|
||||
- **Responsibilities**:
|
||||
- Manages and tracks `asyncio.Task` instances throughout the application.
|
||||
- Prevents potential memory leaks by ensuring proper task lifecycle management.
|
||||
- Facilitates robust asynchronous operations for both `DataCollectionService` and `CollectorManager`.
|
||||
|
||||
### `CollectorManager`
|
||||
|
||||
- **Location**: `tasks/collector_manager.py`
|
||||
- **Location**: `data/collector_manager.py`
|
||||
- **Responsibilities**:
|
||||
- Manages the lifecycle of multiple collectors
|
||||
- Provides a unified API for controlling all collectors
|
||||
- Monitors the health of each collector
|
||||
- Distributes tasks and aggregates results
|
||||
- Acts as an orchestrator for all active data collectors.
|
||||
- Delegates specific responsibilities to its new internal components:
|
||||
- `CollectorLifecycleManager`: Manages adding, removing, starting, and stopping collectors.
|
||||
- `ManagerHealthMonitor`: Encapsulates global health monitoring and auto-restart logic.
|
||||
- `ManagerStatsTracker`: Handles performance statistics collection and caching.
|
||||
- `ManagerLogger`: Centralizes logging operations for the manager and its collectors.
|
||||
- Provides a unified interface for controlling and monitoring managed collectors.
|
||||
|
||||
### `OKXCollector`
|
||||
|
||||
@ -67,15 +105,20 @@ The service is configured through `config/bot_configs/data_collector_config.json
|
||||
|
||||
## Usage
|
||||
|
||||
Start the service from the main application entry point:
|
||||
The `DataCollectionService` is the main entry point for running the data collection system.
|
||||
|
||||
Start the service from a script (e.g., `scripts/start_data_collection.py`):
|
||||
|
||||
```python
|
||||
# main.py
|
||||
from tasks.collector_manager import CollectorManager
|
||||
# scripts/start_data_collection.py
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging # Assuming this exists or is created
|
||||
|
||||
async def main():
|
||||
manager = CollectorManager()
|
||||
await manager.start_all_collectors()
|
||||
setup_logging() # Initialize logging
|
||||
service = DataCollectionService(config_path="config/data_collection.json")
|
||||
await service.run() # Or run with a duration: await service.run(duration_hours=24)
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@ -83,7 +126,7 @@ if __name__ == "__main__":
|
||||
|
||||
## Health & Monitoring
|
||||
|
||||
The `CollectorManager` provides a `get_status()` method to monitor the health of all collectors.
|
||||
The `DataCollectionService` and `CollectorManager` provide comprehensive health and monitoring capabilities through their dedicated components.
|
||||
|
||||
## Features
|
||||
|
||||
@ -196,42 +239,78 @@ The service uses JSON configuration files with automatic default creation if non
|
||||
|
||||
### Service Layer Components
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────┐
|
||||
│ DataCollectionService │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Configuration Manager │ │
|
||||
│ │ • JSON config loading/validation │ │
|
||||
│ │ • Default config generation │ │
|
||||
│ │ • Runtime config updates │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ Service Monitor │ │
|
||||
│ │ • Service-level health checks │ │
|
||||
│ │ • Uptime tracking │ │
|
||||
│ │ • Error aggregation │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌─────────────────────────────────────────┐ │
|
||||
│ │ CollectorManager │ │
|
||||
│ │ • Individual collector management │ │
|
||||
│ │ • Health monitoring │ │
|
||||
│ │ • Auto-restart coordination │ │
|
||||
│ └─────────────────────────────────────────┘ │
|
||||
└─────────────────────────────────────────────────┘
|
||||
│
|
||||
┌─────────────────────────────┐
|
||||
│ Core Data Collectors │
|
||||
│ (See data_collectors.md) │
|
||||
└─────────────────────────────┘
|
||||
```mermaid
|
||||
graph TD
|
||||
subgraph DataCollectionService
|
||||
SC[ServiceConfig] -- Manages --> Conf(Configuration)
|
||||
SCF[CollectorFactory] -- Creates --> Collectors(Data Collectors)
|
||||
ATM[AsyncTaskManager] -- Manages --> Tasks(Async Tasks)
|
||||
DCS[DataCollectionService] -- Uses --> SC
|
||||
DCS -- Uses --> SCF
|
||||
DCS -- Uses --> ATM
|
||||
DCS -- Orchestrates --> CM(CollectorManager)
|
||||
end
|
||||
|
||||
subgraph CollectorManager
|
||||
CM --> CLM(CollectorLifecycleManager)
|
||||
CM --> MHM(ManagerHealthMonitor)
|
||||
CM --> MST(ManagerStatsTracker)
|
||||
CM --> ML(ManagerLogger)
|
||||
CLM -- Manages --> BC[BaseDataCollector]
|
||||
MHM -- Monitors --> BC
|
||||
MST -- Tracks --> BC
|
||||
ML -- Logs For --> BC
|
||||
end
|
||||
|
||||
subgraph BaseDataCollector["BaseDataCollector (Core Data Collector)"]
|
||||
BC --> ConM(ConnectionManager)
|
||||
BC --> CST(CollectorStateAndTelemetry)
|
||||
BC --> CD(CallbackDispatcher)
|
||||
end
|
||||
|
||||
Conf -- Provides --> DCS
|
||||
Collectors -- Created By --> SCF
|
||||
Tasks -- Managed By --> ATM
|
||||
CM -- Manages --> BaseDataCollector
|
||||
BaseDataCollector -- Collects Data --> Database
|
||||
BaseDataCollector -- Publishes Data --> Redis(Redis Pub/Sub)
|
||||
|
||||
style DCS fill:#f9f,stroke:#333,stroke-width:2px
|
||||
style CM fill:#bbf,stroke:#333,stroke-width:2px
|
||||
style BC fill:#cfc,stroke:#333,stroke-width:2px
|
||||
style SC fill:#FFD700,stroke:#333,stroke-width:1px
|
||||
style SCF fill:#90EE90,stroke:#333,stroke-width:1px
|
||||
style ATM fill:#ADD8E6,stroke:#333,stroke-width:1px
|
||||
style CLM fill:#FFC0CB,stroke:#333,stroke-width:1px
|
||||
style MHM fill:#C0C0C0,stroke:#333,stroke-width:1px
|
||||
style MST fill:#DA70D6,stroke:#333,stroke-width:1px
|
||||
style ML fill:#DDA0DD,stroke:#333,stroke-width:1px
|
||||
style ConM fill:#F0F8FF,stroke:#333,stroke-width:1px
|
||||
style CST fill:#FFE4E1,stroke:#333,stroke-width:1px
|
||||
style CD fill:#FAFAD2,stroke:#333,stroke-width:1px
|
||||
style Database fill:#A9A9A9,stroke:#333,stroke-width:1px
|
||||
style Redis fill:#FF6347,stroke:#333,stroke-width:1px
|
||||
```
|
||||
|
||||
### Data Flow
|
||||
|
||||
```
|
||||
Configuration → Service → CollectorManager → Data Collectors → Database
|
||||
↓ ↓
|
||||
Service Monitor Health Monitor
|
||||
```mermaid
|
||||
graph LR
|
||||
Config(Configuration) --> ServiceConfig
|
||||
ServiceConfig --> DataCollectionService
|
||||
DataCollectionService -- Initializes --> CollectorManager
|
||||
DataCollectionService -- Initializes --> CollectorFactory
|
||||
DataCollectionService -- Initializes --> AsyncTaskManager
|
||||
CollectorFactory -- Creates --> BaseDataCollector
|
||||
CollectorManager -- Manages --> BaseDataCollector
|
||||
BaseDataCollector -- Collects Data --> Database
|
||||
BaseDataCollector -- Publishes Data --> RedisPubSub(Redis Pub/Sub)
|
||||
HealthMonitor(Health Monitoring) --> DataCollectionService
|
||||
HealthMonitor --> CollectorManager
|
||||
HealthMonitor --> BaseDataCollector
|
||||
ErrorHandling(Error Handling) --> DataCollectionService
|
||||
ErrorHandling --> CollectorManager
|
||||
ErrorHandling --> BaseDataCollector
|
||||
```
|
||||
|
||||
### Storage Integration
|
||||
@ -283,49 +362,69 @@ The service implements **clean production logging** focused on operational needs
|
||||
|
||||
### DataCollectionService
|
||||
|
||||
The main service class for managing data collection operations.
|
||||
The main service class for managing data collection operations, now orchestrating through specialized components.
|
||||
|
||||
#### Constructor
|
||||
|
||||
```python
|
||||
DataCollectionService(config_path: str = "config/data_collection.json")
|
||||
DataCollectionService(
|
||||
config_path: str = "config/data_collection.json",
|
||||
service_config: Optional[ServiceConfig] = None,
|
||||
collector_factory: Optional[CollectorFactory] = None,
|
||||
collector_manager: Optional[CollectorManager] = None,
|
||||
async_task_manager: Optional[AsyncTaskManager] = None
|
||||
)
|
||||
```
|
||||
|
||||
**Parameters:**
|
||||
- `config_path`: Path to JSON configuration file
|
||||
- `config_path`: Path to JSON configuration file. Used if `service_config` is not provided.
|
||||
- `service_config`: An instance of `ServiceConfig`. If None, one will be created.
|
||||
- `collector_factory`: An instance of `CollectorFactory`. If None, one will be created.
|
||||
- `collector_manager`: An instance of `CollectorManager`. If None, one will be created.
|
||||
- `async_task_manager`: An instance of `AsyncTaskManager`. If None, one will be created.
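
The optional constructor arguments follow a familiar dependency-injection pattern: each collaborator defaults to `None` and is created on demand, so tests can pass in fakes. The sketch below uses hypothetical stand-ins (`StubConfig`, `StubManager`, `Service`) rather than the real `DataCollectionService`, and shows only two of the four collaborators.

```python
from typing import Optional


class StubConfig:
    """Hypothetical stand-in for ServiceConfig."""

    def __init__(self, path: str) -> None:
        self.path = path


class StubManager:
    """Hypothetical stand-in for CollectorManager."""

    def __init__(self) -> None:
        self.started = False

    def start(self) -> None:
        self.started = True


class Service:
    """Illustrative service that accepts injected collaborators."""

    def __init__(
        self,
        config_path: str = "config/data_collection.json",
        service_config: Optional[StubConfig] = None,
        collector_manager: Optional[StubManager] = None,
    ) -> None:
        # config_path is only used when no ready-made config object is supplied.
        self.config = service_config if service_config is not None else StubConfig(config_path)
        self.manager = collector_manager if collector_manager is not None else StubManager()

    def start(self) -> None:
        self.manager.start()


# In a test, inject a fake manager and inspect it afterwards.
fake_manager = StubManager()
service = Service(collector_manager=fake_manager)
service.start()
```

Comparing against `None` explicitly, rather than using `or`, avoids silently replacing an injected object that happens to be falsy.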
|
||||
|
||||
#### Methods
|
||||
|
||||
##### `async run(duration_hours: Optional[float] = None) -> bool`
|
||||
##### `async run(duration_hours: Optional[float] = None) -> None`
|
||||
|
||||
Run the service for a specified duration or indefinitely.
|
||||
Runs the service for a specified duration or indefinitely. This method now coordinates the main event loop and lifecycle of all internal components.
|
||||
|
||||
**Parameters:**
|
||||
- `duration_hours`: Optional duration in hours (None = indefinite)
|
||||
- `duration_hours`: Optional duration in hours (None = indefinite).
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if successful, False if error occurred
|
||||
- `None`
|
||||
|
||||
**Example:**
|
||||
```python
|
||||
service = DataCollectionService()
|
||||
await service.run(duration_hours=24) # Run for 24 hours
|
||||
from data.collection_service import DataCollectionService
|
||||
import asyncio
|
||||
|
||||
async def run_service():
|
||||
service = DataCollectionService()
|
||||
await service.run(duration_hours=24) # Run for 24 hours
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(run_service())
|
||||
```
|
||||
|
||||
##### `async start() -> bool`
|
||||
##### `async start() -> None`
|
||||
|
||||
Start the data collection service and all configured collectors.
|
||||
Initializes and starts the data collection service and all configured collectors. This method delegates to internal components for their respective startup procedures.
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if started successfully
|
||||
- `None`
|
||||
|
||||
##### `async stop() -> None`
|
||||
|
||||
Stop the service gracefully, including all collectors and cleanup.
|
||||
Stops the service gracefully, including all collectors and internal cleanup. Ensures all asynchronous tasks are properly cancelled and resources released.
|
||||
|
||||
**Returns:**
|
||||
- `None`
|
||||
|
||||
##### `get_status() -> Dict[str, Any]`
|
||||
|
||||
Get current service status including uptime, collector counts, and errors.
|
||||
Gets current service status, including uptime, collector counts, and errors, aggregated from underlying components.
|
||||
|
||||
**Returns:**
|
||||
```python
|
||||
@ -341,23 +440,23 @@ Get current service status including uptime, collector counts, and errors.
|
||||
'config_file': 'config/data_collection.json',
|
||||
'exchanges_enabled': ['okx'],
|
||||
'total_trading_pairs': 6
|
||||
},
|
||||
'detailed_collector_statuses': { # New field for detailed statuses
|
||||
'okx_BTC-USDT': {'status': 'RUNNING', 'health_score': 95},
|
||||
'okx_ETH-USDT': {'status': 'ERROR', 'last_error': 'Connection refused'}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
##### `async initialize_collectors() -> bool`
|
||||
##### `_run_main_loop(duration_hours: Optional[float])`
|
||||
|
||||
Initialize all collectors based on configuration.
|
||||
Internal method extracted from `run()` to manage the core asynchronous loop.
|
||||
|
||||
**Parameters:**
|
||||
- `duration_hours`: Optional duration in hours for the loop.
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if all collectors initialized successfully
|
||||
|
||||
##### `load_configuration() -> Dict[str, Any]`
|
||||
|
||||
Load and validate configuration from file.
|
||||
|
||||
**Returns:**
|
||||
- `dict`: Loaded configuration
|
||||
- `None`
|
||||
|
||||
### Standalone Function
|
||||
|
||||
@ -367,17 +466,17 @@ Load and validate configuration from file.
|
||||
async def run_data_collection_service(
|
||||
config_path: str = "config/data_collection.json",
|
||||
duration_hours: Optional[float] = None
|
||||
) -> bool
|
||||
) -> None
|
||||
```
|
||||
|
||||
Convenience function to run the service with minimal setup.
|
||||
Convenience function to run the service with minimal setup, internally creating a `DataCollectionService` instance.
|
||||
|
||||
**Parameters:**
|
||||
- `config_path`: Path to configuration file
|
||||
- `duration_hours`: Optional duration in hours
|
||||
- `config_path`: Path to configuration file.
|
||||
- `duration_hours`: Optional duration in hours.
|
||||
|
||||
**Returns:**
|
||||
- `bool`: True if successful
|
||||
- `None`
|
||||
|
||||
## Integration Examples
|
||||
|
||||
@ -386,15 +485,16 @@ Convenience function to run the service with minimal setup.
|
||||
```python
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging # Assuming this exists or is created
|
||||
|
||||
async def main():
|
||||
setup_logging()
|
||||
service = DataCollectionService("config/my_config.json")
|
||||
|
||||
|
||||
# Run for 24 hours
|
||||
success = await service.run(duration_hours=24)
|
||||
|
||||
if not success:
|
||||
print("Service encountered errors")
|
||||
await service.run(duration_hours=24)
|
||||
|
||||
print("Service run finished.")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
@ -405,23 +505,34 @@ if __name__ == "__main__":
|
||||
```python
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging
|
||||
|
||||
async def monitor_service():
|
||||
setup_logging()
|
||||
service = DataCollectionService()
|
||||
|
||||
|
||||
# Start service in background
|
||||
start_task = asyncio.create_task(service.run())
|
||||
|
||||
# Monitor status every 5 minutes
|
||||
while service.running:
|
||||
status = service.get_status()
|
||||
print(f"Service Uptime: {status['uptime_hours']:.1f}h")
|
||||
print(f"Collectors: {status['collectors_running']}/{status['collectors_total']}")
|
||||
print(f"Errors: {status['errors_count']}")
|
||||
|
||||
await asyncio.sleep(300) # 5 minutes
|
||||
|
||||
await start_task
|
||||
|
||||
# Monitor status every 60 seconds
|
||||
try:
|
||||
while True:
|
||||
status = service.get_status()
|
||||
print(f"Service Uptime: {status['uptime_hours']:.1f}h")
|
||||
print(f"Collectors: {status['collectors_running']}/{status['collectors_total']}")
|
||||
print(f"Errors: {status['errors_count']}")
|
||||
if status['errors_count'] > 0:
|
||||
print(f"Last error: {status['last_error']}")
|
||||
print("Detailed Collector Statuses:")
|
||||
for name, details in status.get('detailed_collector_statuses', {}).items():
|
||||
print(f" - {name}: Status={details.get('status')}, Health Score={details.get('health_score')}")
|
||||
|
||||
await asyncio.sleep(60)
|
||||
except asyncio.CancelledError:
|
||||
print("Monitoring cancelled.")
|
||||
finally:
|
||||
await service.stop()
|
||||
await start_task # Ensure the main service task is awaited
|
||||
|
||||
asyncio.run(monitor_service())
|
||||
```
|
||||
@ -431,32 +542,40 @@ asyncio.run(monitor_service())
|
||||
```python
|
||||
import asyncio
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging
|
||||
|
||||
async def controlled_collection():
|
||||
setup_logging()
|
||||
service = DataCollectionService()
|
||||
|
||||
|
||||
try:
|
||||
# Initialize and start
|
||||
await service.initialize_collectors()
|
||||
# Start the service
|
||||
await service.start()
|
||||
|
||||
print("Data collection service started.")
|
||||
|
||||
# Monitor and control
|
||||
while True:
|
||||
status = service.get_status()
|
||||
|
||||
# Check if any collectors failed
|
||||
if status['collectors_failed'] > 0:
|
||||
print("Some collectors failed, checking health...")
|
||||
# Service auto-restart will handle this
|
||||
|
||||
await asyncio.sleep(60) # Check every minute
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("Shutting down service...")
|
||||
finally:
|
||||
await service.stop()
|
||||
print(f"Current Service Status: {status['service_running']}, Collectors Running: {status['collectors_running']}")
|
||||
|
||||
asyncio.run(controlled_collection())
|
||||
# Example: Stop if certain condition met (e.g., specific error, or after a duration)
|
||||
if status['collectors_failed'] > 0:
|
||||
print("Some collectors failed, service is recovering...")
|
||||
# The service's internal health monitor and task manager will handle restarts
|
||||
# For demonstration, stop after 5 minutes
|
||||
await asyncio.sleep(300)
|
||||
print("Stopping service after 5 minutes of operation.")
|
||||
break
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("Manual shutdown requested.")
|
||||
finally:
|
||||
print("Shutting down service gracefully...")
|
||||
await service.stop()
|
||||
print("Service stopped.")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(controlled_collection())
|
||||
```
|
||||
|
||||
### Configuration Management
|
||||
@ -465,91 +584,113 @@ asyncio.run(controlled_collection())
|
||||
import asyncio
|
||||
import json
|
||||
from data.collection_service import DataCollectionService
|
||||
from utils.logger import setup_logging
|
||||
from config.service_config import ServiceConfig # Import the new ServiceConfig
|
||||
|
||||
async def dynamic_configuration():
|
||||
service = DataCollectionService()
|
||||
|
||||
setup_logging()
|
||||
# Instantiate ServiceConfig directly or let DataCollectionService create it
|
||||
service_config_instance = ServiceConfig(config_path="config/data_collection.json")
|
||||
service = DataCollectionService(service_config=service_config_instance)
|
||||
|
||||
print("Initial configuration loaded:")
|
||||
print(json.dumps(service_config_instance.get_config(), indent=2))
|
||||
|
||||
# Load and modify configuration
|
||||
config = service.load_configuration()
|
||||
|
||||
# Add new trading pair
|
||||
config['exchanges']['okx']['trading_pairs'].append({
|
||||
config = service_config_instance.get_config()
|
||||
|
||||
# Add new trading pair if not already present
|
||||
new_pair = {
|
||||
'symbol': 'SOL-USDT',
|
||||
'enabled': True,
|
||||
'data_types': ['trade'],
|
||||
'timeframes': ['1m', '5m']
|
||||
})
|
||||
|
||||
}
|
||||
if new_pair not in config['exchanges']['okx']['trading_pairs']:
|
||||
config['exchanges']['okx']['trading_pairs'].append(new_pair)
|
||||
print("Added SOL-USDT to configuration.")
|
||||
else:
|
||||
print("SOL-USDT already in configuration.")
|
||||
|
||||
# Save updated configuration
|
||||
with open('config/data_collection.json', 'w') as f:
|
||||
json.dump(config, f, indent=2)
|
||||
|
||||
# Restart service with new config
|
||||
service_config_instance.save_config(config) # Use ServiceConfig to save
|
||||
|
||||
print("Updated configuration saved. Restarting service with new config...")
|
||||
await service.stop()
|
||||
await service.start()
|
||||
print("Service restarted with updated configuration.")
|
||||
|
||||
asyncio.run(dynamic_configuration())
|
||||
# Verify new pair is active (logic would be in get_status or similar)
|
||||
status = service.get_status()
|
||||
print(f"Current active collectors count: {status['collectors_total']}")
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(dynamic_configuration())
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The service implements robust error handling at the service orchestration level:
|
||||
The service handles errors at several layers; each component in the modular structure owns a specific class of failure and its recovery.
|
||||
|
||||
### Service Level Errors
|
||||
|
||||
- **Configuration Errors**: Invalid JSON, missing required fields
|
||||
- **Initialization Errors**: Failed collector creation, database connectivity
|
||||
- **Runtime Errors**: Service-level exceptions, resource exhaustion
|
||||
- **Configuration Errors**: Invalid JSON, missing required fields, file permission issues (handled by `ServiceConfig`).
|
||||
- **Initialization Errors**: Failed collector creation (handled by `CollectorFactory`), database connectivity.
|
||||
- **Runtime Errors**: Service-level exceptions, resource exhaustion, unhandled exceptions in asynchronous tasks (managed by `AsyncTaskManager`).
|
||||
|
||||
### Error Recovery Strategies
|
||||
|
||||
1. **Graceful Degradation**: Continue with healthy collectors
|
||||
2. **Configuration Validation**: Validate before applying changes
|
||||
3. **Service Restart**: Full service restart on critical errors
|
||||
4. **Error Aggregation**: Collect and report errors across all collectors
|
||||
1. **Graceful Degradation**: Continue with healthy collectors while attempting to recover failed ones.
|
||||
2. **Configuration Validation**: `ServiceConfig` validates configurations before application, preventing common startup issues.
|
||||
3. **Automated Restarts**: `ManagerHealthMonitor` and `AsyncTaskManager` coordinate automatic restarts for failed collectors/tasks.
|
||||
4. **Error Aggregation**: `ManagerStatsTracker` collects and reports errors across all collectors, providing a unified view.
|
||||
5. **Sanitized Error Messages**: `ManagerLogger` ensures sensitive internal details are not leaked in logs or public interfaces.
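
The sanitizing strategy in item 5 can be sketched as follows. This is a minimal illustration, not the project's `ManagerLogger` code: the function name `sanitize_error`, both regular expressions, and the 200-character limit are assumptions chosen for the example.

```python
import re

# Hypothetical patterns for illustration; a real deployment would tune them
# to the secrets and paths that actually occur in its error messages.
_SECRET_RE = re.compile(
    r"(?i)\b(password|passwd|secret|token|api[_-]?key)\s*[=:]\s*\S+"
)
_PATH_RE = re.compile(r"(?:[A-Za-z]:)?(?:[/\\][\w.\-]+){2,}")


def sanitize_error(exc: Exception, max_length: int = 200) -> str:
    """Return a log-safe description of `exc` (hypothetical helper)."""
    message = str(exc)
    # Redact "key=value" style secrets first, so values that look like
    # paths are hidden by the secret rule rather than the path rule.
    message = _SECRET_RE.sub(r"\1=<redacted>", message)
    # Replace filesystem paths that would reveal the deployment layout.
    message = _PATH_RE.sub("<path>", message)
    # Bound the length so one error cannot flood a log line.
    if len(message) > max_length:
        message = message[:max_length] + "..."
    return f"{type(exc).__name__}: {message}"
```

Keeping the exception type name while scrubbing the message preserves enough context to triage a failure without exposing file locations or credentials.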
|
||||
|
||||
### Error Reporting
|
||||
|
||||
```python
|
||||
# Service status includes error information
|
||||
# Service status includes aggregated error information
|
||||
status = service.get_status()
|
||||
|
||||
if status['errors_count'] > 0:
|
||||
print(f"Service has {status['errors_count']} errors")
|
||||
print(f"Last error: {status['last_error']}")
|
||||
|
||||
# Get detailed error information from collectors
|
||||
for collector_name in service.manager.list_collectors():
|
||||
collector_status = service.manager.get_collector_status(collector_name)
|
||||
if collector_status['status'] == 'error':
|
||||
print(f"Collector {collector_name}: {collector_status['statistics']['last_error']}")
|
||||
print(f"Service has {status['errors_count']} errors.")
|
||||
print(f"Last service error: {status['last_error']}")
|
||||
|
||||
# Get detailed error information from individual collectors if available
|
||||
if 'detailed_collector_statuses' in status:
|
||||
for collector_name, details in status['detailed_collector_statuses'].items():
|
||||
if details.get('status') == 'ERROR' and 'last_error' in details:
|
||||
print(f"Collector {collector_name} error: {details['last_error']}")
|
||||
```
|
||||
|
||||
## Testing
|
||||
|
||||
The testing approach now emphasizes unit tests for individual components and integration tests for component interactions, ensuring thorough coverage of the modular architecture.
|
||||
|
||||
### Running Service Tests
|
||||
|
||||
```bash
|
||||
# Run all data collection service tests
|
||||
uv run pytest tests/test_data_collection_service.py -v
|
||||
uv run pytest tests/data/collection_service -v # Assuming tests are in a 'collection_service' subdir
|
||||
|
||||
# Run specific test categories
|
||||
uv run pytest tests/test_data_collection_service.py::TestDataCollectionService -v
|
||||
# Run specific component tests, e.g., for ServiceConfig
|
||||
uv run pytest tests/config/test_service_config.py -v
|
||||
|
||||
# Run with coverage
|
||||
uv run pytest tests/test_data_collection_service.py --cov=data.collection_service
|
||||
# Run with coverage for the entire data collection module
|
||||
uv run pytest --cov=data --cov=config --cov=utils tests/
|
||||
```
|
||||
|
||||
### Test Coverage
|
||||
|
||||
The service test suite covers:
|
||||
- Service initialization and configuration loading
|
||||
- Collector orchestration and management
|
||||
- Service lifecycle (start/stop/restart)
|
||||
- Configuration validation and error handling
|
||||
- Signal handling and graceful shutdown
|
||||
- Status reporting and monitoring
|
||||
- Error aggregation and recovery
|
||||
The expanded test suite now covers:
|
||||
- **Component Unit Tests**: Individual tests for `ServiceConfig`, `CollectorFactory`, `AsyncTaskManager`, `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, `ManagerLogger`.
|
||||
- **Service Integration Tests**: Testing `DataCollectionService`'s orchestration of its components.
|
||||
- Service initialization and configuration loading/validation.
|
||||
- Collector orchestration and management via `CollectorManager` and `CollectorLifecycleManager`.
|
||||
- Asynchronous task management and error recovery.
|
||||
- Service lifecycle (start/stop/restart) and signal handling.
|
||||
- Status reporting and monitoring, including detailed collector statuses.
|
||||
- Error aggregation and recovery strategies.
|
||||
|
||||
### Mock Testing
|
||||
|
||||
@ -557,18 +698,39 @@ The service test suite covers:
|
||||
import pytest
|
||||
from unittest.mock import AsyncMock, patch
|
||||
from data.collection_service import DataCollectionService
|
||||
from config.service_config import ServiceConfig # Ensure new components are imported for mocking
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_service_with_mock_collectors():
|
||||
with patch('data.collection_service.CollectorManager') as mock_manager:
|
||||
# Mock successful initialization
|
||||
mock_manager.return_value.start.return_value = True
|
||||
|
||||
service = DataCollectionService()
|
||||
result = await service.start()
|
||||
|
||||
assert result is True
|
||||
mock_manager.return_value.start.assert_called_once()
|
||||
async def test_service_with_mock_components():
|
||||
with patch('data.collection_service.ServiceConfig') as MockServiceConfig, \
|
||||
patch('data.collection_service.CollectorFactory') as MockCollectorFactory, \
|
||||
patch('data.collection_service.CollectorManager') as MockCollectorManager, \
|
||||
patch('data.collection_service.AsyncTaskManager') as MockAsyncTaskManager:
|
||||
|
||||
# Configure mocks for successful operation
|
||||
MockServiceConfig.return_value.load_config.return_value = {"collectors": []}
|
||||
MockServiceConfig.return_value.get_config.return_value = {"collectors": []}
|
||||
MockCollectorManager.return_value.start_all.return_value = None
|
||||
MockCollectorManager.return_value.stop_all.return_value = None
|
||||
MockAsyncTaskManager.return_value.start.return_value = None
|
||||
MockAsyncTaskManager.return_value.stop.return_value = None
|
||||
|
||||
service = DataCollectionService(
|
||||
service_config=MockServiceConfig.return_value,
|
||||
collector_factory=MockCollectorFactory.return_value,
|
||||
collector_manager=MockCollectorManager.return_value,
|
||||
async_task_manager=MockAsyncTaskManager.return_value
|
||||
)
|
||||
await service.start()
|
||||
|
||||
# Assertions to ensure components were called correctly
|
||||
MockServiceConfig.return_value.load_config.assert_called_once()
|
||||
MockCollectorManager.return_value.start_all.assert_called_once()
|
||||
MockAsyncTaskManager.return_value.start.assert_called_once()
|
||||
|
||||
await service.stop()
|
||||
MockCollectorManager.return_value.stop_all.assert_called_once()
|
||||
MockAsyncTaskManager.return_value.stop.assert_called_once()
|
||||
```
|
||||
|
||||
## Production Deployment
|
||||
@ -855,7 +1017,8 @@ for collector_name in service.manager.list_collectors():
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Data Collectors System](../components/data_collectors.md) - Core collector components
|
||||
- [Logging System](../components/logging.md) - Logging configuration
|
||||
- [Database Operations](../database/operations.md) - Database integration
|
||||
- [Monitoring Guide](../monitoring/README.md) - System monitoring setup
|
||||
- [Data Collectors System](../data_collectors.md) - Comprehensive documentation on core collector components and their modular internal structure.
|
||||
- [Logging System](../logging.md) - Details on logging configuration and philosophy.
|
||||
- [Database Operations](../../database/operations.md) - Information on database integration and persistence.
|
||||
- [Monitoring Guide](../../monitoring/README.md) - Setup for system monitoring and alerting.
|
||||
- [ADR-004: Modular Data Collector System Refactoring](../../decisions/ADR-004-modular-data-collector-system.md) - Rationale and implications of the modular architecture.
|
||||
@@ -60,7 +60,7 @@ requires = ["hatchling"]
 build-backend = "hatchling.build"
 
 [tool.hatch.build.targets.wheel]
-packages = ["config", "database", "scripts", "tests"]
+packages = ["config", "database", "scripts", "tests", "data"]
 
 [tool.black]
 line-length = 88
 
@ -1,131 +0,0 @@
|
||||
## Relevant Files
|
||||
|
||||
- `data/collector_manager.py` - Core manager for data collectors (refactored: 563→178 lines, enhanced with TaskManager).
|
||||
- `data/collection_service.py` - Main service for data collection (enhanced with TaskManager).
|
||||
- `data/collector_types.py` - Shared data types for collector management (new file).
|
||||
- `data/manager_components/` - Component classes for modular manager architecture (new directory).
|
||||
- `data/manager_components/manager_stats_tracker.py` - Enhanced with performance monitoring and cache optimization.
|
||||
- `utils/async_task_manager.py` - New comprehensive async task management utility (new file).
|
||||
- `data/__init__.py` - Updated imports for new structure.
|
||||
- `tests/test_collector_manager.py` - Unit tests for `collector_manager.py` (imports updated).
|
||||
- `tests/test_data_collection_aggregation.py` - Integration tests (imports updated).
|
||||
- `scripts/production_clean.py` - Production script (verified working).
|
||||
- `scripts/start_data_collection.py` - Data collection script (verified working).
|
||||
|
||||
## Code Review Analysis: `collection_service.py` & `collector_manager.py`
|
||||
|
||||
### Overall Assessment
|
||||
Both files show good foundational architecture but exceed the recommended file size limits and contain several areas for improvement.
|
||||
|
||||
### 📏 File Size Violations
|
||||
- **`collector_manager.py`**: 563 lines (❌ Exceeds 250-line limit by 125%)
|
||||
- **`collection_service.py`**: 451 lines (❌ Exceeds 250-line limit by 80%)
|
||||
|
||||
### 🔍 Function Size Analysis
|
||||
**Functions Exceeding 50-Line Limit:**
|
||||
**`collector_manager.py`:**
|
||||
- `__init__()` - 65 lines
|
||||
- `_global_health_monitor()` - 71 lines
|
||||
- `get_status()` - 53 lines
|
||||
|
||||
**`collection_service.py`:**
|
||||
- `_create_default_config()` - 89 lines
|
||||
- `run()` - 98 lines
|
||||
|
||||
### 🏗️ Architecture & Design Issues
|
||||
1. **Tight Coupling in CollectorManager**
   - **Issue**: The manager class handles too many responsibilities (collector lifecycle, health monitoring, statistics, logging).
   - **Solution**: Apply Single Responsibility Principle by creating dedicated component classes.
2. **Configuration Management Complexity**
   - **Issue**: Configuration logic scattered across multiple methods.
   - **Solution**: Dedicated configuration manager for centralized handling.
|
||||
### 🔒 Security & Error Handling Review
|
||||
**Strengths:**
|
||||
- Proper exception handling with context
|
||||
- No hardcoded credentials
|
||||
- Graceful shutdown handling
|
||||
- Input validation in configuration
|
||||
|
||||
**Areas for Improvement:**
|
||||
1. **Error Information Leakage**
   - **Issue**: Error messages could leak internal details.
   - **Solution**: Sanitize error messages before logging.
2. **Configuration File Security**
   - **Issue**: No file permission validation.
   - **Solution**: Add validation to ensure appropriate file permissions.
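
A file permission check of the kind proposed above might look like the sketch below. It assumes a POSIX system and assumes that "appropriate" means owner-only access; the function name `config_permissions_ok` is hypothetical and is not taken from `config/service_config.py`.

```python
import os
import stat


def config_permissions_ok(path: str) -> bool:
    """Return True if `path` grants no access to group or others.

    Assumption: owner-only access is the policy. POSIX semantics only;
    on Windows the mode bits do not carry this meaning.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any read/write/execute bit for group or others fails the check.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A loader could call this before reading the configuration and refuse to start, or only log a warning, depending on how strict the deployment needs to be.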
|
||||
|
||||
### 🚀 Performance Optimization Opportunities
|
||||
1. **Async Task Management**
   - **Issue**: Potential memory leaks with untracked tasks.
   - **Solution**: Implement proper task lifecycle management with a `TaskManager`.
2. **Statistics Collection Optimization**
   - **Issue**: Statistics calculated on every status request.
   - **Solution**: Use cached statistics with background updates via a `CachedStatsManager`.
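
The statistics caching idea can be illustrated with a small time-to-live cache. Note that this sketch refreshes lazily on read rather than through the background updates proposed above, which keeps the example short. The class name `CachedStats` and its parameters are hypothetical and do not reflect the project's `CachedStatsManager`.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedStats:
    """Cache an expensive stats computation for `ttl_seconds` (hypothetical)."""

    def __init__(
        self,
        compute: Callable[[], Dict[str, Any]],
        ttl_seconds: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._value: Optional[Dict[str, Any]] = None
        self._expires_at = 0.0

    def get(self) -> Dict[str, Any]:
        """Return cached stats, recomputing only after the TTL has expired."""
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self) -> None:
        """Force the next `get()` to recompute, e.g. after a collector restart."""
        self._value = None
```

With this shape, repeated status requests inside the TTL window reuse the cached dictionary, so the cost of aggregating statistics is paid at most once per window.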
|
||||
|
||||
### 🧪 Testing & Maintainability
|
||||
**Missing Test Coverage Areas:**
|
||||
1. Collector manager state transitions
|
||||
2. Health monitoring edge cases
|
||||
3. Configuration validation
|
||||
4. Signal handling
|
||||
5. Concurrent collector operations
|
||||
|
||||
### 📝 Documentation Improvements
|
||||
1. **Missing API Documentation**
   - **Issue**: Public methods and classes lack comprehensive docstrings.
   - **Solution**: Add docstrings with examples, thread safety notes, and performance considerations.
2. **Configuration Schema Documentation**
   - **Issue**: No formal schema validation.
   - **Solution**: Implement JSON schema validation for configurations.
|
||||
### 📊 Quality Metrics Summary
|
||||
| Metric | Current | Target | Status |
|--------|---------|--------|--------|
| File Size | 563/451 lines | <250 lines | ❌ |
| Function Size | 5 functions >50 lines | 0 functions >50 lines | ❌ |
| Cyclomatic Complexity | Medium-High | Low-Medium | ⚠️ |
| Test Coverage | ~30% estimated | >80% | ❌ |
| Documentation | Basic | Comprehensive | ⚠️ |
| Error Handling | Good | Excellent | ✅ |
|
||||
## Tasks
|
||||
|
||||
- [x] 1.0 Refactor `collector_manager.py` for Modularity and Readability
|
||||
- [x] 1.1 Extract `ManagerStatus` and `CollectorConfig` dataclasses to `data/collector_types.py`.
|
||||
- [x] 1.2 Create `data/manager_components/collector_lifecycle_manager.py` to handle `add_collector`, `remove_collector`, `enable_collector`, `disable_collector`, `_start_collector`, `restart_collector`, `restart_all_collectors`.
|
||||
- [x] 1.3 Create `data/manager_components/manager_health_monitor.py` to encapsulate `_global_health_monitor` logic.
|
||||
- [x] 1.4 Create `data/manager_components/manager_stats_tracker.py` to manage statistics in `get_status` and update `_stats`.
|
||||
- [x] 1.5 Create `data/manager_components/manager_logger.py` to centralize logging methods (`_log_debug`, `_log_info`, `_log_warning`, `_log_error`, `_log_critical`).
|
||||
- [x] 1.6 Update `CollectorManager` to use instances of these new component classes.
|
||||
- [x] 1.7 Ensure `CollectorManager` `__init__` method is under 50 lines by delegating initialization to helper methods within the class or component classes.
|
||||
|
||||
- [x] 2.0 Refactor `collection_service.py` for Improved Structure
|
||||
- [x] 2.1 Create `config/service_config.py` to handle `_load_config` and `_create_default_config` logic, including schema validation.
|
||||
- [x] 2.2 Create `data/collector_factory.py` to encapsulate `_create_collector` logic.
|
||||
- [x] 2.3 Update `DataCollectionService` to use instances of these new component classes.
|
||||
- [x] 2.4 Refactor the `run()` method to be under 50 lines by extracting sub-steps into helper methods (e.g., `_run_main_loop`).
|
||||
- [x] 2.5 Test './scripts/start_data_collection.py' and './scripts/production_clean.py' to ensure they work as expected.
|
||||
|
||||
- [x] 3.0 Enhance Error Handling and Security
|
||||
- [x] 3.1 Implement a `_sanitize_error` method in `CollectorManager` and `DataCollectionService` to prevent leaking internal error details.
|
||||
- [x] 3.2 Add file permission validation for configuration files in `config/service_config.py`.
|
||||
- [x] 3.3 Review all `try-except` blocks to ensure specific exceptions are caught rather than broad `Exception`.
|
||||
- [x] 3.4 Ensure all logger calls include `exc_info=True` for error and critical logs.
|
||||
- [x] 3.5 Test './scripts/start_data_collection.py' and './scripts/production_clean.py' to ensure they work as expected.
|
||||
|
||||
|
||||
- [x] 4.0 Optimize Performance and Resource Management
|
||||
- [x] 4.1 Implement a `TaskManager` class in `utils/async_task_manager.py` to manage and track `asyncio.Task` instances in `CollectorManager` and `DataCollectionService`.
|
||||
- [x] 4.2 Introduce a `CachedStatsManager` in `data/manager_components/manager_stats_tracker.py` for `CollectorManager` to cache statistics and update them periodically instead of on every `get_status` call.
|
||||
- [x] 4.3 Review all `asyncio.sleep` calls for optimal intervals.
|
||||
- [x] 4.4 Test './scripts/start_data_collection.py' and './scripts/production_clean.py' to ensure they work as expected.
|
||||
|
||||
- [ ] 5.0 Improve Documentation and Test Coverage
|
||||
- [ ] 5.1 Add comprehensive docstrings to all public methods and classes in `CollectorManager` and `DataCollectionService`, including examples, thread safety notes, and performance considerations.
|
||||
- [ ] 5.2 Create new unit test files: `tests/data/manager_components/test_collector_lifecycle_manager.py`, `tests/data/manager_components/test_manager_health_monitor.py`, `tests/data/manager_components/test_manager_stats_tracker.py`, `tests/config/test_service_config.py`, `tests/data/test_collector_factory.py`.
|
||||
- [ ] 5.3 Write unit tests for all new components (lifecycle manager, health monitor, stats tracker, service config, collector factory).
|
||||
- [ ] 5.4 Enhance existing tests or create new ones for `CollectorManager` to cover state transitions, health monitoring edge cases, and concurrent operations.
|
||||
- [ ] 5.5 Enhance existing tests or create new ones for `DataCollectionService` to cover configuration validation, service lifecycle, and signal handling.
|
||||
- [ ] 5.6 Ensure all tests use `uv run pytest` and verify passing.
|
||||
- [ ] 5.7 Test './scripts/start_data_collection.py' and './scripts/production_clean.py' to ensure they work as expected.
|
||||
@ -1,66 +0,0 @@
|
||||
## Relevant Files
|
||||
|
||||
- `data/common/aggregation.py` - To be broken into a sub-package.
|
||||
- `data/common/indicators.py` - To be broken into a sub-package and have a bug fixed.
|
||||
- `data/common/validation.py` - To be refactored for better modularity.
|
||||
- `data/common/transformation.py` - ✅ Refactored into transformation package with safety limits.
|
||||
- `data/common/data_types.py` - To be updated with new types from other modules.
|
||||
- `data/common/__init__.py` - ✅ Updated to reflect the new package structure.
|
||||
- `tests/` - Existing tests will need to be run after each step to ensure no regressions.
|
||||
|
||||
### Notes
|
||||
|
||||
- This refactoring focuses on improving modularity by splitting large files into smaller, more focused modules, as outlined in the `refactoring.mdc` guide.
|
||||
- Each major step will be followed by a verification phase to ensure the application remains stable.
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 1.0 Refactor `aggregation.py` into a dedicated sub-package.
|
||||
- [x] 1.1 Create safety net tests to ensure the aggregation logic still works as expected.
|
||||
- [x] 1.2 Create a new directory `data/common/aggregation`.
|
||||
- [x] 1.3 Create `data/common/aggregation/__init__.py` to mark it as a package.
|
||||
- [x] 1.4 Move the `TimeframeBucket` class to `data/common/aggregation/bucket.py`.
|
||||
- [x] 1.5 Move the `RealTimeCandleProcessor` class to `data/common/aggregation/realtime.py`.
|
||||
- [x] 1.6 Move the `BatchCandleProcessor` class to `data/common/aggregation/batch.py`.
|
||||
- [x] 1.7 Move the utility functions to `data/common/aggregation/utils.py`.
|
||||
- [x] 1.8 Update `data/common/aggregation/__init__.py` to expose all public classes and functions.
|
||||
- [x] 1.9 Delete the original `data/common/aggregation.py` file.
|
||||
- [x] 1.10 Run tests to verify the aggregation logic still works as expected.
|
||||
|
||||
- [x] 2.0 Refactor `indicators.py` into a dedicated sub-package.
|
||||
- [x] 2.1 Create safety net tests for indicators module.
|
||||
- [x] 2.2 Create a new directory `data/common/indicators`.
|
||||
- [x] 2.3 Create `data/common/indicators/__init__.py` to mark it as a package.
|
||||
- [x] 2.4 Move the `TechnicalIndicators` class to `data/common/indicators/technical.py`.
|
||||
- [x] 2.5 Move the `IndicatorResult` class to `data/common/indicators/result.py`.
|
||||
- [x] 2.6 Move the utility functions to `data/common/indicators/utils.py`.
|
||||
- [x] 2.7 Update `data/common/indicators/__init__.py` to expose all public classes and functions.
|
||||
- [x] 2.8 Delete the original `data/common/indicators.py` file.
|
||||
- [x] 2.9 Run tests to verify the indicators logic still works as expected.
|
||||
|
||||
- [x] 3.0 Refactor `validation.py` for better modularity.
|
||||
- [x] 3.1 Create safety net tests for validation module.
|
||||
- [x] 3.2 Extract common validation logic into separate functions.
|
||||
- [x] 3.3 Improve error handling and validation messages.
|
||||
- [x] 3.4 Run tests to verify validation still works as expected.
|
||||
|
||||
- [x] 4.0 Refactor `transformation.py` for better modularity.
|
||||
- [x] 4.1 Create safety net tests for transformation module.
|
||||
- [x] 4.2 Extract common transformation logic into separate functions.
|
||||
- [x] 4.3 Improve error handling and transformation messages.
|
||||
- [x] 4.4 Run tests to verify transformation still works as expected.
|
||||
- [x] 4.5 Create comprehensive safety limits system.
|
||||
- [x] 4.6 Add documentation for the transformation module.
|
||||
- [x] 4.7 Delete redundant transformation.py file.
|
||||
|
||||
- [x] 5.0 Update `data_types.py` with new types.
|
||||
- [x] 5.1 Review and document all data types.
|
||||
- [x] 5.2 Add any missing type hints.
|
||||
- [x] 5.3 Add validation for data types.
|
||||
- [x] 5.4 Run tests to verify data types still work as expected.
|
||||
|
||||
- [ ] 6.0 Final verification and cleanup.
|
||||
- [x] 6.1 Run all tests to ensure no regressions.
|
||||
- [x] 6.2 Update documentation to reflect new structure.
|
||||
- [x] 6.3 Review and clean up any remaining TODOs.
|
||||
- [ ] 6.4 Create PR with changes.
|
||||
@ -1,66 +0,0 @@
|
||||
## Relevant Files
|
||||
|
||||
- `data/base_collector.py` - The main file to be refactored, where `BaseDataCollector` is defined.
|
||||
- `data/collector/collector_state_telemetry.py` - New file for managing collector status, health, and statistics.
|
||||
- `data/collector/collector_connection_manager.py` - New file for handling connection, disconnection, and reconnection logic.
|
||||
- `data/collector/collector_callback_dispatcher.py` - New file for managing data callbacks and notifications.
|
||||
- `data/ohlcv_data.py` - Potential new file for `OHLCVData` and related validation if deemed beneficial.
|
||||
- `tests/data/test_base_collector.py` - Existing test file for `BaseDataCollector`.
|
||||
- `tests/data/collector/test_collector_state_telemetry.py` - New test file for `CollectorStateAndTelemetry` class.
|
||||
- `tests/data/collector/test_collector_connection_manager.py` - New test file for `ConnectionManager` class.
|
||||
- `tests/data/collector/test_collector_callback_dispatcher.py` - New test file for `CallbackDispatcher` class.
|
||||
- `tests/data/test_ohlcv_data.py` - New test file for `OHLCVData` and validation.
|
||||
|
||||
### Notes
|
||||
|
||||
- Unit tests should typically be placed alongside the code files they are testing (e.g., `MyComponent.tsx` and `MyComponent.test.tsx` in the same directory).
|
||||
- Each refactoring step will be small and verified with existing tests, and new tests will be created for extracted components.
|
||||
|
||||
## Tasks
|
||||
|
||||
- [x] 0.0 Create `data/collector` directory
|
||||
- [x] 1.0 Extract `CollectorStateAndTelemetry` Class
|
||||
- [x] 1.1 Create `data/collector/collector_state_telemetry.py`.
|
||||
- [x] 1.2 Move `CollectorStatus` enum to `data/collector/collector_state_telemetry.py`.
|
||||
- [x] 1.3 Move `_stats` initialization and related helper methods (`_log_debug`, `_log_info`, `_log_warning`, `_log_error`, `_log_critical`) to `CollectorStateAndTelemetry`.
|
||||
- [x] 1.4 Move `get_status` and `get_health_status` methods to `CollectorStateAndTelemetry`.
|
||||
- [x] 1.5 Implement a constructor for `CollectorStateAndTelemetry` to receive logger and initial parameters.
|
||||
- [x] 1.6 Add necessary imports to both `data/base_collector.py` and `data/collector/collector_state_telemetry.py`.
|
||||
- [x] 1.7 Create `tests/data/collector/test_collector_state_telemetry.py` and add initial tests for the new class.
|
||||
|
||||
- [x] 2.0 Extract `ConnectionManager` Class
|
||||
- [x] 2.1 Create `data/collector/collector_connection_manager.py`.
|
||||
- [x] 2.2 Move connection-related attributes (`_connection`, `_reconnect_attempts`, `_max_reconnect_attempts`, `_reconnect_delay`) to `ConnectionManager`.
|
||||
- [x] 2.3 Move `connect`, `disconnect`, `_handle_connection_error` methods to `ConnectionManager`.
|
||||
- [x] 2.4 Implement a constructor for `ConnectionManager` to receive logger and other necessary parameters.
|
||||
- [x] 2.5 Add necessary imports to both `data/base_collector.py` and `data/collector/collector_connection_manager.py`.
|
||||
- [x] 2.6 Create `tests/data/collector/test_collector_connection_manager.py` and add initial tests for the new class.
|
||||
|
||||
- [x] 3.0 Extract `CallbackDispatcher` Class
|
||||
- [x] 3.1 Create `data/collector/collector_callback_dispatcher.py`.
|
||||
- [x] 3.2 Move `_data_callbacks` attribute to `CallbackDispatcher`.
|
||||
- [x] 3.3 Move `add_data_callback`, `remove_data_callback`, `_notify_callbacks` methods to `CallbackDispatcher`.
|
||||
- [x] 3.4 Implement a constructor for `CallbackDispatcher` to receive logger.
|
||||
- [x] 3.5 Add necessary imports to both `data/base_collector.py` and `data/collector/collector_callback_dispatcher.py`.
|
||||
- [x] 3.6 Create `tests/data/collector/test_collector_callback_dispatcher.py` and add initial tests for the new class.
|
||||
|
||||
- [x] 4.0 Refactor `BaseDataCollector` to use new components
|
||||
- [x] 4.1 Update `BaseDataCollector.__init__` to instantiate and use `CollectorStateAndTelemetry`, `ConnectionManager`, and `CallbackDispatcher` instances.
|
||||
- [x] 4.2 Replace direct access to moved attributes/methods with calls to the new component instances (e.g., `self.logger.info` becomes `self._state_telemetry.log_info`).
|
||||
- [x] 4.3 Modify `start`, `stop`, `restart`, `_message_loop`, `_health_monitor` to interact with the new components, delegating responsibilities appropriately.
|
||||
- [x] 4.4 Update `get_status` and `get_health_status` in `BaseDataCollector` to delegate to `CollectorStateAndTelemetry`.
|
||||
- [x] 4.5 Review and update abstract methods and their calls as needed, ensuring they interact correctly with the new components.
|
||||
- [x] 4.6 Ensure all existing tests for `BaseDataCollector` still pass after refactoring.
|
||||
- [x] 4.7 Update `data/exchanges/okx/collector.py` to use the new `CollectorStateAndTelemetry` and `ConnectionManager` classes for logging, status updates, and connection handling.
|
||||
- [x] 4.8 Update `data/collector_manager.py` to interact with the new `CollectorStateAndTelemetry` class for health checks and status retrieval from `BaseDataCollector` instances.
|
||||
|
||||
- [x] 5.0 Review and potentially extract `OHLCVData` and related validation
|
||||
- [x] 5.1 Analyze if `OHLCVData` and `validate_ohlcv_data` are frequently used outside of `data/base_collector.py`.
|
||||
- [x] 5.2 If analysis indicates external usage or clear separation benefits, move `OHLCVData` class and `DataValidationError` to a new `data/ohlcv_data.py` file.
|
||||
- [x] 5.3 Update imports in `data/base_collector.py` and any other affected files.
|
||||
- [x] 5.4 If `OHLCVData` is extracted, create `tests/data/test_ohlcv_data.py` with tests for its structure and validation logic.
|
||||
|
||||
- [x] 6.0 Update Module Imports
|
||||
- [x] 6.1 Update imports in `data/__init__.py` to reflect the new locations of `CollectorStatus`, `DataCollectorError`, `DataValidationError`, `DataType`, `MarketDataPoint`, and `OHLCVData` (if moved).
|
||||
- [x] 6.2 Update imports in `data/common/data_types.py` for `DataType` and `MarketDataPoint`.
|
||||
- [x] 6.3 Review and update imports in all test files (`tests/test_refactored_okx.py`, `tests/test_real_storage.py`, `tests/test_okx_collector.py`, `tests/test_exchange_factory.py`, `tests/test_data_collection_aggregation.py`, `tests/test_collector_manager.py`, `tests/test_base_collector.py`, `tests/database/test_database_operations.py`) and scripts (`scripts/production_clean.py`) that import directly from `data.base_collector`.
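
The composition described in tasks 1.0 through 4.0 could look roughly like the sketch below. It is a minimal illustration, not the project's actual implementation. The class names and most method names come from the task list, but the constructor signatures, the `component_name` parameter, the `record_message` and `handle_message` helpers, the public `notify_callbacks` name, and the synchronous `connect`/`disconnect` are all assumptions made to keep the example short and runnable.

```python
import logging
from typing import Any, Callable, Dict, List, Optional


class CollectorStateAndTelemetry:
    """Owns stats and logging helpers (tasks 1.3-1.5). Signature is assumed."""

    def __init__(self, logger: logging.Logger, component_name: str) -> None:
        self.logger = logger
        self.component_name = component_name
        self._stats: Dict[str, int] = {"messages_received": 0}

    def log_info(self, message: str) -> None:
        self.logger.info("[%s] %s", self.component_name, message)

    def record_message(self) -> None:
        # Hypothetical helper: counts one processed message.
        self._stats["messages_received"] += 1

    def get_status(self) -> Dict[str, Any]:
        return {"component": self.component_name, "stats": dict(self._stats)}


class ConnectionManager:
    """Owns connection state (tasks 2.2-2.4). Synchronous here for brevity."""

    def __init__(self, logger: logging.Logger, max_reconnect_attempts: int = 3) -> None:
        self.logger = logger
        self._connection: Optional[object] = None
        self._reconnect_attempts = 0
        self._max_reconnect_attempts = max_reconnect_attempts

    def connect(self) -> bool:
        # Placeholder for a real exchange connection.
        self._connection = object()
        self._reconnect_attempts = 0
        return True

    def disconnect(self) -> None:
        self._connection = None


class CallbackDispatcher:
    """Owns the callback list (tasks 3.2-3.4)."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger
        self._data_callbacks: List[Callable[[Any], None]] = []

    def add_data_callback(self, callback: Callable[[Any], None]) -> None:
        if callback not in self._data_callbacks:
            self._data_callbacks.append(callback)

    def remove_data_callback(self, callback: Callable[[Any], None]) -> None:
        if callback in self._data_callbacks:
            self._data_callbacks.remove(callback)

    def notify_callbacks(self, data: Any) -> None:
        # Iterate over a copy so a callback may unregister itself safely.
        for callback in list(self._data_callbacks):
            try:
                callback(data)
            except Exception:
                self.logger.exception("Data callback failed")


class BaseDataCollector:
    """Delegates to the three components instead of owning their state (task 4.1)."""

    def __init__(self, name: str, logger: Optional[logging.Logger] = None) -> None:
        logger = logger or logging.getLogger(name)
        self._state_telemetry = CollectorStateAndTelemetry(logger, name)
        self._connection_manager = ConnectionManager(logger)
        self._callback_dispatcher = CallbackDispatcher(logger)

    def start(self) -> bool:
        connected = self._connection_manager.connect()
        if connected:
            self._state_telemetry.log_info("Collector started")
        return connected

    def add_data_callback(self, callback: Callable[[Any], None]) -> None:
        self._callback_dispatcher.add_data_callback(callback)

    def handle_message(self, data: Any) -> None:
        # Hypothetical entry point standing in for the real message loop.
        self._state_telemetry.record_message()
        self._callback_dispatcher.notify_callbacks(data)

    def get_status(self) -> Dict[str, Any]:
        # Task 4.4: status retrieval is delegated to the telemetry component.
        return self._state_telemetry.get_status()
```

With this shape, each component can be tested in isolation by passing it a plain `logging.Logger`, which matches the per-component test files listed in tasks 1.7, 2.6, and 3.6.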
|
||||