TCPDashboard/docs/decisions/ADR-004-modular-data-collector-system.md

52 lines
4.1 KiB
Markdown
Raw Permalink Normal View History

# ADR-004: Modular Data Collector System Refactoring
## Status
Accepted
## Context
Previously, the data collection system, primarily `CollectorManager` and `DataCollectionService`, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:
- Large file sizes for `collector_manager.py` (563 lines) and `collection_service.py` (451 lines).
- Functions exceeding the 50-line limit (e.g., `__init__`, `_global_health_monitor`, `get_status` in `CollectorManager`; `_create_default_config`, `run` in `DataCollectionService`).
- `CollectorManager` handling too many responsibilities (lifecycle, health monitoring, statistics, logging).
- Scattered configuration logic.
- Potential memory leaks due to untracked asynchronous tasks.
- Inefficient statistics collection.
- Gaps in test coverage for state transitions, health monitoring, and concurrent operations.
- Lack of comprehensive API and configuration schema documentation.
## Decision
We decided to refactor the data collector system into a modular, component-based architecture. This involves:
1. **Breaking Down `CollectorManager`**: Extracting specific responsibilities into dedicated component classes:
* `CollectorLifecycleManager`: For collector lifecycle operations.
* `ManagerHealthMonitor`: For global health monitoring.
* `ManagerStatsTracker`: For managing and caching performance statistics.
* `ManagerLogger`: For centralizing logging.
2. **Modularizing `DataCollectionService`**: Creating specialized components for its concerns:
* `ServiceConfig`: For loading, creating, and validating configurations.
* `CollectorFactory`: For encapsulating collector creation logic.
3. **Introducing `AsyncTaskManager`**: A centralized utility for managing and tracking `asyncio.Task` instances to prevent resource leaks and improve robustness.
4. **Enhancing Error Handling and Security**: Implementing a `_sanitize_error` method, adding file permission validation for configurations, and ensuring specific exception handling.
5. **Optimizing Performance**: Utilizing `CachedStatsManager` for efficient statistics updates.
6. **Improving Documentation and Testing**: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.
## Consequences
**Positive:**
- **Improved Maintainability**: Clear separation of concerns reduces complexity and makes code easier to understand and modify.
- **Enhanced Testability**: Individual components can be unit-tested in isolation, leading to more robust and reliable code.
- **Increased Scalability**: The modular design allows for easier extension and adaptation to new exchanges or data types.
- **Better Readability**: Smaller files and functions improve code comprehension.
- **Robust Error Handling**: Dedicated components for error handling and task management lead to more resilient operations.
- **Optimized Performance**: Cached statistics and managed asynchronous tasks contribute to better resource utilization.
- **Comprehensive Documentation**: Clearer architecture facilitates better documentation and onboarding.
**Negative:**
- **Increased File Count**: More files are introduced due to the breakdown of responsibilities.
- **Initial Development Overhead**: The refactoring required a significant upfront investment in time and effort.
- **Increased Indirection**: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.
## Alternatives Considered
- **Minor Refactoring**: Only addressing critical violations (e.g., function size) without a full modular redesign. *Rejected* due to not fully addressing underlying architectural issues and long-term maintainability concerns.
- **External Libraries for Each Concern**: Using separate, heavy-duty external libraries for logging, health monitoring, etc. *Rejected* to avoid introducing unnecessary dependencies and to maintain more control over custom logic specific to our system.