- Updated `pyproject.toml` to include the new `data` package in the build configuration, ensuring all components are properly included. - Introduced `ADR-004` documentation outlining the rationale for refactoring the data collection system into a modular architecture, addressing complexity and maintainability issues. - Enhanced `data_collectors.md` to reflect the new component structure, detailing responsibilities of `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, and `ManagerLogger`. - Refactored `DataCollectionService` to utilize the new modular components, improving orchestration and error handling. - Removed the obsolete `collector-service-tasks-optimization.md` and `refactor-common-package.md` files, streamlining the tasks documentation. These changes significantly improve the architecture and maintainability of the data collection service, aligning with project standards for modularity, performance, and documentation clarity.
4.1 KiB
4.1 KiB
ADR-004: Modular Data Collector System Refactoring
Status
Accepted
Context
Previously, the data collection system, primarily CollectorManager and DataCollectionService, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:
- Large file sizes for
collector_manager.py(563 lines) andcollection_service.py(451 lines). - Functions exceeding the 50-line limit (e.g.,
__init__,_global_health_monitor,get_statusinCollectorManager;_create_default_config,runinDataCollectionService). CollectorManagerhandling too many responsibilities (lifecycle, health monitoring, statistics, logging).- Scattered configuration logic.
- Potential memory leaks due to untracked asynchronous tasks.
- Inefficient statistics collection.
- Gaps in test coverage for state transitions, health monitoring, and concurrent operations.
- Lack of comprehensive API and configuration schema documentation.
Decision
We decided to refactor the data collector system into a modular, component-based architecture. This involves:
- Breaking Down
CollectorManager: Extracting specific responsibilities into dedicated component classes:CollectorLifecycleManager: For collector lifecycle operations.ManagerHealthMonitor: For global health monitoring.ManagerStatsTracker: For managing and caching performance statistics.ManagerLogger: For centralizing logging.
- Modularizing
DataCollectionService: Creating specialized components for its concerns:ServiceConfig: For loading, creating, and validating configurations.CollectorFactory: For encapsulating collector creation logic.
- Introducing
AsyncTaskManager: A centralized utility for managing and trackingasyncio.Taskinstances to prevent resource leaks and improve robustness. - Enhancing Error Handling and Security: Implementing a
_sanitize_errormethod, adding file permission validation for configurations, and ensuring specific exception handling. - Optimizing Performance: Utilizing
CachedStatsManagerfor efficient statistics updates. - Improving Documentation and Testing: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.
Consequences
Positive:
- Improved Maintainability: Clear separation of concerns reduces complexity and makes code easier to understand and modify.
- Enhanced Testability: Individual components can be unit-tested in isolation, leading to more robust and reliable code.
- Increased Scalability: The modular design allows for easier extension and adaptation to new exchanges or data types.
- Better Readability: Smaller files and functions improve code comprehension.
- Robust Error Handling: Dedicated components for error handling and task management lead to more resilient operations.
- Optimized Performance: Cached statistics and managed asynchronous tasks contribute to better resource utilization.
- Comprehensive Documentation: Clearer architecture facilitates better documentation and onboarding.
Negative:
- Increased File Count: More files are introduced due to the breakdown of responsibilities.
- Initial Development Overhead: The refactoring required a significant upfront investment in time and effort.
- Increased Indirection: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.
Alternatives Considered
- Minor Refactoring: Only addressing critical violations (e.g., function size) without a full modular redesign. Rejected due to not fully addressing underlying architectural issues and long-term maintainability concerns.
- External Libraries for Each Concern: Using separate, heavy-duty external libraries for logging, health monitoring, etc. Rejected to avoid introducing unnecessary dependencies and to maintain more control over custom logic specific to our system.