TCPDashboard/docs/decisions/ADR-004-modular-data-collector-system.md
Vasily.onl 0a7e444206 Refactor data collection architecture for modularity and maintainability
- Updated `pyproject.toml` to include the new `data` package in the build configuration, ensuring all components are properly included.
- Introduced `ADR-004` documentation outlining the rationale for refactoring the data collection system into a modular architecture, addressing complexity and maintainability issues.
- Enhanced `data_collectors.md` to reflect the new component structure, detailing responsibilities of `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, and `ManagerLogger`.
- Refactored `DataCollectionService` to utilize the new modular components, improving orchestration and error handling.
- Removed the obsolete `collector-service-tasks-optimization.md` and `refactor-common-package.md` files, streamlining the tasks documentation.

These changes significantly improve the architecture and maintainability of the data collection service, aligning with project standards for modularity, performance, and documentation clarity.
2025-06-10 14:32:00 +08:00

4.1 KiB

ADR-004: Modular Data Collector System Refactoring

Status

Accepted

Context

Previously, the data collection system, primarily CollectorManager and DataCollectionService, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:

  • Large file sizes for collector_manager.py (563 lines) and collection_service.py (451 lines).
  • Functions exceeding the 50-line limit (e.g., __init__, _global_health_monitor, get_status in CollectorManager; _create_default_config, run in DataCollectionService).
  • CollectorManager handling too many responsibilities (lifecycle, health monitoring, statistics, logging).
  • Scattered configuration logic.
  • Potential memory leaks due to untracked asynchronous tasks.
  • Inefficient statistics collection.
  • Gaps in test coverage for state transitions, health monitoring, and concurrent operations.
  • Lack of comprehensive API and configuration schema documentation.

Decision

We decided to refactor the data collector system into a modular, component-based architecture. This involves:

  1. Breaking Down CollectorManager: Extracting specific responsibilities into dedicated component classes:
    • CollectorLifecycleManager: For collector lifecycle operations.
    • ManagerHealthMonitor: For global health monitoring.
    • ManagerStatsTracker: For managing and caching performance statistics.
    • ManagerLogger: For centralizing logging.
  2. Modularizing DataCollectionService: Creating specialized components for its concerns:
    • ServiceConfig: For loading, creating, and validating configurations.
    • CollectorFactory: For encapsulating collector creation logic.
  3. Introducing AsyncTaskManager: A centralized utility for managing and tracking asyncio.Task instances to prevent resource leaks and improve robustness.
  4. Enhancing Error Handling and Security: Implementing a _sanitize_error method, adding file permission validation for configurations, and ensuring specific exception handling.
  5. Optimizing Performance: Utilizing CachedStatsManager for efficient statistics updates.
  6. Improving Documentation and Testing: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.

Consequences

Positive:

  • Improved Maintainability: Clear separation of concerns reduces complexity and makes code easier to understand and modify.
  • Enhanced Testability: Individual components can be unit-tested in isolation, leading to more robust and reliable code.
  • Increased Scalability: The modular design allows for easier extension and adaptation to new exchanges or data types.
  • Better Readability: Smaller files and functions improve code comprehension.
  • Robust Error Handling: Dedicated components for error handling and task management lead to more resilient operations.
  • Optimized Performance: Cached statistics and managed asynchronous tasks contribute to better resource utilization.
  • Comprehensive Documentation: Clearer architecture facilitates better documentation and onboarding.

Negative:

  • Increased File Count: More files are introduced due to the breakdown of responsibilities.
  • Initial Development Overhead: The refactoring required a significant upfront investment in time and effort.
  • Increased Indirection: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.

Alternatives Considered

  • Minor Refactoring: Only addressing critical violations (e.g., function size) without a full modular redesign. Rejected due to not fully addressing underlying architectural issues and long-term maintainability concerns.
  • External Libraries for Each Concern: Using separate, heavy-duty external libraries for logging, health monitoring, etc. Rejected to avoid introducing unnecessary dependencies and to maintain more control over custom logic specific to our system.