TCPDashboard/docs/decisions/ADR-004-modular-data-collector-system.md

# ADR-004: Modular Data Collector System Refactoring

## Status
Accepted

## Context
Previously, the data collection system, primarily `CollectorManager` and `DataCollectionService`, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:

- Large file sizes for `collector_manager.py` (563 lines) and `collection_service.py` (451 lines).
- Functions exceeding the 50-line limit (e.g., `__init__`, `_global_health_monitor`, `get_status` in `CollectorManager`; `_create_default_config`, `run` in `DataCollectionService`).
- `CollectorManager` handling too many responsibilities (lifecycle, health monitoring, statistics, logging).
- Scattered configuration logic.
- Potential memory leaks due to untracked asynchronous tasks.
- Inefficient statistics collection.
- Gaps in test coverage for state transitions, health monitoring, and concurrent operations.
- Lack of comprehensive API and configuration schema documentation.

## Decision
We decided to refactor the data collector system into a modular, component-based architecture. This involves:

1.  **Breaking Down `CollectorManager`**: Extracting specific responsibilities into dedicated component classes:
    *   `CollectorLifecycleManager`: For collector lifecycle operations.
    *   `ManagerHealthMonitor`: For global health monitoring.
    *   `ManagerStatsTracker`: For managing and caching performance statistics.
    *   `ManagerLogger`: For centralizing logging.
2.  **Modularizing `DataCollectionService`**: Creating specialized components for its concerns:
    *   `ServiceConfig`: For loading, creating, and validating configurations.
    *   `CollectorFactory`: For encapsulating collector creation logic.
3.  **Introducing `AsyncTaskManager`**: A centralized utility for managing and tracking `asyncio.Task` instances to prevent resource leaks and improve robustness.
4.  **Enhancing Error Handling and Security**: Implementing a `_sanitize_error` method, adding file permission validation for configurations, and ensuring specific exception handling.
5.  **Optimizing Performance**: Utilizing `CachedStatsManager` for efficient statistics updates.
6.  **Improving Documentation and Testing**: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.

## Consequences

**Positive:**
-   **Improved Maintainability**: Clear separation of concerns reduces complexity and makes code easier to understand and modify.
-   **Enhanced Testability**: Individual components can be unit-tested in isolation, leading to more robust and reliable code.
-   **Increased Scalability**: The modular design allows for easier extension and adaptation to new exchanges or data types.
-   **Better Readability**: Smaller files and functions improve code comprehension.
-   **Robust Error Handling**: Dedicated components for error handling and task management lead to more resilient operations.
-   **Optimized Performance**: Cached statistics and managed asynchronous tasks contribute to better resource utilization.
-   **Comprehensive Documentation**: Clearer architecture facilitates better documentation and onboarding.

**Negative:**
-   **Increased File Count**: More files are introduced due to the breakdown of responsibilities.
-   **Initial Development Overhead**: The refactoring required a significant upfront investment in time and effort.
-   **Increased Indirection**: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.

## Alternatives Considered
-   **Minor Refactoring**: Only addressing critical violations (e.g., function size) without a full modular redesign. *Rejected* due to not fully addressing underlying architectural issues and long-term maintainability concerns.
-   **External Libraries for Each Concern**: Using separate, heavy-duty external libraries for logging, health monitoring, etc. *Rejected* to avoid introducing unnecessary dependencies and to maintain more control over custom logic specific to our system.
Refactor data collection architecture for modularity and maintainability - Updated `pyproject.toml` to include the new `data` package in the build configuration, ensuring all components are properly included. - Introduced `ADR-004` documentation outlining the rationale for refactoring the data collection system into a modular architecture, addressing complexity and maintainability issues. - Enhanced `data_collectors.md` to reflect the new component structure, detailing responsibilities of `CollectorLifecycleManager`, `ManagerHealthMonitor`, `ManagerStatsTracker`, and `ManagerLogger`. - Refactored `DataCollectionService` to utilize the new modular components, improving orchestration and error handling. - Removed the obsolete `collector-service-tasks-optimization.md` and `refactor-common-package.md` files, streamlining the tasks documentation. These changes significantly improve the architecture and maintainability of the data collection service, aligning with project standards for modularity, performance, and documentation clarity. 2025-06-10 14:32:00 +08:00			`# ADR-004: Modular Data Collector System Refactoring`

			`## Status`
			`Accepted`

			`## Context`
			Previously, the data collection system, primarily `CollectorManager` and `DataCollectionService`, had grown in complexity, exceeding recommended file and function size limits. This led to tight coupling, scattered configuration logic, and a monolithic structure that hindered maintainability, testability, and scalability. Key issues included:

			- Large file sizes for `collector_manager.py` (563 lines) and `collection_service.py` (451 lines).
			- Functions exceeding the 50-line limit (e.g., `__init__`, `_global_health_monitor`, `get_status` in `CollectorManager`; `_create_default_config`, `run` in `DataCollectionService`).
			- `CollectorManager` handling too many responsibilities (lifecycle, health monitoring, statistics, logging).
			`- Scattered configuration logic.`
			`- Potential memory leaks due to untracked asynchronous tasks.`
			`- Inefficient statistics collection.`
			`- Gaps in test coverage for state transitions, health monitoring, and concurrent operations.`
			`- Lack of comprehensive API and configuration schema documentation.`

			`## Decision`
			`We decided to refactor the data collector system into a modular, component-based architecture. This involves:`

			1. Breaking Down `CollectorManager`: Extracting specific responsibilities into dedicated component classes:
			* `CollectorLifecycleManager`: For collector lifecycle operations.
			* `ManagerHealthMonitor`: For global health monitoring.
			* `ManagerStatsTracker`: For managing and caching performance statistics.
			* `ManagerLogger`: For centralizing logging.
			2. Modularizing `DataCollectionService`: Creating specialized components for its concerns:
			* `ServiceConfig`: For loading, creating, and validating configurations.
			* `CollectorFactory`: For encapsulating collector creation logic.
			3. Introducing `AsyncTaskManager`: A centralized utility for managing and tracking `asyncio.Task` instances to prevent resource leaks and improve robustness.
			4. Enhancing Error Handling and Security: Implementing a `_sanitize_error` method, adding file permission validation for configurations, and ensuring specific exception handling.
			5. Optimizing Performance: Utilizing `CachedStatsManager` for efficient statistics updates.
			`6. Improving Documentation and Testing: Adding comprehensive docstrings, creating new unit test files for components, and enhancing existing tests.`

			`## Consequences`

			`Positive:`
			`- Improved Maintainability: Clear separation of concerns reduces complexity and makes code easier to understand and modify.`
			`- Enhanced Testability: Individual components can be unit-tested in isolation, leading to more robust and reliable code.`
			`- Increased Scalability: The modular design allows for easier extension and adaptation to new exchanges or data types.`
			`- Better Readability: Smaller files and functions improve code comprehension.`
			`- Robust Error Handling: Dedicated components for error handling and task management lead to more resilient operations.`
			`- Optimized Performance: Cached statistics and managed asynchronous tasks contribute to better resource utilization.`
			`- Comprehensive Documentation: Clearer architecture facilitates better documentation and onboarding.`

			`Negative:`
			`- Increased File Count: More files are introduced due to the breakdown of responsibilities.`
			`- Initial Development Overhead: The refactoring required a significant upfront investment in time and effort.`
			`- Increased Indirection: More layers of abstraction might initially make the system seem more complex to new developers unfamiliar with the pattern.`

			`## Alternatives Considered`
			`- Minor Refactoring: Only addressing critical violations (e.g., function size) without a full modular redesign. Rejected due to not fully addressing underlying architectural issues and long-term maintainability concerns.`
			`- External Libraries for Each Concern: Using separate, heavy-duty external libraries for logging, health monitoring, etc. Rejected to avoid introducing unnecessary dependencies and to maintain more control over custom logic specific to our system.`