Refactor data module to enhance modularity and maintainability

- Extracted `OHLCVData` and its validation logic into a new `common/ohlcv_data.py` module so they can be reused outside the collector.
- Updated `BaseDataCollector` to call the new `validate_ohlcv_data` function.
- Refactored imports in `data/__init__.py` to match the new structure, keeping access to common data types and exceptions consistent.
- Removed the now-redundant data validation logic from `BaseDataCollector`.
- Added unit tests for `OHLCVData` and the validation functions.

Together, these changes move OHLCV types and validation into a single shared module and leave `BaseDataCollector` with a narrower set of responsibilities.
Vasily.onl
2025-06-10 12:04:58 +08:00
parent 3db8fb1c41
commit 33f2110f19
15 changed files with 511 additions and 1009 deletions


@@ -62,15 +62,45 @@ For exchange-specific documentation, see [Exchange Implementations (`./exchanges
### 1. `BaseDataCollector`
An abstract base class that defines the common interface for all exchange collectors.
An abstract base class that defines the common interface for all exchange collectors. It now orchestrates specialized components for connection management, state and telemetry, and callback dispatching.
**Key Responsibilities:**
- Standardized `start`, `stop`, `restart` methods
- Built-in health monitoring with heartbeat and data silence detection
- Automatic reconnect and restart logic
- Asynchronous message handling
- Standardized `start`, `stop`, `restart` methods.
- Orchestrates connection handling via `ConnectionManager`.
- Delegates state, health, and statistics management to `CollectorStateAndTelemetry`.
- Utilizes `CallbackDispatcher` for managing and notifying data subscribers.
- Defines abstract methods for exchange-specific implementations (e.g., `_actual_connect`, `_actual_disconnect`, `_subscribe_channels`, `_process_message`).
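
The division of labour described above can be illustrated with a minimal sketch. It is not the project's `BaseDataCollector`: the class names `SketchCollector` and `FakeExchangeCollector`, the method signatures, and the `calls` list are assumptions made for illustration. Only the four abstract method names come from the list above.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List


class SketchCollector(ABC):
    """Hypothetical stand-in for BaseDataCollector (not the real class)."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        # The base class fixes the order of the steps;
        # subclasses supply the exchange-specific behaviour.
        await self._actual_connect()
        await self._subscribe_channels()
        self.running = True

    async def stop(self) -> None:
        await self._actual_disconnect()
        self.running = False

    async def restart(self) -> None:
        await self.stop()
        await self.start()

    # Signatures below are assumed; the source only names these methods.
    @abstractmethod
    async def _actual_connect(self) -> None: ...

    @abstractmethod
    async def _actual_disconnect(self) -> None: ...

    @abstractmethod
    async def _subscribe_channels(self) -> None: ...

    @abstractmethod
    async def _process_message(self, message: dict) -> None: ...


class FakeExchangeCollector(SketchCollector):
    """Toy subclass that records calls instead of touching a network."""

    def __init__(self) -> None:
        super().__init__()
        self.calls: List[str] = []

    async def _actual_connect(self) -> None:
        self.calls.append("connect")

    async def _actual_disconnect(self) -> None:
        self.calls.append("disconnect")

    async def _subscribe_channels(self) -> None:
        self.calls.append("subscribe")

    async def _process_message(self, message: dict) -> None:
        self.calls.append(f"message:{message.get('type')}")


async def demo() -> List[str]:
    collector = FakeExchangeCollector()
    await collector.start()
    await collector._process_message({"type": "trade"})
    await collector.stop()
    return collector.calls


calls = asyncio.run(demo())
print(calls)
```

Because the lifecycle methods live in the base class, an exchange-specific subclass only has to fill in the four hooks.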
### 2. `CollectorManager`
### 2. `CollectorStateAndTelemetry`
Manages the operational state, health, and performance statistics of a data collector.
**Key Responsibilities:**
- Tracks `CollectorStatus` (e.g., `RUNNING`, `STOPPED`, `ERROR`).
- Monitors health metrics like heartbeat and data silence.
- Collects and provides operational statistics (e.g., messages processed, errors).
- Provides centralized logging functionality for the collector.
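
As a rough illustration of status tracking, counters, and data silence detection, the sketch below uses hypothetical names throughout (`Status`, `TelemetrySketch`, `max_silence_seconds`, `snapshot`). It should not be read as the interface of `CollectorStateAndTelemetry`.

```python
import time
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical subset of collector states."""
    STOPPED = "stopped"
    RUNNING = "running"
    ERROR = "error"


class TelemetrySketch:
    def __init__(self, max_silence_seconds: float = 60.0) -> None:
        self.status = Status.STOPPED
        self.max_silence_seconds = max_silence_seconds
        self.messages_processed = 0
        self.errors = 0
        self.last_data_at: Optional[float] = None

    def record_message(self, now: Optional[float] = None) -> None:
        # `now` is injectable so the logic can be exercised without waiting.
        self.messages_processed += 1
        self.last_data_at = time.monotonic() if now is None else now

    def record_error(self) -> None:
        self.errors += 1

    def is_silent(self, now: Optional[float] = None) -> bool:
        """True when running and the last message is older than the window.

        Simplification: a collector that has never received data
        is not reported as silent.
        """
        if self.status is not Status.RUNNING or self.last_data_at is None:
            return False
        current = time.monotonic() if now is None else now
        return current - self.last_data_at > self.max_silence_seconds

    def snapshot(self) -> dict:
        return {
            "status": self.status.value,
            "messages_processed": self.messages_processed,
            "errors": self.errors,
        }


telemetry = TelemetrySketch(max_silence_seconds=60.0)
telemetry.status = Status.RUNNING
telemetry.record_message(now=100.0)
fresh = telemetry.is_silent(now=130.0)  # 30 s since the last message
stale = telemetry.is_silent(now=200.0)  # 100 s since the last message
print(fresh, stale, telemetry.snapshot())
```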
### 3. `ConnectionManager`
Handles the WebSocket connection lifecycle and resilience for a data collector.
**Key Responsibilities:**
- Establishes and terminates WebSocket connections.
- Manages automatic reconnection attempts with exponential backoff.
- Handles connection-related errors and ensures robust connectivity.
- Tracks WebSocket connection state and statistics.
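
Reconnection with exponential backoff can be sketched as follows. The helper names (`backoff_delays`, `connect_with_backoff`) and the parameters `base`, `factor`, and `cap` are assumptions for illustration; the actual `ConnectionManager` may use different values and may add jitter, which is omitted here.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> List[float]:
    """Delay before each retry: base * factor**n, never above `cap`."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


async def connect_with_backoff(
    connect: Callable[[], Awaitable[None]],
    delays: List[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> int:
    """Try `connect` once, then once more after each delay.

    Returns the number of attempts used; re-raises the last
    ConnectionError if every attempt fails.
    """
    last_error: Optional[ConnectionError] = None
    for attempt, delay in enumerate([0.0] + delays, start=1):
        if delay:
            await sleep(delay)
        try:
            await connect()
            return attempt
        except ConnectionError as exc:
            last_error = exc
    assert last_error is not None
    raise last_error


# Demo: a connection that fails twice before succeeding.
state = {"failures_left": 2}
slept: List[float] = []


async def flaky_connect() -> None:
    if state["failures_left"] > 0:
        state["failures_left"] -= 1
        raise ConnectionError("simulated failure")


async def fake_sleep(seconds: float) -> None:
    # Record the delay instead of actually waiting.
    slept.append(seconds)


delays = backoff_delays(base=1.0, factor=2.0, cap=30.0, attempts=6)
attempts_used = asyncio.run(connect_with_backoff(flaky_connect, delays, sleep=fake_sleep))
print(delays, attempts_used, slept)
```

Capping the delay keeps a long outage from pushing the retry interval to impractical lengths.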
### 4. `CallbackDispatcher`
Manages and dispatches real-time data to registered callbacks.
**Key Responsibilities:**
- Registers and unregisters data callbacks for different `DataType`s.
- Notifies all subscribed listeners when new data points are received.
- Ensures efficient and reliable distribution of processed market data.
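
A callback registry keyed by data type might look like the sketch below. The `DataType` enum here is a stand-in with assumed members, and `DispatcherSketch` with its method names is hypothetical rather than the `CallbackDispatcher` API. The choice to swallow subscriber exceptions is also an assumption about what reliable distribution could mean.

```python
from collections import defaultdict
from enum import Enum
from typing import Any, Callable, DefaultDict, List


class DataType(Enum):
    """Stand-in enum; the real one may have other members."""
    TRADE = "trade"
    ORDERBOOK = "orderbook"


Callback = Callable[[Any], None]


class DispatcherSketch:
    def __init__(self) -> None:
        self._callbacks: DefaultDict[DataType, List[Callback]] = defaultdict(list)

    def add_callback(self, data_type: DataType, callback: Callback) -> None:
        if callback not in self._callbacks[data_type]:
            self._callbacks[data_type].append(callback)

    def remove_callback(self, data_type: DataType, callback: Callback) -> None:
        if callback in self._callbacks[data_type]:
            self._callbacks[data_type].remove(callback)

    def notify(self, data_type: DataType, payload: Any) -> int:
        """Call every subscriber; a failing callback does not block the rest.

        Returns the number of callbacks that completed without raising.
        """
        delivered = 0
        # Iterate over a copy so callbacks may unregister themselves.
        for callback in list(self._callbacks[data_type]):
            try:
                callback(payload)
                delivered += 1
            except Exception:
                continue
        return delivered


received: List[Any] = []
on_trade = received.append


def broken(_payload: Any) -> None:
    raise RuntimeError("subscriber bug")


dispatcher = DispatcherSketch()
dispatcher.add_callback(DataType.TRADE, broken)
dispatcher.add_callback(DataType.TRADE, on_trade)
delivered = dispatcher.notify(DataType.TRADE, {"price": 100.0})

dispatcher.remove_callback(DataType.TRADE, on_trade)
delivered_after_removal = dispatcher.notify(DataType.TRADE, {"price": 101.0})
print(delivered, delivered_after_removal, received)
```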
### 5. `CollectorManager`
A singleton class that manages all active data collectors in the system.
@@ -80,7 +110,7 @@ A singleton class that manages all active data collectors in the system.
- Global health monitoring
- Coordination of restart policies
### 3. Exchange-Specific Collectors
### 6. Exchange-Specific Collectors
Concrete implementations of `BaseDataCollector` for each exchange (e.g., `OKXCollector`).


@@ -6,8 +6,8 @@
1. **Base Collector**
- Inherit from `BaseDataCollector`
- Implement required abstract methods
- Handle connection lifecycle
- Implement exchange-specific abstract methods (e.g., `_actual_connect`, `_actual_disconnect`, `_subscribe_channels`, `_process_message`)
- Leverage `ConnectionManager`, `CollectorStateAndTelemetry`, and `CallbackDispatcher` through the functionality inherited from `BaseDataCollector`
2. **WebSocket Client**
- Implement exchange-specific WebSocket handling


@@ -897,13 +897,13 @@ The OKX collector consists of three main components working together:
### `OKXCollector`
- **Main class**: `OKXCollector(BaseDataCollector)`
- **Responsibilities**:
- Manages WebSocket connection state
- Subscribes to required data channels
- Dispatches raw messages to the data processor
- Stores standardized data in the database
- Provides health and status monitoring
- **Main class**: `OKXCollector(BaseDataCollector)`
- **Responsibilities**:
- Implements exchange-specific connection and subscription logic (delegating to `ConnectionManager` for core connection handling).
- Processes and standardizes raw OKX WebSocket messages (delegating to `OKXDataProcessor`).
- Interacts with `CollectorStateAndTelemetry` for status, health, and logging.
- Uses `CallbackDispatcher` to notify subscribers of processed data.
- Stores standardized data in the database.
### `OKXWebSocketClient`
@@ -915,12 +915,12 @@ The OKX collector consists of three main components working together:
### `OKXDataProcessor`
- **New in v2.0**: `OKXDataProcessor`
- **Responsibilities**:
- Validates incoming raw data from WebSocket
- Transforms data into standardized `StandardizedTrade` and `OHLCVCandle` formats
- Aggregates trades into OHLCV candles
- Invokes callbacks for processed trades and completed candles
- **New in v2.0**: `OKXDataProcessor`
- **Responsibilities**:
- Validates incoming raw data from WebSocket.
- Transforms data into standardized `MarketDataPoint` and `OHLCVData` formats (`OHLCVData` now lives in `common/ohlcv_data.py`).
- Aggregates trades into OHLCV candles.
- Invokes callbacks for processed trades and completed candles.
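
The aggregation step can be illustrated with a small sketch that buckets trades into fixed-width candles. The `Trade` and `Candle` dataclasses and their fields are assumptions for illustration and do not reproduce `MarketDataPoint` or `OHLCVData`. The real processor presumably works incrementally on a stream rather than on a list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trade:
    """Hypothetical trade shape: timestamp in seconds, price, size."""
    timestamp: int
    price: float
    size: float


@dataclass
class Candle:
    """Hypothetical candle shape keyed by bucket start time."""
    start: int
    open: float
    high: float
    low: float
    close: float
    volume: float


def aggregate(trades: List[Trade], interval: int = 60) -> List[Candle]:
    """Bucket trades into candles of `interval` seconds, ordered by start time."""
    candles: Dict[int, Candle] = {}
    # sorted() is stable, so trades with equal timestamps keep input order.
    for trade in sorted(trades, key=lambda t: t.timestamp):
        start = trade.timestamp - trade.timestamp % interval
        candle = candles.get(start)
        if candle is None:
            # First trade in the bucket sets open, high, low, and close.
            candles[start] = Candle(
                start, trade.price, trade.price, trade.price, trade.price, trade.size
            )
        else:
            candle.high = max(candle.high, trade.price)
            candle.low = min(candle.low, trade.price)
            candle.close = trade.price
            candle.volume += trade.size
    return [candles[key] for key in sorted(candles)]


trades = [
    Trade(0, 100.0, 1.0),
    Trade(30, 105.0, 2.0),
    Trade(59, 95.0, 1.0),
    Trade(60, 96.0, 0.5),
]
candles = aggregate(trades, interval=60)
for candle in candles:
    print(candle)
```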
## Configuration
@@ -932,12 +932,12 @@ Configuration options for the `OKXCollector` class:
|-------------------------|---------------------|---------------------------------------|-----------------------------------------------------------------------------|
| `symbol` | `str` | - | Trading symbol (e.g., `BTC-USDT`) |
| `data_types` | `List[DataType]` | `[TRADE, ORDERBOOK]` | List of data types to collect |
| `auto_restart` | `bool` | `True` | Automatically restart on failures |
| `health_check_interval` | `float` | `30.0` | Seconds between health checks |
| `auto_restart` | `bool` | `True` | Automatically restart on failures (managed by `BaseDataCollector` via `ConnectionManager`) |
| `health_check_interval` | `float` | `30.0` | Seconds between health checks (managed by `BaseDataCollector` via `CollectorStateAndTelemetry`) |
| `store_raw_data` | `bool` | `True` | Store raw WebSocket data for debugging |
| `force_update_candles` | `bool` | `False` | If `True`, update existing candles; if `False`, keep existing ones unchanged |
| `logger` | `Logger` | `None` | Logger instance for conditional logging |
| `log_errors_only` | `bool` | `False` | If `True` and logger provided, only log error-level messages |
| `logger` | `Logger` | `None` | Logger instance for conditional logging (managed by `BaseDataCollector` via `CollectorStateAndTelemetry`) |
| `log_errors_only` | `bool` | `False` | If `True` and logger provided, only log error-level messages (managed by `BaseDataCollector` via `CollectorStateAndTelemetry`) |
### Health & Status Monitoring
@@ -962,4 +962,4 @@ Example output:
}
```
## Database Integration
## Database Integration


@@ -26,10 +26,13 @@ This architecture allows for high scalability and fault tolerance.
- **Location**: `data/exchanges/okx/collector.py`
- **Responsibilities**:
- Connects to the OKX WebSocket API
- Subscribes to real-time data channels
- Processes and standardizes incoming data
- Stores data in the database
- Inherits from `BaseDataCollector` and implements exchange-specific data collection logic.
- Utilizes `ConnectionManager` for robust WebSocket connection management.
- Leverages `CollectorStateAndTelemetry` for internal status, health, and logging.
- Uses `CallbackDispatcher` to notify registered consumers of processed data.
- Subscribes to real-time data channels specific to OKX.
- Processes and standardizes incoming OKX data before dispatching.
- Stores processed data in the database.
## Configuration