Add complete time series aggregation example and refactor OKXCollector for repository pattern

- Introduced `example_complete_series_aggregation.py` to demonstrate time series aggregation, emitting candles even when no trades occur.
- Implemented `CompleteSeriesProcessor` extending `RealTimeCandleProcessor` to handle time-based candle emission and empty candle creation.
- Refactored `OKXCollector` to utilize the new repository pattern for database operations, enhancing modularity and maintainability.
- Updated database operations to centralize data handling through `DatabaseOperations`, improving error handling and logging.
- Enhanced documentation to include details on the new aggregation example and repository pattern implementation, ensuring clarity for users.
Vasily.onl
2025-06-02 13:27:01 +08:00
parent 5b4547edd5
commit cffc54b648
11 changed files with 1460 additions and 149 deletions


@@ -194,6 +194,70 @@ def calculate_performance_metrics(portfolio_values: List[float]) -> dict:
6. **Virtual Trading**: Simulation-first approach with fee modeling
7. **Simplified Architecture**: Monolithic design with clear component boundaries for future scaling
## Repository Pattern for Database Operations
### Database Abstraction Layer
The system uses the **Repository Pattern** to abstract database operations from business logic, providing a clean, maintainable, and testable interface for all data access.
```python
# Centralized database operations
from data.base_collector import MarketDataPoint
from data.common.data_types import OHLCVCandle
from database.operations import get_database_operations


class DataCollector:
    def __init__(self):
        # Use repository pattern instead of direct SQL
        self.db = get_database_operations()

    def store_candle(self, candle: OHLCVCandle) -> bool:
        """Store candle using repository pattern"""
        return self.db.market_data.upsert_candle(candle, force_update=False)

    def store_raw_trade(self, data_point: MarketDataPoint) -> bool:
        """Store raw trade data using repository pattern"""
        return self.db.raw_trades.insert_market_data_point(data_point)
```
### Repository Structure
```python
# Clean API for database operations
from datetime import datetime
from typing import List, Optional


class DatabaseOperations:
    def __init__(self):
        self.market_data = MarketDataRepository()  # Candle operations
        self.raw_trades = RawTradeRepository()     # Raw data operations

    def health_check(self) -> bool:
        """Check database connection health"""

    def get_stats(self) -> dict:
        """Get database statistics and metrics"""


class MarketDataRepository:
    def upsert_candle(self, candle: OHLCVCandle, force_update: bool = False) -> bool:
        """Store or update candle with duplicate handling"""

    def get_candles(self, symbol: str, timeframe: str, start: datetime, end: datetime) -> List[dict]:
        """Retrieve historical candle data"""

    def get_latest_candle(self, symbol: str, timeframe: str) -> Optional[dict]:
        """Get most recent candle for symbol/timeframe"""


class RawTradeRepository:
    def insert_market_data_point(self, data_point: MarketDataPoint) -> bool:
        """Store raw WebSocket data"""

    def get_raw_trades(self, symbol: str, data_type: str, start: datetime, end: datetime) -> List[dict]:
        """Retrieve raw trade data for analysis"""
```
### Benefits of Repository Pattern
- **No Raw SQL**: Business logic never contains direct SQL queries
- **Centralized Operations**: All database interactions go through well-defined APIs
- **Easy Testing**: Repository methods can be easily mocked for unit tests
- **Database Agnostic**: Can change database implementations without affecting business logic
- **Automatic Transaction Management**: Sessions, commits, and rollbacks handled automatically
- **Consistent Error Handling**: Custom exceptions with proper context
- **Type Safety**: Full type hints for better IDE support and error detection
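To illustrate the "Easy Testing" point, here is a minimal sketch of mocking the repository layer with `unittest.mock`. `CandleStore` is a hypothetical stand-in for business logic, and the dictionary stands in for an `OHLCVCandle`; neither is part of the project.

```python
from unittest.mock import MagicMock


class CandleStore:
    """Hypothetical business-logic class that depends only on the repository API."""

    def __init__(self, db):
        self.db = db  # any object exposing .market_data.upsert_candle(...)

    def store(self, candle) -> bool:
        return self.db.market_data.upsert_candle(candle, force_update=False)


# Replace the real DatabaseOperations with a mock; no database is touched.
fake_db = MagicMock()
fake_db.market_data.upsert_candle.return_value = True

store = CandleStore(fake_db)
candle = {"symbol": "BTC-USDT", "close": 50050.0}  # stand-in for an OHLCVCandle
result = store.store(candle)

# Verify the business logic called the repository exactly as expected.
fake_db.market_data.upsert_candle.assert_called_once_with(candle, force_update=False)
```

Because the class receives its `db` object from the caller, the same test shape should work for any code that takes the repository as a dependency.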
## Database Architecture
### Core Tables


@@ -17,6 +17,18 @@ This section contains detailed technical documentation for all system components
- Integration examples and patterns
- Comprehensive troubleshooting guide
### Database Operations
- **[Database Operations](database_operations.md)** - *Repository pattern for clean database interactions*
- **Repository Pattern** implementation for data access abstraction
- **MarketDataRepository** for candle/OHLCV operations
- **RawTradeRepository** for WebSocket data storage
- Automatic transaction management and session cleanup
- Configurable duplicate handling with force update options
- Custom error handling with DatabaseOperationError
- Database health monitoring and performance statistics
- Migration guide from direct SQL to repository pattern
### Logging & Monitoring
- **[Enhanced Logging System](logging.md)** - *Unified logging framework*


@@ -31,6 +31,17 @@ The Data Collector System provides a robust, scalable framework for collecting r
- **Logging Integration**: Enhanced logging with configurable verbosity
- **Multi-Timeframe Support**: Sub-second to daily candle aggregation (1s, 5s, 10s, 15s, 30s, 1m, 5m, 15m, 1h, 4h, 1d)
### 🛢️ **Database Integration**
- **Repository Pattern**: All database operations use the centralized `database/operations.py` module
- **No Raw SQL**: Clean API through `MarketDataRepository` and `RawTradeRepository` classes
- **Automatic Transaction Management**: Sessions, commits, and rollbacks handled automatically
- **Configurable Duplicate Handling**: `force_update_candles` parameter controls duplicate behavior
- **Real-time Storage**: Completed candles automatically saved to `market_data` table
- **Raw Data Storage**: Optional raw WebSocket data storage via `RawTradeRepository`
- **Custom Error Handling**: Proper exception handling with `DatabaseOperationError`
- **Health Monitoring**: Built-in database health checks and statistics
- **Connection Pooling**: Efficient database connection management through repositories
## Architecture
```
@@ -233,26 +244,26 @@ The `get_status()` method returns comprehensive status information:
{
    'exchange': 'okx',
    'status': 'running',                    # Current status
    'should_be_running': True,              # Desired state
    'symbols': ['BTC-USDT', 'ETH-USDT'],    # Configured symbols
    'data_types': ['ticker'],               # Data types being collected
    'auto_restart': True,                   # Auto-restart enabled
    'health': {
        'time_since_heartbeat': 5.2,        # Seconds since last heartbeat
        'time_since_data': 2.1,             # Seconds since last data
        'max_silence_duration': 300.0       # Max allowed silence
    },
    'statistics': {
        'messages_received': 1250,          # Total messages received
        'messages_processed': 1248,         # Successfully processed
        'errors': 2,                        # Error count
        'restarts': 1,                      # Restart count
        'uptime_seconds': 3600.5,           # Current uptime
        'reconnect_attempts': 0,            # Current reconnect attempts
        'last_message_time': '2023-...',    # ISO timestamp
        'connection_uptime': '2023-...',    # Connection start time
        'last_error': 'Connection failed',  # Last error message
        'last_restart_time': '2023-...'     # Last restart time
    }
}
```
@@ -263,13 +274,13 @@ The `get_health_status()` method provides detailed health information:
```python
{
    'is_healthy': True,                     # Overall health status
    'issues': [],                           # List of current issues
    'status': 'running',                    # Current collector status
    'last_heartbeat': '2023-...',           # Last heartbeat timestamp
    'last_data_received': '2023-...',       # Last data timestamp
    'should_be_running': True,              # Expected state
    'is_running': True                      # Actual running state
}
```
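As a rough illustration of how the `health` block from `get_status()` could be interpreted, the sketch below flags any timer that exceeds `max_silence_duration`. The rule and the `silence_issues` helper are assumptions made for this example; the collector's real health logic may differ.

```python
def silence_issues(status: dict) -> list:
    """Return a message for each timer above max_silence_duration (assumed rule)."""
    health = status["health"]
    limit = health["max_silence_duration"]
    return [
        f"{key} is {health[key]}s, above the {limit}s limit"
        for key in ("time_since_heartbeat", "time_since_data")
        if health[key] > limit
    ]


# Status dictionaries shaped like the get_status() output documented above.
healthy = {
    "status": "running",
    "health": {"time_since_heartbeat": 5.2, "time_since_data": 2.1,
               "max_silence_duration": 300.0},
}
stale = {
    "status": "running",
    "health": {"time_since_heartbeat": 5.2, "time_since_data": 450.0,
               "max_silence_duration": 300.0},
}

healthy_issues = silence_issues(healthy)
stale_issues = silence_issues(stale)
```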


@@ -0,0 +1,437 @@
# Database Operations Documentation
## Overview
The Database Operations module (`database/operations.py`) provides a clean, centralized interface for all database interactions using the **Repository Pattern**. This approach abstracts SQL complexity from business logic, ensuring maintainable, testable, and consistent database operations across the entire application.
## Key Benefits
### 🏗️ **Clean Architecture**
- **Repository Pattern**: Separates data access logic from business logic
- **Centralized Operations**: All database interactions go through well-defined APIs
- **No Raw SQL**: Business logic never contains direct SQL queries
- **Consistent Interface**: Standardized methods across all database operations
### 🛡️ **Reliability & Safety**
- **Automatic Transaction Management**: Sessions and commits handled automatically
- **Error Handling**: Custom exceptions with proper context
- **Connection Pooling**: Efficient database connection management
- **Session Cleanup**: Automatic session management and cleanup
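A minimal sketch of what "automatic transaction management" usually means in practice: a context manager that commits on success and rolls back on error. It uses the standard-library `sqlite3` module so it runs anywhere; the `session_scope` helper and the table layout are assumptions for illustration, not the project's actual session code.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def session_scope(conn: sqlite3.Connection):
    """Commit on success, roll back on any exception (hypothetical helper)."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE market_data (symbol TEXT PRIMARY KEY, close REAL)")
conn.commit()

# Successful block: the insert is committed.
with session_scope(conn) as s:
    s.execute("INSERT INTO market_data VALUES ('BTC-USDT', 50050.0)")

# Failing block: the insert is rolled back and the error propagates.
try:
    with session_scope(conn) as s:
        s.execute("INSERT INTO market_data VALUES ('ETH-USDT', 3000.0)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

rows = conn.execute("SELECT symbol FROM market_data ORDER BY symbol").fetchall()
```

Only the first insert survives, which is the behaviour callers rely on when they never call `commit()` or `rollback()` themselves.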
### 🔧 **Maintainability**
- **Easy Testing**: Repository methods can be easily mocked for testing
- **Database Agnostic**: Can change database implementations without affecting business logic
- **Type Safety**: Full type hints for better IDE support and error detection
- **Logging Integration**: Built-in logging for monitoring and debugging
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                     DatabaseOperations                      │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                Health Check & Stats                 │    │
│  │  • Connection health monitoring                     │    │
│  │  • Database statistics                              │    │
│  │  • Performance metrics                              │    │
│  └─────────────────────────────────────────────────────┘    │
│                              │                              │
│  ┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐   │
│  │MarketDataRepo   │ │RawTradeRepo     │ │   Future     │   │
│  │                 │ │                 │ │ Repositories │   │
│  │ • upsert_candle │ │ • insert_data   │ │ • OrderBook  │   │
│  │ • get_candles   │ │ • get_trades    │ │ • UserTrades │   │
│  │ • get_latest    │ │ • raw_websocket │ │ • Positions  │   │
│  └─────────────────┘ └─────────────────┘ └──────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               │
                      ┌─────────────────┐
                      │ BaseRepository  │
                      │                 │
                      │ • Session Mgmt  │
                      │ • Error Logging │
                      │ • DB Connection │
                      └─────────────────┘
```
## Quick Start
### Basic Usage
```python
from database.operations import get_database_operations
from data.common.data_types import OHLCVCandle
from datetime import datetime, timezone

# Get the database operations instance (singleton)
db = get_database_operations()

# Check database health; stop early if the connection is unavailable
if not db.health_check():
    raise SystemExit("Database connection issue!")

# Store a candle
candle = OHLCVCandle(
    exchange="okx",
    symbol="BTC-USDT",
    timeframe="5s",
    open=50000.0,
    high=50100.0,
    low=49900.0,
    close=50050.0,
    volume=1.5,
    trade_count=25,
    start_time=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
)

# Store candle (with duplicate handling)
success = db.market_data.upsert_candle(candle, force_update=False)
if success:
    print("Candle stored successfully!")
```
### With Data Collectors
```python
import asyncio
from data.exchanges.okx import OKXCollector
from data.base_collector import DataType
from database.operations import get_database_operations


async def main():
    # Initialize database operations
    db = get_database_operations()

    # The collector automatically uses the database operations module
    collector = OKXCollector(
        symbols=['BTC-USDT'],
        data_types=[DataType.TRADE],
        store_raw_data=True,        # Stores raw WebSocket data
        force_update_candles=False  # Ignore duplicate candles
    )

    await collector.start()
    await asyncio.sleep(60)  # Collect for 1 minute
    await collector.stop()

    # Check statistics
    stats = db.get_stats()
    print(f"Total candles: {stats['candle_count']}")
    print(f"Total raw trades: {stats['raw_trade_count']}")


asyncio.run(main())
```
## API Reference
### DatabaseOperations
Main entry point for all database operations.
#### Methods
##### `health_check() -> bool`
Test database connection health.
```python
db = get_database_operations()
if db.health_check():
    print("✅ Database is healthy")
else:
    print("❌ Database connection issues")
```
##### `get_stats() -> Dict[str, Any]`
Get comprehensive database statistics.
```python
stats = db.get_stats()
print(f"Candles: {stats['candle_count']:,}")
print(f"Raw trades: {stats['raw_trade_count']:,}")
print(f"Health: {stats['healthy']}")
```
### MarketDataRepository
Repository for `market_data` table operations (candles/OHLCV data).
#### Methods
##### `upsert_candle(candle: OHLCVCandle, force_update: bool = False) -> bool`
Store or update candle data with configurable duplicate handling.
**Parameters:**
- `candle`: OHLCVCandle object to store
- `force_update`: If True, overwrites existing data; if False, ignores duplicates
**Returns:** True if successful, False otherwise
**Duplicate Handling:**
- `force_update=False`: Uses `ON CONFLICT DO NOTHING` (preserves existing candles)
- `force_update=True`: Uses `ON CONFLICT DO UPDATE SET` (overwrites existing candles)
```python
# Store new candle, ignore if duplicate exists
db.market_data.upsert_candle(candle, force_update=False)
# Store candle, overwrite if duplicate exists
db.market_data.upsert_candle(candle, force_update=True)
```
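The two conflict modes can be tried in isolation. The sketch below reproduces them against an in-memory SQLite database (the `ON CONFLICT` clause needs SQLite 3.24 or newer). The table layout, the `upsert_close` helper, and the choice of SQLite are assumptions for illustration; the statements that `upsert_candle` actually issues are not shown in this documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE market_data (
        symbol TEXT, timeframe TEXT, ts TEXT, close REAL,
        UNIQUE (symbol, timeframe, ts)
    )
    """
)


def upsert_close(close: float, force_update: bool) -> None:
    """Illustrative upsert; the table layout and SQL are assumptions."""
    sql = (
        "INSERT INTO market_data (symbol, timeframe, ts, close) "
        "VALUES ('BTC-USDT', '5s', '2024-01-01T12:00:00Z', ?) "
        "ON CONFLICT (symbol, timeframe, ts) "
    )
    sql += "DO UPDATE SET close = excluded.close" if force_update else "DO NOTHING"
    conn.execute(sql, (close,))
    conn.commit()


def current_close() -> float:
    return conn.execute("SELECT close FROM market_data").fetchone()[0]


upsert_close(50050.0, force_update=False)  # first insert
upsert_close(50999.0, force_update=False)  # duplicate is ignored
close_after_ignore = current_close()

upsert_close(50999.0, force_update=True)   # duplicate is overwritten
close_after_force = current_close()
row_count = conn.execute("SELECT COUNT(*) FROM market_data").fetchone()[0]
```

In both modes the table keeps a single row per key; only `force_update=True` changes the stored values.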
##### `get_candles(symbol: str, timeframe: str, start_time: datetime, end_time: datetime, exchange: str = "okx") -> List[Dict[str, Any]]`
Retrieve historical candle data.
```python
from datetime import datetime, timezone

candles = db.market_data.get_candles(
    symbol="BTC-USDT",
    timeframe="5s",
    start_time=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 1, 1, 13, 0, 0, tzinfo=timezone.utc),
    exchange="okx"
)

for candle in candles:
    print(f"{candle['timestamp']}: O={candle['open']} H={candle['high']} L={candle['low']} C={candle['close']}")
```
##### `get_latest_candle(symbol: str, timeframe: str, exchange: str = "okx") -> Optional[Dict[str, Any]]`
Get the most recent candle for a symbol/timeframe combination.
```python
latest = db.market_data.get_latest_candle("BTC-USDT", "5s")
if latest:
    print(f"Latest 5s candle: {latest['close']} at {latest['timestamp']}")
else:
    print("No candles found")
```
### RawTradeRepository
Repository for `raw_trades` table operations (raw WebSocket data).
#### Methods
##### `insert_market_data_point(data_point: MarketDataPoint) -> bool`
Store raw market data from WebSocket streams.
```python
from data.base_collector import MarketDataPoint, DataType
from datetime import datetime, timezone

data_point = MarketDataPoint(
    exchange="okx",
    symbol="BTC-USDT",
    timestamp=datetime.now(timezone.utc),
    data_type=DataType.TRADE,
    data={"price": 50000, "size": 0.1, "side": "buy"}
)

success = db.raw_trades.insert_market_data_point(data_point)
```
##### `insert_raw_websocket_data(exchange: str, symbol: str, data_type: str, raw_data: Dict[str, Any], timestamp: Optional[datetime] = None) -> bool`
Store raw WebSocket data for debugging purposes.
```python
from datetime import datetime, timezone

db.raw_trades.insert_raw_websocket_data(
    exchange="okx",
    symbol="BTC-USDT",
    data_type="raw_trade",
    raw_data={"instId": "BTC-USDT", "px": "50000", "sz": "0.1"},
    timestamp=datetime.now(timezone.utc)
)
```
##### `get_raw_trades(symbol: str, data_type: str, start_time: datetime, end_time: datetime, exchange: str = "okx", limit: Optional[int] = None) -> List[Dict[str, Any]]`
Retrieve raw trade data for analysis.
```python
trades = db.raw_trades.get_raw_trades(
    symbol="BTC-USDT",
    data_type="trade",
    start_time=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    end_time=datetime(2024, 1, 1, 13, 0, 0, tzinfo=timezone.utc),
    limit=1000
)
```
## Error Handling
The database operations module includes comprehensive error handling with custom exceptions.
### DatabaseOperationError
Custom exception for database operation failures.
```python
import logging

from database.operations import DatabaseOperationError

logger = logging.getLogger(__name__)

try:
    db.market_data.upsert_candle(candle)
except DatabaseOperationError as e:
    logger.error(f"Database operation failed: {e}")
    # Handle the error appropriately
```
### Best Practices
1. **Always Handle Exceptions**: Wrap database operations in `try`/`except` blocks
2. **Check Health First**: Use `health_check()` before critical operations
3. **Monitor Performance**: Use `get_stats()` to monitor database growth
4. **Use Appropriate Repositories**: Use `market_data` for candles, `raw_trades` for raw data
5. **Handle Duplicates Appropriately**: Choose the right `force_update` setting
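One way to apply the first practice is a small wrapper that retries a failed write and logs each failure. This is a sketch under stated assumptions: `safe_upsert` and `FlakyRepo` are hypothetical, and the local `DatabaseOperationError` class is a stand-in for the one exported by `database.operations`, defined here only so the example runs on its own.

```python
import logging

logger = logging.getLogger("db_example")


class DatabaseOperationError(Exception):
    """Stand-in for database.operations.DatabaseOperationError."""


def safe_upsert(repo, candle, retries: int = 2) -> bool:
    """Try an upsert up to retries + 1 times, logging each failure."""
    for attempt in range(1, retries + 2):
        try:
            return repo.upsert_candle(candle, force_update=False)
        except DatabaseOperationError as exc:
            logger.error("upsert attempt %d failed: %s", attempt, exc)
    return False


class FlakyRepo:
    """Fake repository that fails once, then succeeds."""

    def __init__(self):
        self.calls = 0

    def upsert_candle(self, candle, force_update=False) -> bool:
        self.calls += 1
        if self.calls == 1:
            raise DatabaseOperationError("simulated outage")
        return True


repo = FlakyRepo()
stored = safe_upsert(repo, candle={"symbol": "BTC-USDT"})
```

Whether retrying is appropriate depends on the failure; a constraint violation, for instance, will not succeed on a second attempt.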
## Configuration
### Force Update Behavior
The `force_update_candles` parameter in collectors controls duplicate handling:
```python
# In OKX collector configuration
collector = OKXCollector(
    symbols=['BTC-USDT'],
    force_update_candles=False  # Default: ignore duplicates
)

# Or enable force updates
collector = OKXCollector(
    symbols=['BTC-USDT'],
    force_update_candles=True  # Overwrite existing candles
)
```
### Logging Integration
Database operations automatically integrate with the application's logging system:
```python
import logging
from database.operations import get_database_operations
logger = logging.getLogger(__name__)
db = get_database_operations(logger)
# All database operations will now log through your logger
db.market_data.upsert_candle(candle) # Logs: "Stored candle: BTC-USDT 5s at ..."
```
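For readers unfamiliar with the singleton accessor style used by `get_database_operations()`, a minimal sketch of the general technique follows. The class body and module-level cache are assumptions written for this example and are not taken from `database/operations.py`.

```python
import logging
from typing import Optional


class DatabaseOperations:
    """Simplified stand-in; the real class also wires up repositories."""

    def __init__(self, logger: Optional[logging.Logger] = None):
        self.logger = logger or logging.getLogger(__name__)


_instance: Optional[DatabaseOperations] = None  # module-level cache


def get_database_operations(logger: Optional[logging.Logger] = None) -> DatabaseOperations:
    """Create the shared instance on first use, then keep returning it."""
    global _instance
    if _instance is None:
        _instance = DatabaseOperations(logger)
    return _instance


first = get_database_operations(logging.getLogger("collector"))
second = get_database_operations()
```

In this sketch the logger passed by the first caller is the one that sticks; later arguments are ignored because the instance already exists.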
## Migration from Direct SQL
If you have existing code using direct SQL, here's how to migrate:
### Before (Direct SQL - ❌ Don't do this)
```python
# OLD WAY - direct SQL queries
from database.connection import get_db_manager
from sqlalchemy import text

db_manager = get_db_manager()
with db_manager.get_session() as session:
    session.execute(text("""
        INSERT INTO market_data (exchange, symbol, timeframe, ...)
        VALUES (:exchange, :symbol, :timeframe, ...)
    """), {...})
    session.commit()
```
### After (Repository Pattern - ✅ Correct way)
```python
# NEW WAY - using repository pattern
from database.operations import get_database_operations
db = get_database_operations()
success = db.market_data.upsert_candle(candle)
```
## Performance Considerations
### Connection Pooling
The database operations module automatically manages connection pooling through the underlying `DatabaseManager`.
### Batch Operations
For high-throughput scenarios, consider batching operations:
```python
# Store multiple candles in one pass
candles = [candle1, candle2, candle3]  # extend with further OHLCVCandle objects
for candle in candles:
    db.market_data.upsert_candle(candle)
```
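When the candle list is large, the loop can be split into fixed-size chunks so that progress and failures can be tracked per chunk. The sketch below is illustrative only: `chunked`, `store_in_chunks`, and `CountingRepo` are hypothetical names, and the fake repository stands in for `db.market_data`. Chunking by itself does not reduce database round-trips; it only marks the boundary where a bulk write could be issued if the repository offered one.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


class CountingRepo:
    """Fake repository standing in for db.market_data."""

    def __init__(self):
        self.stored = 0

    def upsert_candle(self, candle, force_update=False) -> bool:
        self.stored += 1
        return True


def store_in_chunks(repo, candles, size: int = 100) -> int:
    """Store candles chunk by chunk and return how many upserts succeeded."""
    ok = 0
    for chunk in chunked(candles, size):
        ok += sum(1 for c in chunk if repo.upsert_candle(c))
    return ok


repo = CountingRepo()
candles = [{"close": 50000.0 + i} for i in range(250)]
stored = store_in_chunks(repo, candles, size=100)
chunk_sizes = [len(c) for c in chunked(candles, 100)]
```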
### Monitoring
Monitor database performance using the built-in statistics:
```python
import time

# Monitor database load
while True:
    stats = db.get_stats()
    print(f"Candles: {stats['candle_count']:,}, Health: {stats['healthy']}")
    time.sleep(30)
```
## Troubleshooting
### Common Issues
#### 1. Connection Errors
```python
if not db.health_check():
    logger.error("Database connection failed - check connection settings")
```
#### 2. Duplicate Key Errors
```python
# Use force_update=False to ignore duplicates
db.market_data.upsert_candle(candle, force_update=False)
```
#### 3. Transaction Errors
The repository automatically handles session management, but if you encounter issues:
```python
try:
    db.market_data.upsert_candle(candle)
except DatabaseOperationError as e:
    logger.error(f"Transaction failed: {e}")
```
### Debug Mode
Enable database query logging for debugging:
```python
# Set environment variable
import os
os.environ['DEBUG'] = 'true'
# This will log all SQL queries
db = get_database_operations()
```
## Related Documentation
- **[Database Connection](../architecture/database.md)** - Connection pooling and configuration
- **[Data Collectors](data_collectors.md)** - How collectors use database operations
- **[Architecture Overview](../architecture/architecture.md)** - System design patterns
---
*This documentation covers the repository pattern implementation in `database/operations.py`. For database schema details, see the [Architecture Documentation](../architecture/).*