412 lines
14 KiB
Markdown
Raw Permalink Normal View History

# Unified Logging System
The TCP Dashboard project uses a unified logging system built on Python's standard `logging` library. It provides consistent, centralized logging across all components.
2025-06-06 20:33:29 +08:00
## Key Features
- **Component-based Logging**: Each component (e.g., `bot_manager`, `data_collector`) gets its own dedicated logger, with logs organized into separate directories under `logs/`.
- **Standardized & Simple**: Relies on standard Python `logging` handlers, making it robust and easy to maintain.
- **Date-based Rotation**: Log files are automatically rotated daily at midnight by `TimedRotatingFileHandler`.
- **Automatic Cleanup**: Log file retention is managed automatically based on the number of backup files to keep (`backupCount`), preventing excessive disk usage.
- **Unified Format**: All log messages follow a detailed, consistent format: `[YYYY-MM-DD HH:MM:SS - LEVEL - pathname:lineno - funcName] - message]`.
- **Configurable Console Output**: Optional console output for real-time monitoring, configurable via function arguments or environment variables.
2025-06-06 20:33:29 +08:00
## Usage
### Getting a Logger
The primary way to get a logger is via the `get_logger` function. It is thread-safe and ensures that loggers are configured only once.
```python
from utils.logger import get_logger
# Get a logger for the bot manager component
# This will create a file logger and, if verbose=True, a console logger.
logger = get_logger('bot_manager', verbose=True)
2025-06-03 12:08:43 +08:00
logger.info("Bot started successfully")
logger.debug("Connecting to database...")
logger.warning("API response time is high")
logger.error("Failed to execute trade", exc_info=True)
2025-06-03 12:08:43 +08:00
```
### Configuration
The `get_logger` function accepts the following parameters:
| Parameter | Type | Default | Description |
|-------------------|---------------------|---------|-----------------------------------------------------------------------------|
| `component_name` | `str` | `default_logger` | Name of the component (e.g., `bot_manager`). Used for the logger name and directory. |
| `log_level` | `str` | `INFO` | The minimum logging level to be processed (DEBUG, INFO, WARNING, ERROR, CRITICAL). |
| `verbose` | `Optional[bool]` | `None` | If `True`, enables console logging. If `None`, uses `VERBOSE_LOGGING` or `LOG_TO_CONSOLE` from environment variables. |
| `max_log_files` | `int` | `30` | The maximum number of backup log files to keep. The core of the log cleanup mechanism. |
| `clean_old_logs` | `bool` | `True` | **Deprecated**. Kept for backward compatibility but has no effect. Cleanup is controlled by `max_log_files`. |
For centralized control, you can use environment variables:
- `VERBOSE_LOGGING`: Set to `true` to enable console logging for all loggers.
- `LOG_TO_CONSOLE`: An alias for `VERBOSE_LOGGING`.
### Log File Structure
The logger creates a directory for each component inside `logs/`. The main log file is named `component_name.log`. When rotated, old logs are renamed with a date suffix.
```
logs/
├── bot_manager/
│ ├── bot_manager.log (current log file)
│ └── bot_manager.log.2023-11-15
├── data_collector/
│ ├── data_collector.log
│ └── data_collector.log.2023-11-15
└── default_logger/
└── default_logger.log
└── test_component/
└── test_component.log
```
## Advanced Usage
### Age-Based Log Cleanup
While the primary cleanup mechanism is count-based (via `max_log_files`), a separate utility function, `cleanup_old_logs`, is available for age-based cleanup if you have specific retention policies.
```python
from utils.logger import cleanup_old_logs
# Deletes all log files in the 'bot_manager' directory older than 15 days
cleanup_old_logs('bot_manager', days_to_keep=15)
```
### Shutting Down Logging
In some cases, especially in tests or when an application is shutting down gracefully, you may need to explicitly close all log file handlers.
```python
from utils.logger import shutdown_logging
# Closes all open file handlers managed by the logging system
shutdown_logging()
```
## Component Integration Pattern (Conditional Logging)
While the logger utility is simple, it is designed to support a powerful conditional logging pattern at the application level. This allows components to be developed to run with or without logging, making them more flexible and easier to test.
### Key Concepts
1. **Optional Logging**: Components are designed to accept `logger=None` in their constructor and function normally without producing any logs.
2. **Error-Only Mode**: A component can be designed to only log messages of level `ERROR` or higher. This is a component-level implementation pattern, not a feature of `get_logger`.
3. **Logger Inheritance**: Parent components can pass their logger instance to child components, ensuring a consistent logging context.
### Example: Component Constructor
All major components should follow this pattern to support conditional logging.
```python
class ComponentExample:
def __init__(self, logger=None, log_errors_only=False):
self.logger = logger
self.log_errors_only = log_errors_only
2025-06-03 12:08:43 +08:00
def _log_info(self, message: str) -> None:
"""Log info message if logger is available and not in errors-only mode."""
if self.logger and not self.log_errors_only:
self.logger.info(message)
2025-06-03 12:08:43 +08:00
def _log_error(self, message: str, exc_info: bool = False) -> None:
"""Log error message if logger is available."""
2025-06-03 12:08:43 +08:00
if self.logger:
self.logger.error(message, exc_info=exc_info)
# ... other helper methods for debug, warning, critical ...
```
This pattern decouples the component's logic from the global logging configuration and makes its logging behavior explicit and easy to manage.
## Troubleshooting
- **Permissions**: Ensure the application has write permissions to the `logs/` directory.
- **No Logs**: If file logging fails (e.g., due to permissions), a warning is printed to the console. If `verbose` is not enabled, no further logs will be produced. Ensure the `logs/` directory is writable.
- **Console Spam**: If the console is too noisy, set `verbose=False` when calling `get_logger` and ensure `VERBOSE_LOGGING` is not set to `true` in your environment.
## Best Practices
### 1. Component Naming
Use descriptive, consistent component names:
- `bot_manager` - for bot lifecycle management
- `data_collector` - for market data collection
- `strategies` - for trading strategies
- `backtesting` - for backtesting engine
- `dashboard` - for web dashboard
### 2. Log Level Guidelines
- **DEBUG**: Detailed diagnostic information, typically only of interest when diagnosing problems
- **INFO**: General information about program execution
- **WARNING**: Something unexpected happened, but the program is still working
- **ERROR**: A serious problem occurred, the program couldn't perform a function
- **CRITICAL**: A serious error occurred, the program may not be able to continue
### 3. Verbose Logging Guidelines
```python
# Development: Use verbose logging with DEBUG level
dev_logger = get_logger('component', 'DEBUG', verbose=True, max_log_files=3)
# Production: Use INFO level with no console output
prod_logger = get_logger('component', 'INFO', verbose=False, max_log_files=30)
# Testing: Disable cleanup to preserve test logs
test_logger = get_logger('test_component', 'DEBUG', verbose=True, clean_old_logs=False)
```
### 4. Log Retention Guidelines
```python
# High-frequency components (data collectors): shorter retention
data_logger = get_logger('data_collector', max_log_files=7)
# Important components (bot managers): longer retention
bot_logger = get_logger('bot_manager', max_log_files=30)
# Development: very short retention
dev_logger = get_logger('dev_component', max_log_files=3)
```
### 5. Message Content
```python
# Good: Descriptive and actionable
logger.error("Failed to connect to OKX API: timeout after 30s")
# Bad: Vague and unhelpful
logger.error("Error occurred")
# Good: Include relevant context
logger.info(f"Bot {bot_id} executed trade: {symbol} {side} {quantity}@{price}")
# Good: Include duration for performance monitoring
start_time = time.time()
# ... do work ...
duration = time.time() - start_time
logger.info(f"Data aggregation completed in {duration:.2f}s")
```
### 6. Exception Handling
```python
try:
execute_trade(symbol, quantity, price)
logger.info(f"Trade executed successfully: {symbol}")
except APIError as e:
logger.error(f"API error during trade execution: {e}", exc_info=True)
raise
except ValidationError as e:
logger.warning(f"Trade validation failed: {e}")
return False
except Exception as e:
logger.critical(f"Unexpected error during trade execution: {e}", exc_info=True)
raise
```
### 7. Performance Considerations
```python
# Good: Efficient string formatting
logger.debug(f"Processing {len(data)} records")
# Avoid: Expensive operations in log messages unless necessary
# logger.debug(f"Data: {expensive_serialization(data)}") # Only if needed
# Better: Check log level first for expensive operations
if logger.isEnabledFor(logging.DEBUG):
logger.debug(f"Data: {expensive_serialization(data)}")
```
2025-06-03 12:08:43 +08:00
## Migration Guide
2025-06-03 12:08:43 +08:00
### Updating Existing Components
2025-06-03 12:08:43 +08:00
1. **Add logger parameter to constructor**:
```python
def __init__(self, ..., logger=None, log_errors_only=False):
```
2. **Add conditional logging helpers**:
```python
def _log_debug(self, message: str) -> None:
if self.logger and not self.log_errors_only:
self.logger.debug(message)
```
3. **Update all logging calls**:
```python
# Before
self.logger.info("Message")
2025-06-03 12:08:43 +08:00
# After
self._log_info("Message")
```
4. **Pass logger to child components**:
```python
child = ChildComponent(logger=self.logger)
```
2025-06-03 12:08:43 +08:00
### From Standard Logging
```python
# Old logging (if any existed)
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# New unified logging
from utils.logger import get_logger
logger = get_logger('component_name', verbose=True)
```
2025-06-03 12:08:43 +08:00
### Gradual Adoption
1. **Phase 1**: Add optional logger parameters to new components
2. **Phase 2**: Update existing components to support conditional logging
3. **Phase 3**: Implement hierarchical logging structure
4. **Phase 4**: Add error-only logging mode
## Testing
2025-06-03 12:08:43 +08:00
### Testing Conditional Logging
2025-06-03 12:08:43 +08:00
#### Test Script Example
```python
# test_conditional_logging.py
from utils.logger import get_logger
from data.collector_manager import CollectorManager
from data.exchanges.okx.collector import OKXCollector
2025-06-03 12:08:43 +08:00
def test_no_logging():
"""Test components work without loggers."""
manager = CollectorManager(logger=None)
collector = OKXCollector("BTC-USDT", logger=None)
print("✓ No logging test passed")
2025-06-03 12:08:43 +08:00
def test_with_logging():
"""Test components work with loggers."""
logger = get_logger('test_system')
manager = CollectorManager(logger=logger)
collector = OKXCollector("BTC-USDT", logger=logger)
print("✓ With logging test passed")
2025-06-03 12:08:43 +08:00
def test_error_only():
"""Test error-only logging mode."""
logger = get_logger('test_errors')
collector = OKXCollector("BTC-USDT", logger=logger, log_errors_only=True)
print("✓ Error-only logging test passed")
if __name__ == "__main__":
test_no_logging()
test_with_logging()
test_error_only()
print("✅ All conditional logging tests passed!")
```
### Testing Changes
```python
2025-06-03 12:08:43 +08:00
# Test without logger
component = MyComponent(logger=None)
# Should work without errors, no logging
2025-06-03 12:08:43 +08:00
# Test with logger
logger = get_logger('test_component')
component = MyComponent(logger=logger)
# Should log normally
# Test error-only mode
component = MyComponent(logger=logger, log_errors_only=True)
# Should only log errors
```
2025-06-03 12:08:43 +08:00
### Basic System Test
2025-06-03 12:08:43 +08:00
Run a simple test to verify the logging system:
```bash
2025-06-03 12:08:43 +08:00
python -c "from utils.logger import get_logger; logger = get_logger('test', verbose=True); logger.info('Test message'); print('Check logs/test/ directory')"
```
## Troubleshooting
### Common Issues
1. **Permission errors**: Ensure the application has write permissions to the project directory
2. **Disk space**: Monitor disk usage and adjust log retention with `max_log_files`
3. **Threading issues**: The logger is thread-safe, but check for application-level concurrency issues
4. **Too many console messages**: Adjust `verbose` parameter or log levels
### Debug Mode
Enable debug logging to troubleshoot issues:
```python
logger = get_logger('component_name', 'DEBUG', verbose=True)
```
### Console Output Issues
```python
# Force console output regardless of environment
logger = get_logger('component_name', verbose=True)
# Check environment variables
import os
print(f"VERBOSE_LOGGING: {os.getenv('VERBOSE_LOGGING')}")
print(f"LOG_TO_CONSOLE: {os.getenv('LOG_TO_CONSOLE')}")
```
### Fallback Logging
If file logging fails, the system automatically falls back to console logging with a warning message.
2025-06-03 12:08:43 +08:00
## Integration with Existing Code
The logging system is designed to be gradually adopted:
2025-06-03 12:08:43 +08:00
1. **Start with new modules**: Use the unified logger in new code
2. **Replace existing logging**: Gradually migrate existing logging to the unified system
3. **No breaking changes**: Existing code continues to work
## Maintenance
### Automatic Cleanup Benefits
The automatic cleanup feature provides several benefits:
- **Disk space management**: Prevents log directories from growing indefinitely
- **Performance**: Fewer files to scan in log directories
- **Maintenance-free**: No need for external cron jobs or scripts
- **Component-specific**: Each component can have different retention policies
### Manual Cleanup for Special Cases
For cases requiring age-based cleanup instead of count-based:
```python
# cleanup_logs.py
from utils.logger import cleanup_old_logs
components = ['bot_manager', 'data_collector', 'strategies', 'dashboard']
for component in components:
cleanup_old_logs(component, days_to_keep=30)
```
### Monitoring Disk Usage
Monitor the `logs/` directory size and adjust retention policies as needed:
```bash
# Check log directory size
du -sh logs/
# Find large log files
find logs/ -name "*.txt" -size +10M
# Count log files per component
find logs/ -name "*.txt" | cut -d'/' -f2 | sort | uniq -c
```
2025-06-03 12:08:43 +08:00
This conditional logging system provides maximum flexibility while maintaining clean, maintainable code that works in all scenarios.