# Storage Utilities

This document describes the refactored storage utilities in `cycles/utils/`, which provide modular, maintainable management of input data and backtest results.

## Overview

The storage utilities have been refactored into a modular architecture with a clear separation of concerns:

- **`Storage`** - Main coordinator class providing a unified, backward-compatible interface
- **`DataLoader`** - Loads data from CSV and JSON files
- **`DataSaver`** - Saves data with proper format handling
- **`ResultFormatter`** - Formats and writes backtest results to CSV files
- **`storage_utils`** - Shared utilities and custom exceptions

This design improves maintainability and testability and follows the single responsibility principle.
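
The coordinator pattern described above amounts to composition plus delegation. The sketch below is illustrative only: the component classes are minimal stand-ins that return strings instead of doing I/O, `ResultFormatter` is omitted for brevity, and the `DataSaver` constructor signature and the `loader`/`saver` attribute names are assumptions, not the actual code in `cycles/utils/`.

```python
from typing import Any, Optional

RESULTS_DIR = "../results"
DATA_DIR = "../data"


class DataLoader:
    """Stand-in for the real DataLoader; returns a description instead of a DataFrame."""

    def __init__(self, data_dir: str, logging: Optional[Any] = None) -> None:
        self.data_dir = data_dir
        self.logging = logging

    def load_data(self, file_path: str, start_date: Any, stop_date: Any) -> str:
        return f"load {self.data_dir}/{file_path} [{start_date}, {stop_date}]"


class DataSaver:
    """Stand-in for the real DataSaver (constructor signature assumed)."""

    def __init__(self, data_dir: str, logging: Optional[Any] = None) -> None:
        self.data_dir = data_dir
        self.logging = logging

    def save_data(self, data: Any, file_path: str) -> str:
        return f"save {self.data_dir}/{file_path}"


class Storage:
    """Coordinator: builds one instance of each component and forwards calls to it."""

    def __init__(self, logging: Optional[Any] = None,
                 results_dir: str = RESULTS_DIR, data_dir: str = DATA_DIR) -> None:
        self.results_dir = results_dir
        self.loader = DataLoader(data_dir, logging)
        self.saver = DataSaver(data_dir, logging)

    def load_data(self, file_path: str, start_date: Any, stop_date: Any) -> str:
        # Same public signature as before; the actual work is delegated.
        return self.loader.load_data(file_path, start_date, stop_date)

    def save_data(self, data: Any, file_path: str) -> str:
        return self.saver.save_data(data, file_path)
```

Because callers only see the methods on `Storage`, the components behind it can be reworked or tested in isolation without changing calling code.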

## Constants

- `RESULTS_DIR`: Default directory for storing results (default: `"../results"`)
- `DATA_DIR`: Default directory for storing input data (default: `"../data"`)

## Main Classes

### `Storage` (Coordinator Class)

The main interface, which coordinates all storage operations while maintaining backward compatibility.

#### `__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)`

**Description**: Initializes the Storage coordinator and its component instances.

**Parameters**:

- `logging` (optional): A logger instance used for informational output
- `results_dir` (str, optional): Path to the directory for storing results
- `data_dir` (str, optional): Path to the directory for storing data

**Creates**: One instance each of `DataLoader`, `DataSaver`, and `ResultFormatter`

#### `load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame`

**Description**: Loads a CSV or JSON file, applies optimized dtypes, and filters the rows to the given date range.

**Parameters**:

- `file_path` (str): Path to the data file (relative to `data_dir`)
- `start_date` (datetime-like): Start date for filtering data
- `stop_date` (datetime-like): End date for filtering data

**Returns**: `pandas.DataFrame` with a timestamp index

**Raises**: `DataLoadingError` if loading fails
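
One way a loader could choose a reader and surface failures as `DataLoadingError` is sketched below. This is a hedged illustration, not the actual `_load_csv_data`/`_load_json_data` code: dispatching on the file suffix, the local `DataLoadingError` stand-in, and the helper name `read_by_extension` are assumptions. Timestamp parsing and date filtering are left out.

```python
from pathlib import Path

import pandas as pd


class DataLoadingError(Exception):
    """Local stand-in for the exception exported by storage_utils."""


def read_by_extension(data_dir: str, file_path: str) -> pd.DataFrame:
    """Hypothetical helper: pick a pandas reader from the file suffix."""
    path = Path(data_dir) / file_path
    suffix = path.suffix.lower()
    try:
        if suffix == ".csv":
            df = pd.read_csv(path)
        elif suffix == ".json":
            df = pd.read_json(path)
        else:
            raise ValueError(f"unsupported file type: {suffix!r}")
    except (OSError, ValueError) as exc:
        # Missing files (OSError) and parse errors (ValueError) become one error type.
        raise DataLoadingError(f"could not load {path}: {exc}") from exc
    # Lowercase column names, matching the normalization DataLoader performs.
    df.columns = [str(c).lower() for c in df.columns]
    return df
```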

#### `save_data(self, data: pd.DataFrame, file_path: str) -> None`

**Description**: Saves processed data to a CSV file with proper timestamp handling.

**Parameters**:

- `data` (pd.DataFrame): The DataFrame to save
- `file_path` (str): Path to the data file (relative to `data_dir`)

**Raises**: `DataSavingError` if saving fails

#### `format_row(self, row: Dict[str, Any]) -> Dict[str, str]`

**Description**: Formats a dictionary row for output to results CSV files.

**Parameters**:

- `row` (dict): The row of data to format

**Returns**: `dict` with formatted string values (percentages, currency, etc.)
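
A minimal sketch of the kind of formatting `format_row` performs. Which keys count as percentages or currency, the two-decimal precision, and the assumption that ratios are stored as fractions (0.5 for 50%) are all hypothetical; the real rules live in `ResultFormatter`.

```python
from typing import Any, Dict

# Hypothetical field groups; the real formatter's rules may differ.
PERCENT_KEYS = {"profit_ratio", "max_drawdown", "win_rate"}
CURRENCY_KEYS = {"initial_usd", "final_usd", "total_fees_usd"}


def format_row(row: Dict[str, Any]) -> Dict[str, str]:
    formatted: Dict[str, str] = {}
    for key, value in row.items():
        if key in PERCENT_KEYS and isinstance(value, (int, float)):
            # Ratio stored as a fraction, e.g. 0.5 becomes "50.00%".
            formatted[key] = f"{value * 100:.2f}%"
        elif key in CURRENCY_KEYS and isinstance(value, (int, float)):
            formatted[key] = f"{value:.2f}"
        else:
            # Everything else is passed through as a plain string.
            formatted[key] = str(value)
    return formatted
```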

#### `write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None`

**Description**: Writes a chunk of results to a CSV file, optionally preceded by a header.

**Parameters**:

- `filename` (str): The name of the file to write to
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of dictionaries representing rows
- `write_header` (bool, optional): Whether to write the header
- `initial_usd` (float, optional): Initial USD value for the header comment
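
The chunked-write pattern can be sketched with the standard `csv` module. Assumptions: `write_header=True` starts a new file while `False` appends to it, the `# initial_usd: ...` comment format is invented for illustration, and the tab delimiter follows the "Tab-delimited output for results" feature listed for `ResultFormatter`.

```python
import csv
from typing import Any, Dict, List, Optional


def write_results_chunk(filename: str, fieldnames: List[str], rows: List[Dict[str, Any]],
                        write_header: bool = False,
                        initial_usd: Optional[float] = None) -> None:
    # Assumption: the first chunk creates/truncates the file, later chunks append.
    mode = "w" if write_header else "a"
    with open(filename, mode, newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        if write_header:
            if initial_usd is not None:
                # Hypothetical comment format for the starting balance.
                f.write(f"# initial_usd: {initial_usd}\n")
            writer.writeheader()
        writer.writerows(rows)
```

Writing in chunks keeps memory bounded when many parameter combinations are backtested; only the first call pays for the header.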

#### `write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str`

**Description**: Writes combined backtest results to a CSV file with metadata.

**Parameters**:

- `filename` (str): Name of the file to write to (relative to `results_dir`)
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of result dictionaries
- `metadata_lines` (list, optional): Header comment lines

**Returns**: Full path to the written file
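
A sketch of writing metadata lines ahead of the tabular results and returning the file path. It is written as a free function that takes `results_dir` explicitly instead of reading it from `self`; writing metadata lines verbatim (so callers supply any `#` marker) and the tab delimiter are assumptions.

```python
import csv
import os
from typing import Any, Dict, List, Optional


def write_backtest_results(results_dir: str, filename: str, fieldnames: List[str],
                           rows: List[Dict[str, Any]],
                           metadata_lines: Optional[List[str]] = None) -> str:
    os.makedirs(results_dir, exist_ok=True)
    path = os.path.join(results_dir, filename)
    with open(path, "w", newline="") as f:
        # Metadata first, one line each, written exactly as given.
        for line in metadata_lines or []:
            f.write(line + "\n")
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return path
```

If the metadata lines start with `#`, such a file can be read back with `pd.read_csv(path, sep="\t", comment="#")`, which skips the comment lines.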

#### `write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None`

**Description**: Writes trade data to separate CSV files grouped by timeframe and stop-loss.

**Parameters**:

- `all_trade_rows` (list): List of trade dictionaries
- `trades_fieldnames` (list): CSV header for trade files

**Files Created**: `trades_{timeframe}_ST{sl_percent}pct.csv` in `results_dir`
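
Grouping trades into one file per (timeframe, stop-loss) pair could look like the sketch below. The row keys `timeframe` and `stop_loss_pct`, the conversion of a fractional stop-loss (0.05) into the whole-number `sl_percent` (5), and returning the written paths (the documented method returns `None`) are assumptions made for illustration.

```python
import csv
import os
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def write_trades(results_dir: str, all_trade_rows: List[Dict[str, Any]],
                 trades_fieldnames: List[str]) -> List[str]:
    groups: Dict[Tuple[str, int], List[Dict[str, Any]]] = defaultdict(list)
    for row in all_trade_rows:
        # Hypothetical keys; stop-loss stored as a fraction, e.g. 0.05 -> 5.
        sl_percent = int(round(row["stop_loss_pct"] * 100))
        groups[(row["timeframe"], sl_percent)].append(row)

    os.makedirs(results_dir, exist_ok=True)
    paths = []
    for (timeframe, sl_percent), rows in groups.items():
        path = os.path.join(results_dir, f"trades_{timeframe}_ST{sl_percent}pct.csv")
        with open(path, "w", newline="") as f:
            # Ignore row keys (like the grouping keys) not listed in the header.
            writer = csv.DictWriter(f, fieldnames=trades_fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return sorted(paths)
```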

### `DataLoader`

Handles loading and preprocessing of data from CSV and JSON files.

#### Key Features:

- Supports CSV and JSON formats
- Optimized pandas dtypes for financial data
- Automatic timestamp parsing (Unix timestamps and datetime strings)
- Date range filtering
- Column name normalization (lowercase)
- Comprehensive error handling

#### Methods:

- `load_data()` - Main loading interface
- `_load_json_data()` - JSON-specific loading logic
- `_load_csv_data()` - CSV-specific loading logic
- `_process_csv_timestamps()` - Timestamp parsing for CSV data
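
Timestamp detection along the lines of `_process_csv_timestamps` might look like this sketch. It assumes numeric values are Unix timestamps in seconds (millisecond data would need `unit="ms"`), treats anything else as datetime strings, and uses a local stand-in for `TimestampParsingError`; the function name `process_csv_timestamps` is hypothetical.

```python
import pandas as pd


class TimestampParsingError(Exception):
    """Local stand-in for the exception defined in storage_utils."""


def process_csv_timestamps(df: pd.DataFrame, column: str = "timestamp") -> pd.DataFrame:
    values = df[column]
    try:
        if pd.api.types.is_numeric_dtype(values):
            # Numeric column: interpret as Unix seconds.
            parsed = pd.to_datetime(values, unit="s")
        else:
            # Anything else: let pandas parse datetime strings.
            parsed = pd.to_datetime(values)
    except (ValueError, TypeError) as exc:
        raise TimestampParsingError(f"cannot parse column {column!r}: {exc}") from exc
    # Replace the raw column with a sorted DatetimeIndex.
    return df.assign(**{column: parsed}).set_index(column).sort_index()
```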

### `DataSaver`

Manages saving data with proper format handling and index conversion.

#### Key Features:

- Converts a DatetimeIndex to Unix timestamps for CSV compatibility
- Converts numeric indexes to a `timestamp` column as well
- Ensures the `timestamp` column is first in the output
- Comprehensive error handling and logging

#### Methods:

- `save_data()` - Main saving interface
- `_prepare_data_for_saving()` - Data preparation logic
- `_convert_datetime_index_to_timestamp()` - DatetimeIndex conversion
- `_convert_numeric_index_to_timestamp()` - Numeric index conversion
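
The index handling described above can be sketched as a single preparation step before `to_csv(..., index=False)`. Assumptions: a DatetimeIndex is timezone-naive and represents UTC, and a numeric index is only treated as timestamps when it is named `timestamp` (so a default `RangeIndex` of row numbers is dropped rather than saved as times). The name `prepare_for_saving` is hypothetical.

```python
import pandas as pd


def prepare_for_saving(data: pd.DataFrame) -> pd.DataFrame:
    out = data.copy()
    if isinstance(out.index, pd.DatetimeIndex):
        # Whole seconds since the Unix epoch (index assumed tz-naive UTC).
        epoch = pd.Timestamp("1970-01-01")
        out["timestamp"] = ((out.index - epoch) // pd.Timedelta(seconds=1)).to_numpy()
    elif out.index.name == "timestamp" and pd.api.types.is_numeric_dtype(out.index):
        # Numeric index named 'timestamp': assume it already holds Unix seconds.
        out["timestamp"] = out.index.to_numpy()
    out = out.reset_index(drop=True)
    # Put 'timestamp' first, keeping the order of the remaining columns.
    if "timestamp" in out.columns:
        out = out[["timestamp"] + [c for c in out.columns if c != "timestamp"]]
    return out
```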

### `ResultFormatter`

Handles formatting and writing of backtest results to CSV files.

#### Key Features:

- Consistent formatting for percentages and currency
- Grouped trade file writing by timeframe/stop-loss
- Metadata header support
- Tab-delimited output for results
- Error handling for all write operations

#### Methods:

- `format_row()` - Format individual result rows
- `write_results_chunk()` - Write result chunks with headers
- `write_backtest_results()` - Write combined results with metadata
- `write_trades()` - Write grouped trade files

## Utility Functions and Exceptions

### Custom Exceptions

- **`TimestampParsingError`** - Raised when timestamp parsing fails
- **`DataLoadingError`** - Raised when data loading operations fail
- **`DataSavingError`** - Raised when data saving operations fail

### Utility Functions

- **`_parse_timestamp_column()`** - Parse timestamp columns with format detection
- **`_filter_by_date_range()`** - Filter DataFrames by date range
- **`_normalize_column_names()`** - Convert column names to lowercase
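
Two of the helpers listed above are small enough to sketch directly. These are illustrations under assumptions (both date bounds inclusive, input already carrying a DatetimeIndex), not the actual `storage_utils` code; the names without a leading underscore are hypothetical.

```python
from typing import Union

import pandas as pd


def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out.columns = [str(c).lower() for c in out.columns]
    return out


def filter_by_date_range(df: pd.DataFrame,
                         start_date: Union[str, pd.Timestamp],
                         stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame:
    start, stop = pd.Timestamp(start_date), pd.Timestamp(stop_date)
    # Assumption: both bounds are inclusive.
    mask = (df.index >= start) & (df.index <= stop)
    return df.loc[mask]
```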

## Architecture Benefits

### Separation of Concerns

- Each class has a single, well-defined responsibility
- Data loading, saving, and result formatting are cleanly separated
- Shared utilities are extracted to prevent code duplication

### Maintainability

- All files are under 250 lines (enforced as a quality gate)
- All methods are under 50 lines (enforced as a quality gate)
- Clear interfaces and comprehensive documentation
- Type hints for better IDE support and clarity

### Error Handling

- Custom exceptions for different error types
- Consistent error logging patterns
- Graceful degradation (empty DataFrames on load failure)

### Backward Compatibility

- The `Storage` class keeps exactly the same public interface
- All existing code continues to work unchanged
- Component classes are available for advanced usage

## Migration Notes

The refactoring maintains full backward compatibility: existing code that uses `Storage` continues to work unchanged. For new code, consider using the component classes directly for more focused functionality:

```python
import logging

from cycles.utils.storage import Storage
from cycles.utils.data_loader import DataLoader

logger = logging.getLogger(__name__)
start, end = "2023-01-01", "2023-12-31"  # any datetime-like values
data_dir = "../data"  # the default DATA_DIR

# Existing pattern (still works)
storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```