# Storage Utilities

This document describes the refactored storage utilities found in `cycles/utils/` that provide modular, maintainable data and results management.

## Overview

The storage utilities have been refactored into a modular architecture with clear separation of concerns:
- **`Storage`** - Main coordinator class providing a unified interface (backward compatible)
- **`DataLoader`** - Handles loading data from various file formats
- **`DataSaver`** - Manages saving data with proper format handling
- **`ResultFormatter`** - Formats and writes backtest results to CSV files
- **`storage_utils`** - Shared utilities and custom exceptions

This design improves maintainability and testability, and follows the single responsibility principle.
## Constants

- `RESULTS_DIR`: Default directory for storing results (default: `"../results"`)
- `DATA_DIR`: Default directory for storing input data (default: `"../data"`)
## Main Classes

### `Storage` (Coordinator Class)

The main interface that coordinates all storage operations while maintaining backward compatibility.

#### `__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)`

**Description**: Initializes the Storage coordinator with component instances.

**Parameters**:

- `logging` (optional): A logging instance for outputting information
- `results_dir` (str, optional): Path to the directory for storing results
- `data_dir` (str, optional): Path to the directory for storing data

**Creates**: Component instances for `DataLoader`, `DataSaver`, and `ResultFormatter`
#### `load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame`

**Description**: Loads data with optimized dtypes and date filtering, supporting CSV and JSON input.

**Parameters**:

- `file_path` (str): Path to the data file (relative to `data_dir`)
- `start_date` (datetime-like): The start date for filtering data
- `stop_date` (datetime-like): The end date for filtering data

**Returns**: `pandas.DataFrame` with timestamp index

**Raises**: `DataLoadingError` if loading fails
#### `save_data(self, data: pd.DataFrame, file_path: str) -> None`

**Description**: Saves processed data to a CSV file with proper timestamp handling.

**Parameters**:

- `data` (pd.DataFrame): The DataFrame to save
- `file_path` (str): Path to the data file (relative to `data_dir`)

**Raises**: `DataSavingError` if saving fails
#### `format_row(self, row: Dict[str, Any]) -> Dict[str, str]`

**Description**: Formats a dictionary row for output to results CSV files.

**Parameters**:

- `row` (dict): The row of data to format

**Returns**: `dict` with formatted values (percentages, currency, etc.)
#### `write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None`

**Description**: Writes a chunk of results to a CSV file with an optional header.

**Parameters**:

- `filename` (str): The name of the file to write to
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of dictionaries representing rows
- `write_header` (bool, optional): Whether to write the header
- `initial_usd` (float, optional): Initial USD value for the header comment
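
The chunked-writing pattern can be sketched as follows. The tab delimiter matches the `ResultFormatter` feature list below, but `write_chunk` and the exact `initial_usd` comment format are illustrative assumptions, not the library's actual code:

```python
import csv
import io

def write_chunk(f, fieldnames, rows, write_header=False, initial_usd=None):
    # Header (and the optional initial-USD comment) only on the first chunk;
    # later chunks append rows to the same open file handle.
    writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    if write_header:
        if initial_usd is not None:
            f.write(f"# initial_usd: {initial_usd}\n")  # hypothetical comment format
        writer.writeheader()
    writer.writerows(rows)

buf = io.StringIO()
write_chunk(buf, ["pair", "profit"], [{"pair": "BTC/USD", "profit": "1.5"}],
            write_header=True, initial_usd=1000.0)
write_chunk(buf, ["pair", "profit"], [{"pair": "ETH/USD", "profit": "0.7"}])
print(buf.getvalue())
```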
#### `write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str`

**Description**: Writes combined backtest results to a CSV file with metadata.

**Parameters**:

- `filename` (str): Name of the file to write to (relative to `results_dir`)
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of result dictionaries
- `metadata_lines` (list, optional): Header comment lines

**Returns**: Full path to the written file
#### `write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None`

**Description**: Writes trade data to separate CSV files grouped by timeframe and stop-loss.

**Parameters**:

- `all_trade_rows` (list): List of trade dictionaries
- `trades_fieldnames` (list): CSV header for trade files

**Files Created**: `trades_{timeframe}_ST{sl_percent}pct.csv` in `results_dir`
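
The grouping behind `write_trades` can be sketched like this; the filename pattern comes from the documentation above, but the `timeframe` and `sl_percent` keys on each trade row are assumptions about the row schema:

```python
from collections import defaultdict

def group_trades(all_trade_rows):
    # Bucket trade rows by their target filename so each
    # (timeframe, stop-loss) pair gets its own CSV file.
    groups = defaultdict(list)
    for row in all_trade_rows:
        filename = f"trades_{row['timeframe']}_ST{row['sl_percent']}pct.csv"
        groups[filename].append(row)
    return dict(groups)

trades = [
    {"timeframe": "1h", "sl_percent": 2, "pair": "BTC/USD"},
    {"timeframe": "1h", "sl_percent": 2, "pair": "ETH/USD"},
    {"timeframe": "4h", "sl_percent": 5, "pair": "BTC/USD"},
]
files = group_trades(trades)
print(sorted(files))  # → ['trades_1h_ST2pct.csv', 'trades_4h_ST5pct.csv']
```

Each bucket would then be written with the shared `trades_fieldnames` header.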
### `DataLoader`

Handles loading and preprocessing of data from various file formats.

#### Key Features:

- Supports CSV and JSON formats
- Optimized pandas dtypes for financial data
- Intelligent timestamp parsing (Unix timestamps and datetime strings)
- Date range filtering
- Column name normalization (lowercase)
- Comprehensive error handling

#### Methods:

- `load_data()` - Main loading interface
- `_load_json_data()` - JSON-specific loading logic
- `_load_csv_data()` - CSV-specific loading logic
- `_process_csv_timestamps()` - Timestamp parsing for CSV data
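
The "intelligent timestamp parsing" feature can be sketched as: columns that parse as numbers are treated as Unix seconds, anything else as datetime strings. This is an illustrative reimplementation in plain pandas, not the module's actual logic:

```python
import pandas as pd

def parse_timestamps(series):
    # Format detection: all-numeric values are assumed to be Unix
    # seconds; otherwise fall back to datetime-string parsing.
    numeric = pd.to_numeric(series, errors="coerce")
    if numeric.notna().all():
        return pd.to_datetime(numeric, unit="s")
    return pd.to_datetime(series)

unix_ts = pd.Series([1700000000, 1700003600])
iso_ts = pd.Series(["2023-11-14 22:13:20", "2023-11-14 23:13:20"])
print(parse_timestamps(unix_ts).dt.year.tolist())  # → [2023, 2023]
print(parse_timestamps(iso_ts).dt.hour.tolist())   # → [22, 23]
```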
### `DataSaver`

Manages saving data with proper format handling and index conversion.

#### Key Features:

- Converts DatetimeIndex to Unix timestamps for CSV compatibility
- Handles numeric indexes appropriately
- Ensures 'timestamp' column is first in output
- Comprehensive error handling and logging

#### Methods:

- `save_data()` - Main saving interface
- `_prepare_data_for_saving()` - Data preparation logic
- `_convert_datetime_index_to_timestamp()` - DatetimeIndex conversion
- `_convert_numeric_index_to_timestamp()` - Numeric index conversion
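
The DatetimeIndex conversion can be sketched in plain pandas; `datetime_index_to_timestamp` is an illustrative helper, not the module's private method:

```python
import pandas as pd

def datetime_index_to_timestamp(df):
    # Move the DatetimeIndex into a leading 'timestamp' column of
    # Unix seconds so the CSV round-trips cleanly.
    out = df.copy()
    out.insert(0, "timestamp", out.index.astype("int64") // 10**9)
    return out.reset_index(drop=True)

df = pd.DataFrame(
    {"close": [100.0, 101.5]},
    index=pd.to_datetime(["2023-11-14 22:13:20", "2023-11-14 23:13:20"]),
)
converted = datetime_index_to_timestamp(df)
print(converted.columns.tolist())          # → ['timestamp', 'close']
print(converted["timestamp"].tolist())     # → [1700000000, 1700003600]
```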
### `ResultFormatter`

Handles formatting and writing of backtest results to CSV files.

#### Key Features:

- Consistent formatting for percentages and currency
- Grouped trade file writing by timeframe/stop-loss
- Metadata header support
- Tab-delimited output for results
- Error handling for all write operations

#### Methods:

- `format_row()` - Format individual result rows
- `write_results_chunk()` - Write result chunks with headers
- `write_backtest_results()` - Write combined results with metadata
- `write_trades()` - Write grouped trade files
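
A sketch of `format_row`-style percentage and currency formatting; the column-name suffixes and two-decimal precision here are illustrative assumptions, not the formatter's actual rules:

```python
def format_row(row):
    # Hypothetical convention: '_pct' columns become percentages,
    # '_usd' columns become currency, everything else is stringified.
    formatted = {}
    for key, value in row.items():
        if key.endswith("_pct"):
            formatted[key] = f"{value:.2f}%"
        elif key.endswith("_usd"):
            formatted[key] = f"${value:,.2f}"
        else:
            formatted[key] = str(value)
    return formatted

print(format_row({"profit_pct": 12.3456, "final_usd": 10500.5, "trades": 42}))
# → {'profit_pct': '12.35%', 'final_usd': '$10,500.50', 'trades': '42'}
```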
## Utility Functions and Exceptions

### Custom Exceptions

- **`TimestampParsingError`** - Raised when timestamp parsing fails
- **`DataLoadingError`** - Raised when data loading operations fail
- **`DataSavingError`** - Raised when data saving operations fail
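
A sketch of how these exceptions might be defined and caught; the shared `StorageError` base class is an assumption — the real `storage_utils` module may derive them directly from `Exception`:

```python
# Hypothetical definitions mirroring the names documented above.
class StorageError(Exception):
    """Assumed common base for storage-related failures."""

class TimestampParsingError(StorageError):
    """Raised when timestamp parsing fails."""

class DataLoadingError(StorageError):
    """Raised when data loading operations fail."""

class DataSavingError(StorageError):
    """Raised when data saving operations fail."""

try:
    raise DataLoadingError("file.csv: unsupported format")
except StorageError as exc:
    # A shared base lets callers catch all storage failures at once.
    print(f"caught {type(exc).__name__}: {exc}")
```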
### Utility Functions

- **`_parse_timestamp_column()`** - Parse timestamp columns with format detection
- **`_filter_by_date_range()`** - Filter DataFrames by date range
- **`_normalize_column_names()`** - Convert column names to lowercase
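
The filtering and normalization helpers can be sketched in plain pandas; these are illustrative reimplementations of `_filter_by_date_range()` and `_normalize_column_names()`, under the assumption that filtering is inclusive on both ends:

```python
import pandas as pd

def filter_by_date_range(df, start_date, stop_date):
    # Inclusive slice on a DatetimeIndex (assumed behavior).
    return df.loc[pd.Timestamp(start_date):pd.Timestamp(stop_date)]

def normalize_column_names(df):
    # Lowercase all column names, as the loader documents.
    return df.rename(columns=str.lower)

df = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
df = normalize_column_names(df)
subset = filter_by_date_range(df, "2024-01-02", "2024-01-03")
print(df.columns.tolist())  # → ['close']
print(len(subset))          # → 2
```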
## Architecture Benefits

### Separation of Concerns

- Each class has a single, well-defined responsibility
- Data loading, saving, and result formatting are cleanly separated
- Shared utilities are extracted to prevent code duplication

### Maintainability

- All files are under 250 lines (quality gate)
- All methods are under 50 lines (quality gate)
- Clear interfaces and comprehensive documentation
- Type hints for better IDE support and clarity

### Error Handling

- Custom exceptions for different error types
- Consistent error logging patterns
- Graceful degradation (empty DataFrames on load failure)

### Backward Compatibility

- The `Storage` class maintains the exact same public interface
- All existing code continues to work unchanged
- Component classes are available for advanced usage
## Migration Notes

The refactoring maintains full backward compatibility. Existing code using `Storage` will continue to work unchanged. For new code, consider using the component classes directly for more focused functionality:

```python
# Existing pattern (still works)
from cycles.utils.storage import Storage

storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
from cycles.utils.data_loader import DataLoader

loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```