# Storage Utilities

This document describes the refactored storage utilities found in `cycles/utils/` that provide modular, maintainable data and results management.

## Overview

The storage utilities have been refactored into a modular architecture with clear separation of concerns:

- **`Storage`** - Main coordinator class providing a unified interface (backward compatible)
- **`DataLoader`** - Handles loading data from various file formats
- **`DataSaver`** - Manages saving data with proper format handling
- **`ResultFormatter`** - Formats and writes backtest results to CSV files
- **`storage_utils`** - Shared utilities and custom exceptions

This design improves maintainability and testability, and follows the single responsibility principle.

## Constants

- `RESULTS_DIR`: Default directory for storing results (`"../results"`)
- `DATA_DIR`: Default directory for storing input data (`"../data"`)

## Main Classes

### `Storage` (Coordinator Class)

The main interface that coordinates all storage operations while maintaining backward compatibility.

#### `__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)`

**Description**: Initializes the `Storage` coordinator with component instances.

**Parameters**:

- `logging` (optional): A logging instance for outputting information
- `results_dir` (str, optional): Path to the directory for storing results
- `data_dir` (str, optional): Path to the directory for storing data

**Creates**: Component instances for `DataLoader`, `DataSaver`, and `ResultFormatter`

#### `load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame`

**Description**: Loads data with optimized dtypes and date filtering, supporting CSV and JSON input.

**Parameters**:

- `file_path` (str): Path to the data file (relative to `data_dir`)
- `start_date` (datetime-like): The start date for filtering data
- `stop_date` (datetime-like): The end date for filtering data

**Returns**: `pandas.DataFrame` with a timestamp index

**Raises**: `DataLoadingError` if loading fails

#### `save_data(self, data: pd.DataFrame, file_path: str) -> None`

**Description**: Saves processed data to a CSV file with proper timestamp handling.

**Parameters**:

- `data` (pd.DataFrame): The DataFrame to save
- `file_path` (str): Path to the data file (relative to `data_dir`)

**Raises**: `DataSavingError` if saving fails

#### `format_row(self, row: Dict[str, Any]) -> Dict[str, str]`

**Description**: Formats a dictionary row for output to results CSV files.

**Parameters**:

- `row` (dict): The row of data to format

**Returns**: `dict` with formatted values (percentages, currency, etc.)

#### `write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None`

**Description**: Writes a chunk of results to a CSV file with an optional header.

**Parameters**:

- `filename` (str): The name of the file to write to
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of dictionaries representing rows
- `write_header` (bool, optional): Whether to write the header
- `initial_usd` (float, optional): Initial USD value for the header comment
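The two methods above pair naturally when streaming results to disk. Here is a minimal usage sketch; the field names, row values, and formatted output mentioned in the comments are illustrative assumptions, not values taken from the library:

```python
# Illustrative pairing of format_row() and write_results_chunk().
# Field names and row contents are hypothetical.
from cycles.utils.storage import Storage

storage = Storage(logging=None, results_dir="../results")

fieldnames = ["timeframe", "profit_pct", "final_usd"]  # hypothetical columns
raw_rows = [
    {"timeframe": "1h", "profit_pct": 0.1234, "final_usd": 11234.5},
    {"timeframe": "4h", "profit_pct": -0.0210, "final_usd": 9790.0},
]

# format_row() returns display-ready strings (percentages, currency, etc.).
formatted = [storage.format_row(row) for row in raw_rows]

# The first chunk writes the header (plus an initial-USD comment line);
# subsequent calls with write_header=False would append rows only.
storage.write_results_chunk("results.csv", fieldnames, formatted,
                            write_header=True, initial_usd=10_000.0)
```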
#### `write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str`

**Description**: Writes combined backtest results to a CSV file with metadata.

**Parameters**:

- `filename` (str): Name of the file to write to (relative to `results_dir`)
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of result dictionaries
- `metadata_lines` (list, optional): Header comment lines

**Returns**: Full path to the written file

#### `write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None`

**Description**: Writes trade data to separate CSV files grouped by timeframe and stop-loss.

**Parameters**:

- `all_trade_rows` (list): List of trade dictionaries
- `trades_fieldnames` (list): CSV header for trade files

**Files Created**: `trades_{timeframe}_ST{sl_percent}pct.csv` in `results_dir`

### `DataLoader`

Handles loading and preprocessing of data from various file formats.

#### Key Features:

- Supports CSV and JSON formats
- Optimized pandas dtypes for financial data
- Intelligent timestamp parsing (Unix timestamps and datetime strings)
- Date range filtering
- Column name normalization (lowercase)
- Comprehensive error handling

#### Methods:

- `load_data()` - Main loading interface
- `_load_json_data()` - JSON-specific loading logic
- `_load_csv_data()` - CSV-specific loading logic
- `_process_csv_timestamps()` - Timestamp parsing for CSV data

### `DataSaver`

Manages saving data with proper format handling and index conversion.

#### Key Features:

- Converts a `DatetimeIndex` to Unix timestamps for CSV compatibility
- Handles numeric indexes appropriately
- Ensures the `timestamp` column is first in the output
- Comprehensive error handling and logging

#### Methods:

- `save_data()` - Main saving interface
- `_prepare_data_for_saving()` - Data preparation logic
- `_convert_datetime_index_to_timestamp()` - `DatetimeIndex` conversion
- `_convert_numeric_index_to_timestamp()` - Numeric index conversion
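To make the timestamp handling concrete, here is a self-contained sketch of the two conversions: parsing a timestamp column on load (the `DataLoader` direction) and turning a `DatetimeIndex` back into Unix timestamps on save (the `DataSaver` direction). The helper names and the numeric-dtype detection heuristic are assumptions for illustration, not the module's actual code:

```python
# Sketch of the two timestamp conversions described above. Function names
# and the detection heuristic are assumed; the real helpers live in
# cycles/utils and may differ in detail.
import pandas as pd


def parse_timestamp_column(series: pd.Series) -> pd.DatetimeIndex:
    """Load direction: detect Unix seconds vs. datetime strings and parse."""
    if pd.api.types.is_numeric_dtype(series):
        # Numeric values are treated as Unix timestamps in seconds.
        return pd.DatetimeIndex(pd.to_datetime(series, unit="s"))
    # Everything else is parsed as datetime strings.
    return pd.DatetimeIndex(pd.to_datetime(series))


def datetime_index_to_unix(df: pd.DataFrame) -> pd.DataFrame:
    """Save direction: DatetimeIndex -> leading 'timestamp' column (epoch seconds)."""
    out = df.copy()
    out.insert(0, "timestamp", out.index.astype("int64") // 10**9)
    return out.reset_index(drop=True)


if __name__ == "__main__":
    raw = pd.Series([1_700_000_000, 1_700_000_060])  # Unix seconds
    df = pd.DataFrame({"close": [100.0, 101.5]}, index=parse_timestamp_column(raw))
    print(datetime_index_to_unix(df))  # 'timestamp' column comes first
```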
### `ResultFormatter`

Handles formatting and writing of backtest results to CSV files.

#### Key Features:

- Consistent formatting for percentages and currency
- Grouped trade file writing by timeframe/stop-loss
- Metadata header support
- Tab-delimited output for results
- Error handling for all write operations

#### Methods:

- `format_row()` - Format individual result rows
- `write_results_chunk()` - Write result chunks with headers
- `write_backtest_results()` - Write combined results with metadata
- `write_trades()` - Write grouped trade files

## Utility Functions and Exceptions

### Custom Exceptions

- **`TimestampParsingError`** - Raised when timestamp parsing fails
- **`DataLoadingError`** - Raised when data loading operations fail
- **`DataSavingError`** - Raised when data saving operations fail

### Utility Functions

- **`_parse_timestamp_column()`** - Parse timestamp columns with format detection
- **`_filter_by_date_range()`** - Filter DataFrames by date range
- **`_normalize_column_names()`** - Convert column names to lowercase

## Architecture Benefits

### Separation of Concerns

- Each class has a single, well-defined responsibility
- Data loading, saving, and result formatting are cleanly separated
- Shared utilities are extracted to prevent code duplication

### Maintainability

- All files are under 250 lines (quality gate)
- All methods are under 50 lines (quality gate)
- Clear interfaces and comprehensive documentation
- Type hints for better IDE support and clarity

### Error Handling

- Custom exceptions for different error types
- Consistent error logging patterns
- Graceful degradation (empty DataFrames on load failure)

### Backward Compatibility

- The `Storage` class keeps exactly the same public interface
- All existing code continues to work unchanged
- Component classes are available for advanced usage

## Migration Notes

The refactoring maintains full backward compatibility. Existing code using `Storage` will continue to work unchanged. For new code, consider using the component classes directly for more focused functionality:

```python
# Existing pattern (still works)
from cycles.utils.storage import Storage

storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
from cycles.utils.data_loader import DataLoader

loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```
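The error-handling contract described under Architecture Benefits can be exercised as in the sketch below. This is a hedged example: the import path for the exceptions and the file names are assumptions, not confirmed by this document.

```python
# Sketch of the error-handling contract: custom exceptions on failure, with
# the caller free to degrade gracefully to an empty DataFrame.
import pandas as pd

from cycles.utils.storage import Storage
from cycles.utils.storage_utils import DataLoadingError, DataSavingError  # path assumed

storage = Storage(logging=None)

try:
    # File name and date range are illustrative.
    data = storage.load_data("btc_1h.csv", "2024-01-01", "2024-06-30")
except DataLoadingError as exc:
    # Mirror the "graceful degradation" behavior noted above: log and
    # continue with an empty frame rather than aborting the backtest.
    print(f"load failed: {exc}")
    data = pd.DataFrame()

try:
    storage.save_data(data, "btc_1h_processed.csv")
except DataSavingError as exc:
    print(f"save failed: {exc}")
```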