# Storage Utilities

This document describes the refactored storage utilities in `cycles/utils/`, which provide modular, maintainable data and results management.
## Overview
The storage utilities have been refactored into a modular architecture with clear separation of concerns:
- `Storage` - Main coordinator class providing a unified interface (backward compatible)
- `DataLoader` - Handles loading data from various file formats
- `DataSaver` - Manages saving data with proper format handling
- `ResultFormatter` - Formats and writes backtest results to CSV files
- `storage_utils` - Shared utilities and custom exceptions
This design improves maintainability and testability, and it follows the single responsibility principle.
## Constants
- `RESULTS_DIR`: Default directory for storing results (default: `"../results"`)
- `DATA_DIR`: Default directory for storing input data (default: `"../data"`)
## Main Classes

### Storage (Coordinator Class)
The main interface that coordinates all storage operations while maintaining backward compatibility.
#### `__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)`

**Description:** Initializes the `Storage` coordinator with component instances.

**Parameters:**
- `logging` (optional): A logging instance for outputting information
- `results_dir` (str, optional): Path to the directory for storing results
- `data_dir` (str, optional): Path to the directory for storing data

**Creates:** Component instances for `DataLoader`, `DataSaver`, and `ResultFormatter`
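A minimal construction sketch (the logger setup and directory values are illustrative, not required):

```python
import logging

from cycles.utils.storage import Storage

logger = logging.getLogger(__name__)

# RESULTS_DIR ("../results") and DATA_DIR ("../data") are the defaults;
# both can be overridden per instance.
storage = Storage(logging=logger, results_dir="./results", data_dir="./data")
```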
#### `load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame`

**Description:** Loads data with optimized dtypes and date-range filtering, supporting CSV and JSON input.

**Parameters:**
- `file_path` (str): Path to the data file (relative to `data_dir`)
- `start_date` (datetime-like): The start date for filtering data
- `stop_date` (datetime-like): The end date for filtering data

**Returns:** `pandas.DataFrame` with a timestamp index

**Raises:** `DataLoadingError` if loading fails
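For example, loading a date-bounded slice of a CSV file might look like this (the file name and dates are illustrative):

```python
from cycles.utils.storage import Storage

storage = Storage()

# start_date/stop_date accept strings or pd.Timestamp values; the result
# is a DataFrame with a timestamp index, filtered to the given range.
df = storage.load_data(
    "BTCUSD_1h.csv",
    start_date="2023-01-01",
    stop_date="2023-06-30",
)
```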
#### `save_data(self, data: pd.DataFrame, file_path: str) -> None`

**Description:** Saves processed data to a CSV file with proper timestamp handling.

**Parameters:**
- `data` (pd.DataFrame): The DataFrame to save
- `file_path` (str): Path to the data file (relative to `data_dir`)

**Raises:** `DataSavingError` if saving fails
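A round-trip sketch (column names and values are made up):

```python
import pandas as pd

from cycles.utils.storage import Storage

storage = Storage()

df = pd.DataFrame(
    {"open": [100.0, 101.5], "close": [101.5, 102.0]},
    index=pd.DatetimeIndex(["2023-01-01 00:00", "2023-01-01 01:00"]),
)

# The DatetimeIndex is converted to Unix timestamps on the way out,
# keeping the CSV compatible with load_data().
storage.save_data(df, "BTCUSD_clean.csv")
```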
#### `format_row(self, row: Dict[str, Any]) -> Dict[str, str]`

**Description:** Formats a dictionary row for output to results CSV files.

**Parameters:**
- `row` (dict): The row of data to format

**Returns:** dict with formatted values (percentages, currency, etc.)
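For instance (the field names here, and the exact formatted output, are assumptions; the real rules live in `ResultFormatter`):

```python
from cycles.utils.storage import Storage

storage = Storage()

row = {"timeframe": "1h", "profit_pct": 0.1234, "final_usd": 10567.8912}
formatted = storage.format_row(row)
# Plausible output (exact formatting rules are an assumption):
# {"timeframe": "1h", "profit_pct": "12.34%", "final_usd": "$10,567.89"}
```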
#### `write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None`

**Description:** Writes a chunk of results to a CSV file with an optional header.

**Parameters:**
- `filename` (str): The name of the file to write to
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of dictionaries representing rows
- `write_header` (bool, optional): Whether to write the header
- `initial_usd` (float, optional): Initial USD value for the header comment
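A chunked-writing sketch (field and file names are illustrative):

```python
from cycles.utils.storage import Storage

storage = Storage()
fieldnames = ["timeframe", "profit_pct", "final_usd"]
rows = [{"timeframe": "1h", "profit_pct": "12.34%", "final_usd": "$10,567.89"}]

# First chunk: write the header plus the initial-USD comment line.
storage.write_results_chunk("results.csv", fieldnames, rows,
                            write_header=True, initial_usd=10000.0)
# Later chunks: append rows without repeating the header.
storage.write_results_chunk("results.csv", fieldnames, rows)
```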
#### `write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str`

**Description:** Writes combined backtest results to a CSV file with metadata.

**Parameters:**
- `filename` (str): Name of the file to write to (relative to `results_dir`)
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of result dictionaries
- `metadata_lines` (list, optional): Header comment lines

**Returns:** Full path to the written file
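For example (the metadata strings are illustrative):

```python
from cycles.utils.storage import Storage

storage = Storage()

path = storage.write_backtest_results(
    "backtest_summary.csv",
    fieldnames=["timeframe", "profit_pct"],
    rows=[{"timeframe": "1h", "profit_pct": "12.34%"}],
    metadata_lines=["strategy: example", "initial_usd: 10000"],
)
print(f"Results written to {path}")  # full path inside results_dir
```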
#### `write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None`

**Description:** Writes trade data to separate CSV files, grouped by timeframe and stop-loss.

**Parameters:**
- `all_trade_rows` (list): List of trade dictionaries
- `trades_fieldnames` (list): CSV header for trade files

**Files Created:** `trades_{timeframe}_ST{sl_percent}pct.csv` in `results_dir`
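A sketch of grouped trade output; the assumption here is that the grouping keys match the `timeframe` and `sl_percent` placeholders in the file-name pattern:

```python
from cycles.utils.storage import Storage

storage = Storage()

trades = [
    {"timeframe": "1h", "sl_percent": 2, "entry": 100.0, "exit": 103.0},
    {"timeframe": "4h", "sl_percent": 5, "entry": 100.0, "exit": 98.0},
]
fieldnames = ["timeframe", "sl_percent", "entry", "exit"]

# Assuming the grouping works as described, this would produce
# trades_1h_ST2pct.csv and trades_4h_ST5pct.csv under results_dir.
storage.write_trades(trades, fieldnames)
```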
### DataLoader
Handles loading and preprocessing of data from various file formats.
**Key Features:**
- Supports CSV and JSON formats
- Optimized pandas dtypes for financial data
- Intelligent timestamp parsing (Unix timestamps and datetime strings; see the sketch after the method list below)
- Date range filtering
- Column name normalization (lowercase)
- Comprehensive error handling
**Methods:**
- `load_data()` - Main loading interface
- `_load_json_data()` - JSON-specific loading logic
- `_load_csv_data()` - CSV-specific loading logic
- `_process_csv_timestamps()` - Timestamp parsing for CSV data
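The sketch below shows the kind of format detection `_process_csv_timestamps()` is described as doing; it illustrates the technique and is not the actual implementation:

```python
import pandas as pd

def parse_timestamps(series: pd.Series) -> pd.DatetimeIndex:
    """Parse a column that may hold Unix timestamps or datetime strings."""
    if pd.api.types.is_numeric_dtype(series):
        # Numeric values are treated as Unix timestamps in seconds.
        return pd.DatetimeIndex(pd.to_datetime(series, unit="s"))
    # Otherwise fall back to pandas' general string parsing.
    return pd.DatetimeIndex(pd.to_datetime(series))
```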
### DataSaver
Manages saving data with proper format handling and index conversion.
**Key Features:**
- Converts DatetimeIndex to Unix timestamps for CSV compatibility
- Handles numeric indexes appropriately
- Ensures 'timestamp' column is first in output
- Comprehensive error handling and logging
**Methods:**
- `save_data()` - Main saving interface
- `_prepare_data_for_saving()` - Data preparation logic
- `_convert_datetime_index_to_timestamp()` - DatetimeIndex conversion
- `_convert_numeric_index_to_timestamp()` - Numeric index conversion
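A minimal sketch of the DatetimeIndex-to-Unix conversion described above (the real `_prepare_data_for_saving()` may handle more cases):

```python
import pandas as pd

def prepare_for_csv(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    if isinstance(out.index, pd.DatetimeIndex):
        # Nanoseconds since the epoch -> whole seconds, placed first.
        out.insert(0, "timestamp", out.index.asi8 // 10**9)
    else:
        # Numeric indexes are carried over as-is.
        out.insert(0, "timestamp", out.index)
    return out.reset_index(drop=True)
```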
### ResultFormatter
Handles formatting and writing of backtest results to CSV files.
**Key Features:**
- Consistent formatting for percentages and currency
- Grouped trade file writing by timeframe/stop-loss
- Metadata header support
- Tab-delimited output for results
- Error handling for all write operations
**Methods:**
- `format_row()` - Format individual result rows
- `write_results_chunk()` - Write result chunks with headers
- `write_backtest_results()` - Write combined results with metadata
- `write_trades()` - Write grouped trade files
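A minimal sketch in the spirit of `format_row()`; the field-suffix conventions and precision here are assumptions:

```python
def format_value(key: str, value: object) -> str:
    if isinstance(value, float):
        if key.endswith("_pct"):
            return f"{value * 100:.2f}%"  # fraction -> percentage
        if key.endswith("_usd"):
            return f"${value:,.2f}"       # currency with thousands separators
    return str(value)

print(format_value("profit_pct", 0.1234))   # 12.34%
print(format_value("final_usd", 10567.89))  # $10,567.89
```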
## Utility Functions and Exceptions

### Custom Exceptions
- `TimestampParsingError` - Raised when timestamp parsing fails
- `DataLoadingError` - Raised when data loading operations fail
- `DataSavingError` - Raised when data saving operations fail
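Typical handling might look like this (the `cycles.utils.storage_utils` import path is an assumption based on the module layout described above):

```python
import logging

from cycles.utils.storage import Storage
from cycles.utils.storage_utils import DataLoadingError  # assumed import path

logger = logging.getLogger(__name__)
storage = Storage(logging=logger)

try:
    df = storage.load_data("missing.csv", "2023-01-01", "2023-02-01")
except DataLoadingError as exc:
    logger.error("Could not load data: %s", exc)
```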
### Utility Functions
- `_parse_timestamp_column()` - Parse timestamp columns with format detection
- `_filter_by_date_range()` - Filter DataFrames by date range
- `_normalize_column_names()` - Convert column names to lowercase
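A minimal sketch of date-range filtering on a timestamp index; the real helper's signature and boundary inclusivity are assumptions:

```python
import pandas as pd

def filter_by_date_range(df: pd.DataFrame, start, stop) -> pd.DataFrame:
    start, stop = pd.Timestamp(start), pd.Timestamp(stop)
    return df.loc[(df.index >= start) & (df.index <= stop)]
```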
## Architecture Benefits

### Separation of Concerns
- Each class has a single, well-defined responsibility
- Data loading, saving, and result formatting are cleanly separated
- Shared utilities are extracted to prevent code duplication
### Maintainability
- All files are under 250 lines (quality gate)
- All methods are under 50 lines (quality gate)
- Clear interfaces and comprehensive documentation
- Type hints for better IDE support and clarity
### Error Handling
- Custom exceptions for different error types
- Consistent error logging patterns
- Graceful degradation (empty DataFrames on load failure)
### Backward Compatibility
- The `Storage` class maintains the exact same public interface
- All existing code continues to work unchanged
- Component classes are available for advanced usage
## Migration Notes

The refactoring maintains full backward compatibility: existing code using `Storage` will continue to work unchanged. For new code, consider using the component classes directly for more focused functionality:
```python
import logging

logger = logging.getLogger(__name__)     # illustrative setup
data_dir = "../data"                     # illustrative path
start, end = "2023-01-01", "2023-06-30"  # illustrative date range

# Existing pattern (still works)
from cycles.utils.storage import Storage
storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
from cycles.utils.data_loader import DataLoader
loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```