Storage Utilities

This document describes the refactored storage utilities in cycles/utils/, which provide modular, maintainable data and results management.

Overview

The storage utilities have been refactored into a modular architecture with clear separation of concerns:

  • Storage - Main coordinator class providing unified interface (backward compatible)
  • DataLoader - Handles loading data from various file formats
  • DataSaver - Manages saving data with proper format handling
  • ResultFormatter - Formats and writes backtest results to CSV files
  • storage_utils - Shared utilities and custom exceptions

This design improves maintainability, testability, and follows the single responsibility principle.

Constants

  • RESULTS_DIR: Default directory for storing results (default: "../results")
  • DATA_DIR: Default directory for storing input data (default: "../data")

Main Classes

Storage (Coordinator Class)

The main interface that coordinates all storage operations while maintaining backward compatibility.

__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)

Description: Initializes the Storage coordinator with component instances.

Parameters:

  • logging (optional): A logging instance for outputting information
  • results_dir (str, optional): Path to the directory for storing results
  • data_dir (str, optional): Path to the directory for storing data

Creates: Component instances for DataLoader, DataSaver, and ResultFormatter

load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame

Description: Loads data with optimized dtypes and filtering, supporting CSV and JSON input.

Parameters:

  • file_path (str): Path to the data file (relative to data_dir)
  • start_date (datetime-like): The start date for filtering data
  • stop_date (datetime-like): The end date for filtering data

Returns: pandas.DataFrame with timestamp index

Raises: DataLoadingError if loading fails
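The loading behavior described above can be sketched in standalone pandas. The dtypes, column names, and sample data below are illustrative assumptions; the real DataLoader applies its own schema:

```python
import io
import pandas as pd

# Hypothetical OHLCV sample; real data files follow their own schema.
csv_text = """timestamp,Open,High,Low,Close,Volume
1704067200,42000.0,42500.0,41800.0,42300.0,12.5
1704070800,42300.0,42400.0,42100.0,42200.0,8.1
1704074400,42200.0,42900.0,42150.0,42800.0,15.3
"""

def load_data(buffer, start_date, stop_date):
    # Optimized dtypes: float32 is usually sufficient for prices/volumes.
    df = pd.read_csv(buffer, dtype={"Open": "float32", "High": "float32",
                                    "Low": "float32", "Close": "float32",
                                    "Volume": "float32"})
    # Normalize column names to lowercase.
    df.columns = [c.lower() for c in df.columns]
    # Unix-timestamp column -> DatetimeIndex.
    df.index = pd.to_datetime(df["timestamp"], unit="s")
    df = df.drop(columns=["timestamp"])
    # Date-range filtering (label slicing is inclusive at both ends).
    return df.loc[pd.Timestamp(start_date):pd.Timestamp(stop_date)]

df = load_data(io.StringIO(csv_text), "2024-01-01 00:00", "2024-01-01 01:30")
```

Here `df` keeps only the two rows inside the requested window, with a timestamp index and lowercase columns.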

save_data(self, data: pd.DataFrame, file_path: str) -> None

Description: Saves processed data to a CSV file with proper timestamp handling.

Parameters:

  • data (pd.DataFrame): The DataFrame to save
  • file_path (str): Path to the data file (relative to data_dir)

Raises: DataSavingError if saving fails

format_row(self, row: Dict[str, Any]) -> Dict[str, str]

Description: Formats a dictionary row for output to results CSV files.

Parameters:

  • row (dict): The row of data to format

Returns: dict with formatted values (percentages, currency, etc.)
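The exact formatting rules live in ResultFormatter; a minimal sketch under assumed conventions (the `_pct`/`_usd` key suffixes and format strings are hypothetical):

```python
def format_row(row):
    """Format a result row for CSV output (hypothetical conventions)."""
    formatted = {}
    for key, value in row.items():
        if not isinstance(value, (int, float)):
            formatted[key] = str(value)
        elif key.endswith("_pct"):          # percentage -> "12.35%"
            formatted[key] = f"{value:.2f}%"
        elif key.endswith("_usd"):          # currency -> "$10,432.50"
            formatted[key] = f"${value:,.2f}"
        else:
            formatted[key] = str(value)
    return formatted

formatted = format_row({"profit_pct": 12.3456, "final_usd": 10432.5, "trades": 42})
```

Every value comes back as a string, so the result can be passed straight to a csv.DictWriter.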

write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None

Description: Writes a chunk of results to a CSV file with optional header.

Parameters:

  • filename (str): The name of the file to write to
  • fieldnames (list): CSV header/column names
  • rows (list): List of dictionaries representing rows
  • write_header (bool, optional): Whether to write the header
  • initial_usd (float, optional): Initial USD value for header comment

write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str

Description: Writes combined backtest results to a CSV file with metadata.

Parameters:

  • filename (str): Name of the file to write to (relative to results_dir)
  • fieldnames (list): CSV header/column names
  • rows (list): List of result dictionaries
  • metadata_lines (list, optional): Header comment lines

Returns: Full path to the written file
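A standalone sketch of this interface, combining the metadata header with the tab-delimited output mentioned under ResultFormatter (the "# " comment prefix is an assumption):

```python
import csv
import os
import tempfile

def write_backtest_results(results_dir, filename, fieldnames, rows,
                           metadata_lines=None):
    """Write rows as tab-delimited CSV, preceded by '#' metadata comments."""
    path = os.path.join(results_dir, filename)
    with open(path, "w", newline="") as f:
        for line in metadata_lines or []:
            f.write(f"# {line}\n")
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return path  # full path to the written file

results_dir = tempfile.mkdtemp()
path = write_backtest_results(
    results_dir, "backtest.csv", ["timeframe", "profit"],
    [{"timeframe": "1h", "profit": "12.35%"}],
    metadata_lines=["initial_usd: 10000"],
)
```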

write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None

Description: Writes trade data to separate CSV files grouped by timeframe and stop-loss.

Parameters:

  • all_trade_rows (list): List of trade dictionaries
  • trades_fieldnames (list): CSV header for trade files

Files Created: trades_{timeframe}_ST{sl_percent}pct.csv in results_dir
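The grouping and file-naming scheme can be sketched as follows; the grouping keys `timeframe` and `sl_percent` are assumed field names in the trade dictionaries:

```python
import csv
import os
import tempfile
from collections import defaultdict

def write_trades(results_dir, all_trade_rows, trades_fieldnames):
    """Group trades by (timeframe, stop-loss) and write one CSV per group."""
    groups = defaultdict(list)
    for row in all_trade_rows:
        groups[(row["timeframe"], row["sl_percent"])].append(row)
    written = []
    for (timeframe, sl_percent), rows in groups.items():
        # Matches the documented pattern: trades_{timeframe}_ST{sl_percent}pct.csv
        filename = f"trades_{timeframe}_ST{sl_percent}pct.csv"
        with open(os.path.join(results_dir, filename), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=trades_fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(filename)
    return written

results_dir = tempfile.mkdtemp()
files = write_trades(results_dir,
                     [{"timeframe": "1h", "sl_percent": 2, "pnl": 1.5},
                      {"timeframe": "4h", "sl_percent": 2, "pnl": -0.4}],
                     ["timeframe", "sl_percent", "pnl"])
```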

DataLoader

Handles loading and preprocessing of data from various file formats.

Key Features:

  • Supports CSV and JSON formats
  • Optimized pandas dtypes for financial data
  • Intelligent timestamp parsing (Unix timestamps and datetime strings)
  • Date range filtering
  • Column name normalization (lowercase)
  • Comprehensive error handling

Methods:

  • load_data() - Main loading interface
  • _load_json_data() - JSON-specific loading logic
  • _load_csv_data() - CSV-specific loading logic
  • _process_csv_timestamps() - Timestamp parsing for CSV data
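The "intelligent timestamp parsing" can be illustrated with a simple heuristic: if the whole column is numeric, treat it as Unix epochs; otherwise parse datetime strings. This is a sketch, not the actual _process_csv_timestamps() logic:

```python
import pandas as pd

def parse_timestamp_column(series):
    """Detect Unix epochs vs. datetime strings (heuristic sketch)."""
    numeric = pd.to_numeric(series, errors="coerce")
    if numeric.notna().all():
        # All-numeric column: treat as Unix timestamps in seconds.
        return pd.to_datetime(numeric, unit="s")
    # Otherwise fall back to string parsing.
    return pd.to_datetime(series)

unix_col = pd.Series([1704067200, 1704070800])
str_col = pd.Series(["2024-01-01 00:00:00", "2024-01-01 01:00:00"])
a = parse_timestamp_column(unix_col)
b = parse_timestamp_column(str_col)
```

Both inputs resolve to the same datetimes even though one is epoch seconds and the other is strings.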

DataSaver

Manages saving data with proper format handling and index conversion.

Key Features:

  • Converts DatetimeIndex to Unix timestamps for CSV compatibility
  • Handles numeric indexes appropriately
  • Ensures 'timestamp' column is first in output
  • Comprehensive error handling and logging

Methods:

  • save_data() - Main saving interface
  • _prepare_data_for_saving() - Data preparation logic
  • _convert_datetime_index_to_timestamp() - DatetimeIndex conversion
  • _convert_numeric_index_to_timestamp() - Numeric index conversion
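The index-to-timestamp conversion described above can be sketched like this (the helper name and exact mechanics are illustrative):

```python
import pandas as pd

def prepare_data_for_saving(df):
    """Emit a leading 'timestamp' column: Unix seconds for a DatetimeIndex,
    the raw index values otherwise (sketch of the documented behavior)."""
    out = df.copy()
    if isinstance(out.index, pd.DatetimeIndex):
        # Nanoseconds since epoch -> whole seconds.
        out.insert(0, "timestamp", out.index.astype("int64") // 10**9)
    else:
        out.insert(0, "timestamp", out.index)
    return out.reset_index(drop=True)

df = pd.DataFrame({"close": [42300.0, 42200.0]},
                  index=pd.to_datetime([1704067200, 1704070800], unit="s"))
prepared = prepare_data_for_saving(df)
```

The round trip is lossless: saving and reloading recovers the same DatetimeIndex from the timestamp column.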

ResultFormatter

Handles formatting and writing of backtest results to CSV files.

Key Features:

  • Consistent formatting for percentages and currency
  • Grouped trade file writing by timeframe/stop-loss
  • Metadata header support
  • Tab-delimited output for results
  • Error handling for all write operations

Methods:

  • format_row() - Format individual result rows
  • write_results_chunk() - Write result chunks with headers
  • write_backtest_results() - Write combined results with metadata
  • write_trades() - Write grouped trade files

Utility Functions and Exceptions

Custom Exceptions

  • TimestampParsingError - Raised when timestamp parsing fails
  • DataLoadingError - Raised when data loading operations fail
  • DataSavingError - Raised when data saving operations fail
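These exceptions might be defined along the following lines (docstrings are paraphrased from the descriptions above):

```python
class TimestampParsingError(Exception):
    """Raised when timestamp parsing fails."""

class DataLoadingError(Exception):
    """Raised when data loading operations fail."""

class DataSavingError(Exception):
    """Raised when data saving operations fail."""

try:
    raise DataLoadingError("missing file: data.csv")
except DataLoadingError as exc:
    message = str(exc)
```

Distinct exception types let callers catch loading and saving failures separately.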

Utility Functions

  • _parse_timestamp_column() - Parse timestamp columns with format detection
  • _filter_by_date_range() - Filter DataFrames by date range
  • _normalize_column_names() - Convert column names to lowercase

Architecture Benefits

Separation of Concerns

  • Each class has a single, well-defined responsibility
  • Data loading, saving, and result formatting are cleanly separated
  • Shared utilities are extracted to prevent code duplication

Maintainability

  • All files are under 250 lines (quality gate)
  • All methods are under 50 lines (quality gate)
  • Clear interfaces and comprehensive documentation
  • Type hints for better IDE support and clarity

Error Handling

  • Custom exceptions for different error types
  • Consistent error logging patterns
  • Graceful degradation (empty DataFrames on load failure)

Backward Compatibility

  • Storage class maintains exact same public interface
  • All existing code continues to work unchanged
  • Component classes are available for advanced usage

Migration Notes

The refactoring maintains full backward compatibility. Existing code using Storage will continue to work unchanged. For new code, consider using the component classes directly for more focused functionality:

```python
# Existing pattern (still works)
from cycles.utils.storage import Storage

storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
from cycles.utils.data_loader import DataLoader

loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```