
# Storage Utilities

This document describes the refactored storage utilities in `cycles/utils/`, which provide modular, maintainable data and results management.

## Overview

The storage utilities have been refactored into a modular architecture with clear separation of concerns:

- **`Storage`** - Main coordinator class providing a unified interface (backward compatible)
- **`DataLoader`** - Handles loading data from various file formats
- **`DataSaver`** - Manages saving data with proper format handling
- **`ResultFormatter`** - Formats and writes backtest results to CSV files
- **`storage_utils`** - Shared utilities and custom exceptions

This design improves maintainability and testability, and it follows the single responsibility principle.

## Constants

- `RESULTS_DIR`: Directory for storing results (default: `"../results"`)
- `DATA_DIR`: Directory for storing input data (default: `"../data"`)

## Main Classes

### `Storage` (Coordinator Class)

The main interface that coordinates all storage operations while maintaining backward compatibility.

#### `__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)`

**Description**: Initializes the Storage coordinator with component instances.

**Parameters**:
- `logging` (optional): A logging instance for outputting information
- `results_dir` (str, optional): Path to the directory for storing results
- `data_dir` (str, optional): Path to the directory for storing data

**Creates**: Component instances of `DataLoader`, `DataSaver`, and `ResultFormatter`
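
The coordinator pattern described above can be illustrated with a minimal sketch. `LoaderStub` and `StorageSketch` are hypothetical stand-ins, not the real classes, and the component constructor arguments are assumptions; only the `Storage.__init__` signature and its defaults come from this document.

```python
class LoaderStub:
    """Hypothetical stand-in for DataLoader; it only echoes what it would load."""

    def __init__(self, data_dir, logging=None):
        self.data_dir = data_dir
        self.logging = logging

    def load_data(self, file_path, start_date, stop_date):
        return f"loaded {self.data_dir}/{file_path}"


class StorageSketch:
    """Facade that builds its components once and delegates calls to them."""

    def __init__(self, logging=None, results_dir="../results", data_dir="../data"):
        self.logging = logging
        self.results_dir = results_dir
        self.data_dir = data_dir
        # Assumption: each component receives the directory it works in.
        self.data_loader = LoaderStub(data_dir, logging)

    def load_data(self, file_path, start_date, stop_date):
        # The public method keeps its signature and forwards the call.
        return self.data_loader.load_data(file_path, start_date, stop_date)


storage = StorageSketch(data_dir="/tmp/data")
result = storage.load_data("btc.csv", "2023-01-01", "2023-02-01")
```

Because callers only see the facade's methods, the components behind it can be replaced or tested in isolation without touching calling code.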

#### `load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame`

**Description**: Loads data with optimized dtypes and filtering, supporting CSV and JSON input.

**Parameters**:
- `file_path` (str): Path to the data file (relative to `data_dir`)
- `start_date` (str or pd.Timestamp): The start date for filtering data
- `stop_date` (str or pd.Timestamp): The end date for filtering data

**Returns**: `pandas.DataFrame` with timestamp index

**Raises**: `DataLoadingError` if loading fails

#### `save_data(self, data: pd.DataFrame, file_path: str) -> None`

**Description**: Saves processed data to a CSV file with proper timestamp handling.

**Parameters**:
- `data` (pd.DataFrame): The DataFrame to save
- `file_path` (str): Path to the data file (relative to `data_dir`)

**Raises**: `DataSavingError` if saving fails

#### `format_row(self, row: Dict[str, Any]) -> Dict[str, str]`

**Description**: Formats a dictionary row for output to results CSV files.

**Parameters**:
- `row` (dict): The row of data to format

**Returns**: `dict` with formatted values (percentages, currency, etc.)
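
As a rough illustration of this kind of formatting, the sketch below turns numeric values into display strings. The key suffixes (`_pct`, `_usd`), the output formats, and the name `format_row_sketch` are assumptions made for the example; the actual rules used by `format_row` are not specified here.

```python
from typing import Any, Dict


def format_row_sketch(row: Dict[str, Any]) -> Dict[str, str]:
    """Format values as strings; the key-suffix rules are assumptions."""
    formatted = {}
    for key, value in row.items():
        if isinstance(value, float) and key.endswith("_pct"):
            # Fractions such as 0.625 become "62.50%".
            formatted[key] = f"{value * 100:.2f}%"
        elif isinstance(value, float) and key.endswith("_usd"):
            # Currency gets a dollar sign, thousands separators, two decimals.
            formatted[key] = f"${value:,.2f}"
        else:
            formatted[key] = str(value)
    return formatted


example = format_row_sketch(
    {"timeframe": "1h", "win_rate_pct": 0.625, "final_usd": 12345.5}
)
```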

#### `write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None`

**Description**: Writes a chunk of results to a CSV file with optional header.

**Parameters**:
- `filename` (str): The name of the file to write to
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of dictionaries representing rows
- `write_header` (bool, optional): Whether to write the header
- `initial_usd` (float, optional): Initial USD value for header comment
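
A minimal sketch of chunked writing with the standard `csv` module is shown below. It assumes that a header chunk starts a new file and later chunks append, that the `initial_usd` comment looks like `# initial_usd: ...`, and that output is tab-delimited (as listed under the `ResultFormatter` key features). The function name and these details are illustrative, not the actual implementation.

```python
import csv
import os
import tempfile
from typing import Dict, List, Optional


def write_chunk_sketch(path: str, fieldnames: List[str], rows: List[Dict],
                       write_header: bool = False,
                       initial_usd: Optional[float] = None) -> None:
    # Assumption: a header chunk starts a new file, later chunks append.
    mode = "w" if write_header else "a"
    with open(path, mode, newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames,
                                delimiter="\t", lineterminator="\n")
        if write_header:
            if initial_usd is not None:
                # Assumed comment format; the real header text may differ.
                f.write(f"# initial_usd: {initial_usd}\n")
            writer.writeheader()
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "results.csv")
fields = ["timeframe", "profit"]
write_chunk_sketch(path, fields, [{"timeframe": "1h", "profit": 1.5}],
                   write_header=True, initial_usd=10000)
write_chunk_sketch(path, fields, [{"timeframe": "4h", "profit": 2.0}])

with open(path) as f:
    content = f.read()
```

Writing in chunks keeps memory use flat during long backtests, since rows can be flushed to disk as each batch completes.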

#### `write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str`

**Description**: Writes combined backtest results to a CSV file with metadata.

**Parameters**:
- `filename` (str): Name of the file to write to (relative to `results_dir`)
- `fieldnames` (list): CSV header/column names
- `rows` (list): List of result dictionaries
- `metadata_lines` (list, optional): Header comment lines

**Returns**: Full path to the written file

#### `write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None`

**Description**: Writes trade data to separate CSV files grouped by timeframe and stop-loss.

**Parameters**:
- `all_trade_rows` (list): List of trade dictionaries
- `trades_fieldnames` (list): CSV header for trade files

**Files Created**: `trades_{timeframe}_ST{sl_percent}pct.csv` in `results_dir`
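
The grouping step can be sketched as follows. The row keys `timeframe` and `stop_loss_pct`, the conversion of a fractional stop-loss to a whole percent, and the function name are assumptions for illustration; only the file name pattern comes from this document.

```python
import csv
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def write_trades_sketch(results_dir: str, all_trade_rows: List[Dict],
                        trades_fieldnames: List[str]) -> List[str]:
    groups = defaultdict(list)
    for row in all_trade_rows:
        # Assumed keys: "timeframe" and "stop_loss_pct" (a fraction, e.g. 0.02).
        groups[(row["timeframe"], row["stop_loss_pct"])].append(row)

    written = []
    for (timeframe, stop_loss), rows in groups.items():
        sl_percent = int(round(stop_loss * 100))
        filename = f"trades_{timeframe}_ST{sl_percent}pct.csv"
        with open(os.path.join(results_dir, filename), "w", newline="") as f:
            # Keys not listed in trades_fieldnames are skipped, not rejected.
            writer = csv.DictWriter(f, fieldnames=trades_fieldnames,
                                    extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        written.append(filename)
    return sorted(written)


out_dir = tempfile.mkdtemp()
trades = [
    {"timeframe": "1h", "stop_loss_pct": 0.02, "profit": 1.0},
    {"timeframe": "1h", "stop_loss_pct": 0.05, "profit": -0.5},
    {"timeframe": "1h", "stop_loss_pct": 0.02, "profit": 0.3},
]
files = write_trades_sketch(out_dir, trades, ["timeframe", "profit"])

with open(os.path.join(out_dir, files[0])) as f:
    first_file_lines = f.read().splitlines()
```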

### `DataLoader`

Handles loading and preprocessing of data from various file formats.

#### Key Features:
- Supports CSV and JSON formats
- Optimized pandas dtypes for financial data
- Intelligent timestamp parsing (Unix timestamps and datetime strings)
- Date range filtering
- Column name normalization (lowercase)
- Comprehensive error handling

#### Methods:
- `load_data()` - Main loading interface
- `_load_json_data()` - JSON-specific loading logic
- `_load_csv_data()` - CSV-specific loading logic
- `_process_csv_timestamps()` - Timestamp parsing for CSV data

### `DataSaver`

Manages saving data with proper format handling and index conversion.

#### Key Features:
- Converts DatetimeIndex to Unix timestamps for CSV compatibility
- Handles numeric indexes appropriately
- Ensures 'timestamp' column is first in output
- Comprehensive error handling and logging

#### Methods:
- `save_data()` - Main saving interface
- `_prepare_data_for_saving()` - Data preparation logic
- `_convert_datetime_index_to_timestamp()` - DatetimeIndex conversion
- `_convert_numeric_index_to_timestamp()` - Numeric index conversion
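
The `DatetimeIndex` conversion and column ordering might look roughly like the sketch below. It assumes a timezone-naive index and whole-second Unix timestamps; the function name is hypothetical, and the handling of numeric indexes is left out because its rules are not described here.

```python
import pandas as pd


def prepare_for_saving_sketch(data: pd.DataFrame) -> pd.DataFrame:
    out = data.copy()
    if isinstance(out.index, pd.DatetimeIndex):
        # Whole seconds since the Unix epoch (assumed unit, tz-naive index).
        epoch = pd.Timestamp("1970-01-01")
        out["timestamp"] = (out.index - epoch) // pd.Timedelta(seconds=1)
        out = out.reset_index(drop=True)
    if "timestamp" in out.columns:
        # Move 'timestamp' to the front, keeping the other columns in order.
        others = [c for c in out.columns if c != "timestamp"]
        out = out[["timestamp"] + others]
    return out


frame = pd.DataFrame(
    {"close": [100.0, 101.5]},
    index=pd.to_datetime(["2023-01-01", "2023-01-02"]),
)
prepared = prepare_for_saving_sketch(frame)
```

Dividing by a `Timedelta` rather than casting the index to integers keeps the result independent of the index's internal time resolution.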

### `ResultFormatter`

Handles formatting and writing of backtest results to CSV files.

#### Key Features:
- Consistent formatting for percentages and currency
- Grouped trade file writing by timeframe/stop-loss
- Metadata header support
- Tab-delimited output for results
- Error handling for all write operations

#### Methods:
- `format_row()` - Format individual result rows
- `write_results_chunk()` - Write result chunks with headers
- `write_backtest_results()` - Write combined results with metadata
- `write_trades()` - Write grouped trade files

## Utility Functions and Exceptions

### Custom Exceptions

- **`TimestampParsingError`** - Raised when timestamp parsing fails
- **`DataLoadingError`** - Raised when data loading operations fail
- **`DataSavingError`** - Raised when data saving operations fail
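
One way such exceptions are typically defined and chained is sketched below. The base class (`Exception`), the helper functions, and the message formats are assumptions for illustration; only the exception names come from this document.

```python
class TimestampParsingError(Exception):
    """Raised when timestamp parsing fails."""


class DataLoadingError(Exception):
    """Raised when data loading operations fail."""


def parse_unix_seconds(value: str) -> int:
    # Hypothetical helper: convert a string to integer Unix seconds.
    try:
        return int(value)
    except ValueError as exc:
        raise TimestampParsingError(f"Cannot parse timestamp {value!r}") from exc


def load_timestamps(values):
    # Hypothetical loader: wrap low-level failures in a loading error.
    try:
        return [parse_unix_seconds(v) for v in values]
    except TimestampParsingError as exc:
        raise DataLoadingError(f"Loading failed: {exc}") from exc


parsed = load_timestamps(["1672531200"])
```

Using `raise ... from exc` preserves the original error as `__cause__`, so callers can catch the high-level exception while logs still show the root cause.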

### Utility Functions

- **`_parse_timestamp_column()`** - Parse timestamp columns with format detection
- **`_filter_by_date_range()`** - Filter DataFrames by date range
- **`_normalize_column_names()`** - Convert column names to lowercase

## Architecture Benefits

### Separation of Concerns
- Each class has a single, well-defined responsibility
- Data loading, saving, and result formatting are cleanly separated
- Shared utilities are extracted to prevent code duplication

### Maintainability
- All files are under 250 lines (quality gate)
- All methods are under 50 lines (quality gate)
- Clear interfaces and comprehensive documentation
- Type hints for better IDE support and clarity

### Error Handling
- Custom exceptions for different error types
- Consistent error logging patterns
- Graceful degradation (empty DataFrames on load failure)

### Backward Compatibility
- The `Storage` class maintains the exact same public interface
- All existing code continues to work unchanged
- The component classes are available for advanced usage

## Migration Notes

The refactoring maintains full backward compatibility. Existing code using `Storage` will continue to work unchanged. For new code, consider using the component classes directly for more focused functionality:

```python
# `logger`, `start`, `end`, and `data_dir` are assumed to be defined by the caller.

# Existing pattern (still works)
from cycles.utils.storage import Storage
storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
from cycles.utils.data_loader import DataLoader
loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```