Storage Utilities

This document describes the refactored storage utilities in cycles/utils/, which provide modular, maintainable data and results management.

Overview

The storage utilities have been refactored into a modular architecture with clear separation of concerns:

  • Storage - Main coordinator class providing unified interface (backward compatible)
  • DataLoader - Handles loading data from various file formats
  • DataSaver - Manages saving data with proper format handling
  • ResultFormatter - Formats and writes backtest results to CSV files
  • storage_utils - Shared utilities and custom exceptions

This design improves maintainability, testability, and follows the single responsibility principle.

Constants

  • RESULTS_DIR: Default directory for storing results (default: "../results")
  • DATA_DIR: Default directory for storing input data (default: "../data")

Main Classes

Storage (Coordinator Class)

The main interface that coordinates all storage operations while maintaining backward compatibility.

__init__(self, logging=None, results_dir=RESULTS_DIR, data_dir=DATA_DIR)

Description: Initializes the Storage coordinator with component instances.

Parameters:

  • logging (optional): A logging instance for outputting information
  • results_dir (str, optional): Path to the directory for storing results
  • data_dir (str, optional): Path to the directory for storing data

Creates: Component instances for DataLoader, DataSaver, and ResultFormatter

load_data(self, file_path: str, start_date: Union[str, pd.Timestamp], stop_date: Union[str, pd.Timestamp]) -> pd.DataFrame

Description: Loads data with optimized dtypes and filtering, supporting CSV and JSON input.

Parameters:

  • file_path (str): Path to the data file (relative to data_dir)
  • start_date (datetime-like): The start date for filtering data
  • stop_date (datetime-like): The end date for filtering data

Returns: pandas.DataFrame with timestamp index

Raises: DataLoadingError if loading fails
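The loading behavior described above can be sketched in standalone pandas. The dtypes, column names, and sample data below are illustrative assumptions; the real DataLoader applies its own schema:

```python
import io
import pandas as pd

# Hypothetical OHLCV sample; real data files follow their own schema.
csv_text = """timestamp,Open,High,Low,Close,Volume
1704067200,42000.0,42500.0,41800.0,42300.0,12.5
1704070800,42300.0,42400.0,42100.0,42200.0,8.1
1704074400,42200.0,42900.0,42150.0,42800.0,15.3
"""

def load_data(buffer, start_date, stop_date):
    # Optimized dtypes: float32 is usually sufficient for prices/volumes.
    df = pd.read_csv(buffer, dtype={"Open": "float32", "High": "float32",
                                    "Low": "float32", "Close": "float32",
                                    "Volume": "float32"})
    # Normalize column names to lowercase.
    df.columns = [c.lower() for c in df.columns]
    # Unix-timestamp column -> DatetimeIndex.
    df.index = pd.to_datetime(df["timestamp"], unit="s")
    df = df.drop(columns=["timestamp"])
    # Date-range filtering (label slicing is inclusive at both ends).
    return df.loc[pd.Timestamp(start_date):pd.Timestamp(stop_date)]

df = load_data(io.StringIO(csv_text), "2024-01-01 00:00", "2024-01-01 01:30")
```

Here `df` keeps only the two rows inside the requested window, with a timestamp index and lowercase columns.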

save_data(self, data: pd.DataFrame, file_path: str) -> None

Description: Saves processed data to a CSV file with proper timestamp handling.

Parameters:

  • data (pd.DataFrame): The DataFrame to save
  • file_path (str): Path to the data file (relative to data_dir)

Raises: DataSavingError if saving fails

format_row(self, row: Dict[str, Any]) -> Dict[str, str]

Description: Formats a dictionary row for output to results CSV files.

Parameters:

  • row (dict): The row of data to format

Returns: dict with formatted values (percentages, currency, etc.)
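The exact formatting rules live in ResultFormatter; a minimal sketch under assumed conventions (the `_pct`/`_usd` key suffixes and format strings are hypothetical):

```python
def format_row(row):
    """Format a result row for CSV output (hypothetical conventions)."""
    formatted = {}
    for key, value in row.items():
        if not isinstance(value, (int, float)):
            formatted[key] = str(value)
        elif key.endswith("_pct"):          # percentage -> "12.35%"
            formatted[key] = f"{value:.2f}%"
        elif key.endswith("_usd"):          # currency -> "$10,432.50"
            formatted[key] = f"${value:,.2f}"
        else:
            formatted[key] = str(value)
    return formatted

formatted = format_row({"profit_pct": 12.3456, "final_usd": 10432.5, "trades": 42})
```

Every value comes back as a string, so the result can be passed straight to a csv.DictWriter.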

write_results_chunk(self, filename: str, fieldnames: List[str], rows: List[Dict], write_header: bool = False, initial_usd: Optional[float] = None) -> None

Description: Writes a chunk of results to a CSV file with optional header.

Parameters:

  • filename (str): The name of the file to write to
  • fieldnames (list): CSV header/column names
  • rows (list): List of dictionaries representing rows
  • write_header (bool, optional): Whether to write the header
  • initial_usd (float, optional): Initial USD value for header comment

write_backtest_results(self, filename: str, fieldnames: List[str], rows: List[Dict], metadata_lines: Optional[List[str]] = None) -> str

Description: Writes combined backtest results to a CSV file with metadata.

Parameters:

  • filename (str): Name of the file to write to (relative to results_dir)
  • fieldnames (list): CSV header/column names
  • rows (list): List of result dictionaries
  • metadata_lines (list, optional): Header comment lines

Returns: Full path to the written file
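A standalone sketch of this interface, combining the metadata header with the tab-delimited output mentioned under ResultFormatter (the "# " comment prefix is an assumption):

```python
import csv
import os
import tempfile

def write_backtest_results(results_dir, filename, fieldnames, rows,
                           metadata_lines=None):
    """Write rows as tab-delimited CSV, preceded by '#' metadata comments."""
    path = os.path.join(results_dir, filename)
    with open(path, "w", newline="") as f:
        for line in metadata_lines or []:
            f.write(f"# {line}\n")
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return path  # full path to the written file

results_dir = tempfile.mkdtemp()
path = write_backtest_results(
    results_dir, "backtest.csv", ["timeframe", "profit"],
    [{"timeframe": "1h", "profit": "12.35%"}],
    metadata_lines=["initial_usd: 10000"],
)
```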

write_trades(self, all_trade_rows: List[Dict], trades_fieldnames: List[str]) -> None

Description: Writes trade data to separate CSV files grouped by timeframe and stop-loss.

Parameters:

  • all_trade_rows (list): List of trade dictionaries
  • trades_fieldnames (list): CSV header for trade files

Files Created: trades_{timeframe}_ST{sl_percent}pct.csv in results_dir
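The grouping and file-naming scheme can be sketched as follows; the grouping keys `timeframe` and `sl_percent` are assumed field names in the trade dictionaries:

```python
import csv
import os
import tempfile
from collections import defaultdict

def write_trades(results_dir, all_trade_rows, trades_fieldnames):
    """Group trades by (timeframe, stop-loss) and write one CSV per group."""
    groups = defaultdict(list)
    for row in all_trade_rows:
        groups[(row["timeframe"], row["sl_percent"])].append(row)
    written = []
    for (timeframe, sl_percent), rows in groups.items():
        # Matches the documented pattern: trades_{timeframe}_ST{sl_percent}pct.csv
        filename = f"trades_{timeframe}_ST{sl_percent}pct.csv"
        with open(os.path.join(results_dir, filename), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=trades_fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(filename)
    return written

results_dir = tempfile.mkdtemp()
files = write_trades(results_dir,
                     [{"timeframe": "1h", "sl_percent": 2, "pnl": 1.5},
                      {"timeframe": "4h", "sl_percent": 2, "pnl": -0.4}],
                     ["timeframe", "sl_percent", "pnl"])
```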

DataLoader

Handles loading and preprocessing of data from various file formats.

Key Features:

  • Supports CSV and JSON formats
  • Optimized pandas dtypes for financial data
  • Intelligent timestamp parsing (Unix timestamps and datetime strings)
  • Date range filtering
  • Column name normalization (lowercase)
  • Comprehensive error handling

Methods:

  • load_data() - Main loading interface
  • _load_json_data() - JSON-specific loading logic
  • _load_csv_data() - CSV-specific loading logic
  • _process_csv_timestamps() - Timestamp parsing for CSV data
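The "intelligent timestamp parsing" can be illustrated with a simple heuristic: if the whole column is numeric, treat it as Unix epochs; otherwise parse datetime strings. This is a sketch, not the actual _process_csv_timestamps() logic:

```python
import pandas as pd

def parse_timestamp_column(series):
    """Detect Unix epochs vs. datetime strings (heuristic sketch)."""
    numeric = pd.to_numeric(series, errors="coerce")
    if numeric.notna().all():
        # All-numeric column: treat as Unix timestamps in seconds.
        return pd.to_datetime(numeric, unit="s")
    # Otherwise fall back to string parsing.
    return pd.to_datetime(series)

unix_col = pd.Series([1704067200, 1704070800])
str_col = pd.Series(["2024-01-01 00:00:00", "2024-01-01 01:00:00"])
a = parse_timestamp_column(unix_col)
b = parse_timestamp_column(str_col)
```

Both inputs resolve to the same datetimes even though one is epoch seconds and the other is strings.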

DataSaver

Manages saving data with proper format handling and index conversion.

Key Features:

  • Converts DatetimeIndex to Unix timestamps for CSV compatibility
  • Handles numeric indexes appropriately
  • Ensures 'timestamp' column is first in output
  • Comprehensive error handling and logging

Methods:

  • save_data() - Main saving interface
  • _prepare_data_for_saving() - Data preparation logic
  • _convert_datetime_index_to_timestamp() - DatetimeIndex conversion
  • _convert_numeric_index_to_timestamp() - Numeric index conversion
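The index-to-timestamp conversion described above can be sketched like this (the helper name and exact mechanics are illustrative):

```python
import pandas as pd

def prepare_data_for_saving(df):
    """Emit a leading 'timestamp' column: Unix seconds for a DatetimeIndex,
    the raw index values otherwise (sketch of the documented behavior)."""
    out = df.copy()
    if isinstance(out.index, pd.DatetimeIndex):
        # Nanoseconds since epoch -> whole seconds.
        out.insert(0, "timestamp", out.index.astype("int64") // 10**9)
    else:
        out.insert(0, "timestamp", out.index)
    return out.reset_index(drop=True)

df = pd.DataFrame({"close": [42300.0, 42200.0]},
                  index=pd.to_datetime([1704067200, 1704070800], unit="s"))
prepared = prepare_data_for_saving(df)
```

The round trip is lossless: saving and reloading recovers the same DatetimeIndex from the timestamp column.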

ResultFormatter

Handles formatting and writing of backtest results to CSV files.

Key Features:

  • Consistent formatting for percentages and currency
  • Grouped trade file writing by timeframe/stop-loss
  • Metadata header support
  • Tab-delimited output for results
  • Error handling for all write operations

Methods:

  • format_row() - Format individual result rows
  • write_results_chunk() - Write result chunks with headers
  • write_backtest_results() - Write combined results with metadata
  • write_trades() - Write grouped trade files

Utility Functions and Exceptions

Custom Exceptions

  • TimestampParsingError - Raised when timestamp parsing fails
  • DataLoadingError - Raised when data loading operations fail
  • DataSavingError - Raised when data saving operations fail
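These exceptions might be defined along the following lines (docstrings are paraphrased from the descriptions above):

```python
class TimestampParsingError(Exception):
    """Raised when timestamp parsing fails."""

class DataLoadingError(Exception):
    """Raised when data loading operations fail."""

class DataSavingError(Exception):
    """Raised when data saving operations fail."""

try:
    raise DataLoadingError("missing file: data.csv")
except DataLoadingError as exc:
    message = str(exc)
```

Distinct exception types let callers catch loading and saving failures separately.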

Utility Functions

  • _parse_timestamp_column() - Parse timestamp columns with format detection
  • _filter_by_date_range() - Filter DataFrames by date range
  • _normalize_column_names() - Convert column names to lowercase

Architecture Benefits

Separation of Concerns

  • Each class has a single, well-defined responsibility
  • Data loading, saving, and result formatting are cleanly separated
  • Shared utilities are extracted to prevent code duplication

Maintainability

  • All files are under 250 lines (quality gate)
  • All methods are under 50 lines (quality gate)
  • Clear interfaces and comprehensive documentation
  • Type hints for better IDE support and clarity

Error Handling

  • Custom exceptions for different error types
  • Consistent error logging patterns
  • Graceful degradation (empty DataFrames on load failure)

Backward Compatibility

  • Storage class maintains exact same public interface
  • All existing code continues to work unchanged
  • Component classes are available for advanced usage

Migration Notes

The refactoring maintains full backward compatibility. Existing code using Storage will continue to work unchanged. For new code, consider using the component classes directly for more focused functionality:

```python
# Existing pattern (still works)
from cycles.utils.storage import Storage

storage = Storage(logging=logger)
data = storage.load_data('file.csv', start, end)

# New pattern for focused usage
from cycles.utils.data_loader import DataLoader

loader = DataLoader(data_dir, logger)
data = loader.load_data('file.csv', start, end)
```