# Module: main
## Purpose
The `main` module provides the command-line interface (CLI) orchestration for the orderflow backtest system. It discovers database files, manages the visualization process, and coordinates the streaming pipeline with the visualization frontend. Arguments are parsed with Typer.
## Public Interface
### Functions
- `main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None`: Primary CLI entrypoint
- `discover_databases(instrument: str, start_date: str, end_date: str) -> list[Path]`: Find matching database files
- `launch_visualizer() -> subprocess.Popen | None`: Start Dash application in separate process
### CLI Arguments
- `instrument`: Trading pair identifier (e.g., "BTC-USDT")
- `start_date`: Start date in YYYY-MM-DD format (UTC)
- `end_date`: End date in YYYY-MM-DD format (UTC)
- `--window-seconds`: OHLC aggregation window size (default: 60)
## Usage Examples
### Command Line Usage
```bash
# Basic usage with default 60-second windows
uv run python main.py BTC-USDT 2025-01-01 2025-01-31
# Custom window size
uv run python main.py ETH-USDT 2025-02-01 2025-02-28 --window-seconds 30
# Single day processing
uv run python main.py SOL-USDT 2025-03-15 2025-03-15
```
### Programmatic Usage
```python
from main import main, discover_databases
# Run processing pipeline
main("BTC-USDT", "2025-01-01", "2025-01-31", window_seconds=120)
# Discover available databases
db_files = discover_databases("ETH-USDT", "2025-02-01", "2025-02-28")
print(f"Found {len(db_files)} database files")
```
## Dependencies
### Internal
- `db_interpreter.DBInterpreter`: Database streaming
- `ohlc_processor.OHLCProcessor`: Trade aggregation and orderbook processing
- `viz_io`: Data clearing functions
### External
- `typer`: CLI framework and argument parsing
- `subprocess`: Process management for visualization
- `pathlib`: File and directory operations
- `datetime`: Date parsing and validation
- `logging`: Operational logging
- `sys`: Exit code management
## Database Discovery Logic
### File Pattern Matching
```text
# Expected directory structure
../data/OKX/{instrument}/{date}/
# Example paths
../data/OKX/BTC-USDT/2025-01-01/trades.db
../data/OKX/ETH-USDT/2025-02-15/trades.db
```
### Discovery Algorithm
1. Parse start and end dates to datetime objects
2. Iterate through date range (inclusive)
3. Construct expected path for each date
4. Verify file existence and readability
5. Return sorted list of valid database paths
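The steps above can be sketched roughly as follows. This is a minimal illustration rather than the module's actual code: the `data_root` and `db_name` parameters are hypothetical additions for the sketch, and the `trades.db` file name is assumed from the example paths.

```python
import os
from datetime import date, timedelta
from pathlib import Path


def discover_databases(
    instrument: str,
    start_date: str,
    end_date: str,
    data_root: Path = Path("../data/OKX"),  # hypothetical parameter for this sketch
    db_name: str = "trades.db",  # assumed from the example paths
) -> list[Path]:
    """Return existing, readable database files for each day in the range."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)

    found: list[Path] = []
    current = start
    while current <= end:  # the end date is included
        candidate = data_root / instrument / current.isoformat() / db_name
        # Keep only regular files that the current user can read.
        if candidate.is_file() and os.access(candidate, os.R_OK):
            found.append(candidate)
        current += timedelta(days=1)
    return sorted(found)
```

Because the date directories use ISO format, sorting the paths lexicographically also sorts them chronologically.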
## Process Orchestration
### Visualization Process Management
```python
import subprocess

# Launch the Dash app in a separate process.
# `project_root`, `db_files`, and `process_databases` are assumed to be
# defined elsewhere in the module.
viz_process = subprocess.Popen(
    ["uv", "run", "python", "app.py"],
    cwd=project_root,
)

try:
    # Main processing loop
    process_databases(db_files)
finally:
    # Clean up the visualization process
    if viz_process:
        viz_process.terminate()
        viz_process.wait(timeout=5)
```
### Data Processing Pipeline
1. **Initialize**: Clear existing data files
2. **Launch**: Start visualization process
3. **Stream**: Process each database sequentially
4. **Aggregate**: Generate OHLC bars and depth snapshots
5. **Cleanup**: Terminate visualization and finalize
## Error Handling
### Database Access Errors
- **File not found**: Log warning and skip missing databases
- **Permission denied**: Log error and exit with status code 1
- **Corruption**: Log error for specific database and continue with next
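One way to express this policy is a loop that maps each failure type to its action. The sketch below is illustrative only: `process_all` and the `process_one` callback are hypothetical names, and catching `sqlite3.DatabaseError` assumes the `.db` files are SQLite databases, which this document does not state.

```python
import logging
import sqlite3
from pathlib import Path
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def process_all(
    db_files: Iterable[Path],
    process_one: Callable[[Path], None],  # hypothetical per-database worker
) -> int:
    """Apply process_one to each database and return how many succeeded."""
    succeeded = 0
    for db_path in db_files:
        try:
            process_one(db_path)
        except FileNotFoundError:
            # Missing file: warn and move on to the next database.
            logger.warning("Skipping missing database: %s", db_path)
        except PermissionError:
            # Permission problem: stop the whole run with exit status 1.
            logger.error("Permission denied: %s", db_path)
            raise SystemExit(1)
        except sqlite3.DatabaseError as exc:  # assumes SQLite-backed files
            # Unreadable or corrupt database: log it and continue.
            logger.error("Could not read %s: %s", db_path, exc)
        else:
            succeeded += 1
    return succeeded
```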
### Process Management Errors
- **Visualization startup failure**: Log error but continue processing
- **Process termination**: Graceful shutdown with timeout
- **Resource cleanup**: Ensure child processes are terminated
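A minimal sketch of these three behaviors, using only the standard library. The helper names `launch_process` and `stop_process` are hypothetical and not part of the module's interface; the 5-second default mirrors the termination timeout listed under Configuration.

```python
import logging
import subprocess
from typing import Optional, Sequence

logger = logging.getLogger(__name__)


def launch_process(command: Sequence[str]) -> Optional[subprocess.Popen]:
    """Start a child process; return None instead of raising if startup fails."""
    try:
        return subprocess.Popen(list(command))
    except OSError as exc:  # for example, the executable was not found
        logger.error("Could not start %s: %s", command[0], exc)
        return None


def stop_process(proc: Optional[subprocess.Popen], timeout: float = 5.0) -> None:
    """Terminate a child process, escalating to kill() if it does not exit in time."""
    if proc is None or proc.poll() is not None:
        return  # never started, or already exited
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the process so it does not linger
```

Calling `stop_process` from a `finally` block keeps the cleanup path the same whether processing succeeds or raises.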
### Date Validation
- **Invalid format**: Clear error message with expected format
- **Invalid range**: End date must be >= start date
- **Future dates**: Warning for dates beyond data availability
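These checks could look something like the sketch below. The function name `parse_date_range` and its `today` parameter (included so the future-date check can be tested) are assumptions for illustration, not the module's actual API.

```python
import logging
from datetime import date, datetime, timezone
from typing import Optional

logger = logging.getLogger(__name__)


def parse_date_range(
    start_date: str, end_date: str, today: Optional[date] = None
) -> tuple[date, date]:
    """Parse two YYYY-MM-DD strings and check that they form a valid range."""
    try:
        start = datetime.strptime(start_date, "%Y-%m-%d").date()
        end = datetime.strptime(end_date, "%Y-%m-%d").date()
    except ValueError as exc:
        raise ValueError(f"Dates must use YYYY-MM-DD format: {exc}") from exc

    if end < start:
        raise ValueError(f"End date {end} is before start date {start}")

    # Compare against the current UTC date unless the caller supplies one.
    today = today or datetime.now(timezone.utc).date()
    if end > today:
        logger.warning("End date %s is in the future; data may not exist yet", end)
    return start, end
```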
## Performance Characteristics
- **Sequential processing**: Databases processed one at a time
- **Memory efficient**: Streaming approach prevents loading entire datasets
- **Process isolation**: Visualization runs independently
- **Resource cleanup**: Automatic process termination on exit
## Testing
Run module tests:
```bash
uv run pytest test_main.py -v
```
Test coverage includes:
- Database discovery logic
- Date parsing and validation
- Process management
- Error handling scenarios
- CLI argument validation
## Configuration
### Default Settings
- **Data directory**: `../data/OKX` (relative to project root)
- **Visualization command**: `uv run python app.py`
- **Window size**: 60 seconds
- **Process timeout**: 5 seconds for termination
### Environment Variables
- **DATA_PATH**: Override default data directory
- **VISUALIZATION_PORT**: Override Dash port (requires app.py modification)
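As an illustration of the `DATA_PATH` override, the lookup might be resolved as follows. The helper `resolve_data_dir` is a hypothetical name, and treating an empty value as unset is an assumption of this sketch.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DATA_DIR = Path("../data/OKX")


def resolve_data_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return DATA_PATH if it is set and non-empty, else the default directory."""
    env = os.environ if env is None else env
    override = env.get("DATA_PATH", "").strip()
    return Path(override) if override else DEFAULT_DATA_DIR
```

Accepting the environment as an argument keeps the function easy to test without modifying `os.environ`.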
## Known Issues
- Assumes specific directory structure under `../data/OKX`
- No validation of database schema compatibility
- Limited error recovery for process management
- No progress indication for large datasets
## Development Notes
- Uses Typer for the CLI
- Subprocess management compatible with Unix/Windows
- Logging configured for both development and production use
- Exit codes follow Unix conventions (0=success, 1=error)