# Contributing to Orderflow Backtest System

## Development Guidelines

Thank you for your interest in contributing to the Orderflow Backtest System. This document outlines the development process, coding standards, and best practices for maintaining code quality.

## Development Environment Setup

### Prerequisites

- **Python**: 3.12 or higher
- **Package Manager**: UV (recommended) or pip
- **Database**: SQLite 3.x
- **GUI**: Qt5 for visualization (Linux/macOS)

### Installation

```bash
# Clone the repository
git clone
cd orderflow_backtest

# Install dependencies
uv sync

# Install development dependencies
uv add --dev pytest coverage mypy

# Verify installation
uv run pytest
```

### Development Tools

```bash
# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=. --cov-report=html

# Run type checking
uv run mypy .

# Run specific test module
uv run pytest tests/test_storage_metrics.py -v
```

## Code Standards

### Function and File Size Limits

- **Functions**: Maximum 50 lines
- **Files**: Maximum 250 lines
- **Classes**: Single responsibility, clear purpose
- **Methods**: One primary task per method

### Naming Conventions

```python
# Good examples
def calculate_order_book_imbalance(snapshot: BookSnapshot) -> float: ...
def load_metrics_by_timerange(start: int, end: int) -> List[Metric]: ...

class MetricCalculator: ...
class SQLiteMetricsRepository: ...

# Avoid abbreviations except domain terms
# Good: OBI, CVD (standard financial terms)
# Avoid: calc_obi, proc_data, mgr
```

### Type Annotations

```python
# Required for all public interfaces
def process_trades(trades: List[Trade]) -> Dict[int, float]:
    """Process trades and return volume by timestamp."""

class Storage:
    def __init__(self, instrument: str) -> None:
        self.instrument = instrument
```

### Documentation Standards

```python
def calculate_metrics(snapshot: BookSnapshot, trades: List[Trade]) -> Metric:
    """
    Calculate OBI and CVD metrics for a snapshot.
    Args:
        snapshot: Orderbook state at a specific timestamp
        trades: List of trades executed at this timestamp

    Returns:
        Metric: Calculated OBI, CVD, and best bid/ask values

    Raises:
        ValueError: If snapshot contains invalid data

    Example:
        >>> snapshot = BookSnapshot(...)
        >>> trades = [Trade(...), ...]
        >>> metric = calculate_metrics(snapshot, trades)
        >>> print(f"OBI: {metric.obi:.3f}")
        OBI: 0.333
    """
```

## Architecture Principles

### Separation of Concerns

- **Storage**: Data processing and persistence only
- **Strategy**: Trading analysis and signal generation only
- **Visualizer**: Chart rendering and display only
- **Main**: Application orchestration and flow control

### Repository Pattern

```python
# Good: Clean interface
class SQLiteMetricsRepository:
    def load_metrics_by_timerange(self, conn: Connection, start: int, end: int) -> List[Metric]:
        ...  # Implementation details hidden

# Avoid: Direct SQL in business logic
def analyze_strategy(db_path: Path):
    # Don't do this
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT * FROM metrics WHERE ...")
```

### Error Handling

```python
# Required pattern
try:
    result = risky_operation()
    return process_result(result)
except SpecificException as e:
    logging.error(f"Operation failed: {e}")
    return default_value
except Exception as e:
    logging.error(f"Unexpected error in operation: {e}")
    raise
```

## Testing Requirements

### Test Coverage

- **Unit Tests**: All public methods must have unit tests
- **Integration Tests**: End-to-end workflow testing required
- **Edge Cases**: Handle empty data, boundary conditions, and error scenarios

### Test Structure

```python
def test_feature_description():
    """Test that feature behaves correctly under normal conditions."""
    # Arrange
    test_data = create_test_data()

    # Act
    result = function_under_test(test_data)

    # Assert
    assert result.expected_property == expected_value
    assert len(result.collection) == expected_count
```

### Test Data Management

```python
# Use temporary files for database tests
def test_database_operation():
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_file:
        db_path = Path(tmp_file.name)
    try:
        # Test implementation
        pass
    finally:
        db_path.unlink(missing_ok=True)
```

## Database Development

### Schema Changes

1. **Create Migration**: Document schema changes in ADR format
2. **Backward Compatibility**: Ensure existing databases continue to work
3. **Auto-Migration**: Implement automatic schema updates where possible
4. **Performance**: Add appropriate indexes for new queries

### Query Patterns

```python
# Good: Parameterized queries
cursor.execute(
    "SELECT obi, cvd FROM metrics WHERE timestamp >= ? AND timestamp <= ?",
    (start_timestamp, end_timestamp),
)

# Bad: String formatting (security risk)
query = f"SELECT * FROM metrics WHERE timestamp = {timestamp}"
```

### Performance Guidelines

- **Batch Operations**: Process in batches of 1000 records
- **Indexes**: Add indexes for frequently queried columns
- **Transactions**: Use transactions for multi-record operations
- **Connection Management**: Caller manages connection lifecycle

## Performance Requirements

### Memory Management

- **Target**: >70% memory reduction vs. full snapshot retention
- **Measurement**: Profile memory usage with large datasets
- **Optimization**: Stream processing, batch operations, minimal object retention

### Processing Speed

- **Target**: >500 snapshots/second processing rate
- **Measurement**: Benchmark with realistic datasets
- **Optimization**: Database batching, efficient algorithms, minimal I/O

### Storage Efficiency

- **Target**: <25% storage overhead for metrics
- **Measurement**: Compare metrics table size to source data
- **Optimization**: Efficient data types, minimal redundancy

## Submission Process

### Before Submitting

1. **Run Tests**: Ensure all tests pass
   ```bash
   uv run pytest
   ```
2. **Check Type Hints**: Verify type annotations
   ```bash
   uv run mypy .
   ```
3. **Test Coverage**: Ensure adequate test coverage
   ```bash
   uv run pytest --cov=. \
     --cov-report=term-missing
   ```
4. **Documentation**: Update relevant documentation files

### Pull Request Guidelines

- **Description**: Clear description of the changes and their motivation
- **Testing**: Include tests for new functionality
- **Documentation**: Update docs for API changes
- **Breaking Changes**: Document any breaking changes
- **Performance**: Include performance impact analysis for significant changes

### Code Review Checklist

- [ ] Follows function/file size limits
- [ ] Has comprehensive test coverage
- [ ] Includes proper error handling
- [ ] Uses type annotations consistently
- [ ] Maintains backward compatibility
- [ ] Updates relevant documentation
- [ ] No security vulnerabilities (SQL injection, etc.)
- [ ] Performance impact analyzed

## Documentation Maintenance

### When to Update Documentation

- **API Changes**: Any modification to public interfaces
- **Architecture Changes**: New patterns, data structures, or workflows
- **Performance Changes**: Significant performance improvements or regressions
- **Feature Additions**: New capabilities or metrics

### Documentation Types

- **Code Comments**: Complex algorithms and business logic
- **Docstrings**: All public functions and classes
- **Module Documentation**: Purpose and usage examples
- **Architecture Documentation**: System design and component relationships

## Getting Help

### Resources

- **Architecture Overview**: `docs/architecture.md`
- **API Documentation**: `docs/API.md`
- **Module Documentation**: `docs/modules/`
- **Decision Records**: `docs/decisions/`

### Communication

- **Issues**: Use GitHub issues for bug reports and feature requests
- **Discussions**: Use GitHub discussions for questions and design discussions
- **Code Review**: Comment on pull requests for specific code feedback

---

## Development Workflow

### Feature Development

1. **Create Branch**: Feature-specific branch from main
2. **Develop**: Follow coding standards and test requirements
3. **Test**: Comprehensive testing including edge cases
4. **Document**: Update relevant documentation
5. **Review**: Submit pull request for code review
6. **Merge**: Merge after approval and CI success

### Bug Fixes

1. **Reproduce**: Create a test that reproduces the bug
2. **Fix**: Implement a minimal fix addressing the root cause
3. **Verify**: Ensure the fix resolves the issue without regressions
4. **Test**: Add a regression test to prevent future occurrences

### Performance Improvements

1. **Benchmark**: Establish baseline performance metrics
2. **Optimize**: Implement performance improvements
3. **Measure**: Verify performance gains with benchmarks
4. **Document**: Update performance characteristics in docs

Thank you for contributing to the Orderflow Backtest System! Your contributions help make this a better tool for cryptocurrency trading analysis.
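As a closing illustration of the benchmark-first workflow above, here is a minimal sketch of step 1 (establishing a baseline against the >500 snapshots/second target). The `order_book_imbalance` function and the OBI formula `(bid - ask) / (bid + ask)` are illustrative assumptions standing in for the real per-snapshot work, chosen only because they reproduce the `OBI: 0.333` docstring example:

```python
import time


def order_book_imbalance(bid_volume: float, ask_volume: float) -> float:
    # Hypothetical stand-in for the real metric calculation; this simple
    # ratio is an assumption, not the project's actual implementation.
    return (bid_volume - ask_volume) / (bid_volume + ask_volume)


def benchmark(snapshots: list[tuple[float, float]]) -> float:
    """Return the observed processing rate in snapshots per second."""
    start = time.perf_counter()
    for bid, ask in snapshots:
        order_book_imbalance(bid, ask)
    elapsed = time.perf_counter() - start
    return len(snapshots) / elapsed


# Synthetic data: 10,000 (bid_volume, ask_volume) pairs.
snapshots = [(10.0 + i % 7, 5.0) for i in range(10_000)]
rate = benchmark(snapshots)
print(f"{rate:.0f} snapshots/sec")  # compare against the >500/sec target
```

Running the same harness before and after an optimization (steps 1 and 3) gives the comparable numbers that step 4 asks you to record in the docs.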