# Contributing to Orderflow Backtest System
## Development Guidelines
Thank you for your interest in contributing to the Orderflow Backtest System. This document outlines the development process, coding standards, and best practices for maintaining code quality.
## Development Environment Setup
### Prerequisites
- **Python**: 3.12 or higher
- **Package Manager**: UV (recommended) or pip
- **Database**: SQLite 3.x
- **GUI**: Qt5 for visualization (Linux/macOS)
### Installation
```bash
# Clone the repository
git clone <repository-url>
cd orderflow_backtest
# Install dependencies
uv sync
# Install development dependencies
uv add --dev pytest pytest-cov mypy
# Verify installation
uv run pytest
```
### Development Tools
```bash
# Run tests
uv run pytest
# Run tests with coverage
uv run pytest --cov=. --cov-report=html
# Run type checking
uv run mypy .
# Run specific test module
uv run pytest tests/test_storage_metrics.py -v
```
## Code Standards
### Function and File Size Limits
- **Functions**: Maximum 50 lines
- **Files**: Maximum 250 lines
- **Classes**: Single responsibility, clear purpose
- **Methods**: One clear responsibility per method
### Naming Conventions
```python
# Good examples
def calculate_order_book_imbalance(snapshot: BookSnapshot) -> float: ...
def load_metrics_by_timerange(start: int, end: int) -> List[Metric]: ...

class MetricCalculator: ...
class SQLiteMetricsRepository: ...

# Avoid abbreviations, except standard domain terms:
#   Good:  OBI, CVD (standard financial terms)
#   Avoid: calc_obi, proc_data, mgr
```
### Type Annotations
```python
# Required for all public interfaces
def process_trades(trades: List[Trade]) -> Dict[int, float]:
    """Process trades and return volume by timestamp."""
    ...

class Storage:
    def __init__(self, instrument: str) -> None:
        self.instrument = instrument
```
### Documentation Standards
```python
def calculate_metrics(snapshot: BookSnapshot, trades: List[Trade]) -> Metric:
    """
    Calculate OBI and CVD metrics for a snapshot.

    Args:
        snapshot: Orderbook state at a specific timestamp.
        trades: List of trades executed at this timestamp.

    Returns:
        Metric: Calculated OBI, CVD, and best bid/ask values.

    Raises:
        ValueError: If the snapshot contains invalid data.

    Example:
        >>> snapshot = BookSnapshot(...)
        >>> trades = [Trade(...), ...]
        >>> metric = calculate_metrics(snapshot, trades)
        >>> print(f"OBI: {metric.obi:.3f}")
        OBI: 0.333
    """
```
## Architecture Principles
### Separation of Concerns
- **Storage**: Data processing and persistence only
- **Strategy**: Trading analysis and signal generation only
- **Visualizer**: Chart rendering and display only
- **Main**: Application orchestration and flow control
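The sketch below illustrates this separation with minimal stand-in classes; the real `Storage`, `Strategy`, and `Visualizer` interfaces will differ:
```python
# Illustrative stand-ins only -- not the project's actual interfaces.
from pathlib import Path
from typing import List

class Storage:
    """Persistence layer: loads and saves data, no analysis logic."""
    def load_metrics(self, db_path: Path) -> List[dict]:
        return []  # a real implementation would read from SQLite

class Strategy:
    """Analysis layer: consumes metrics, emits signals, performs no I/O."""
    def analyze(self, metrics: List[dict]) -> List[str]:
        return []

class Visualizer:
    """Display layer: renders charts, performs no analysis."""
    def render(self, metrics: List[dict], signals: List[str]) -> None:
        pass

def main(db_path: Path) -> None:
    """Orchestration only: wire the layers together, no business logic."""
    metrics = Storage().load_metrics(db_path)
    signals = Strategy().analyze(metrics)
    Visualizer().render(metrics, signals)
```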
### Repository Pattern
```python
# Good: a clean repository interface; implementation details stay hidden
class SQLiteMetricsRepository:
    def load_metrics_by_timerange(
        self, conn: Connection, start: int, end: int
    ) -> List[Metric]:
        ...

# Avoid: direct SQL in business logic
def analyze_strategy(db_path: Path):
    # Don't do this
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT * FROM metrics WHERE ...")
```
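Business logic should instead depend on the repository interface. A minimal usage sketch, assuming the `SQLiteMetricsRepository` above and the caller-managed connection lifecycle described under Performance Guidelines:
```python
import sqlite3
from contextlib import closing
from pathlib import Path

def analyze_strategy(db_path: Path, start: int, end: int) -> None:
    # Business logic talks to the repository interface, never raw SQL.
    repo = SQLiteMetricsRepository()
    with closing(sqlite3.connect(db_path)) as conn:  # caller owns the connection
        metrics = repo.load_metrics_by_timerange(conn, start, end)
    # ... run analysis over `metrics` without touching SQL ...
```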
### Error Handling
```python
# Required pattern
try:
    result = risky_operation()
    return process_result(result)
except SpecificException as e:
    logging.error(f"Operation failed: {e}")
    return default_value
except Exception as e:
    logging.error(f"Unexpected error in operation: {e}")
    raise
```
## Testing Requirements
### Test Coverage
- **Unit Tests**: All public methods must have unit tests
- **Integration Tests**: End-to-end workflow testing required
- **Edge Cases**: Handle empty data, boundary conditions, and error scenarios (see the sketch below)
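For instance, a minimal edge-case test over a hypothetical helper (the real project functions and their empty-input behavior may differ):
```python
from typing import List

def cumulative_volume(volumes: List[float]) -> float:
    """Hypothetical helper: total traded volume."""
    return sum(volumes)

def test_cumulative_volume_empty_input():
    """Empty data must yield a well-defined value, not an exception."""
    assert cumulative_volume([]) == 0.0

def test_cumulative_volume_single_trade():
    """Boundary condition: exactly one trade."""
    assert cumulative_volume([1.5]) == 1.5
```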
### Test Structure
```python
def test_feature_description():
    """Test that the feature behaves correctly under normal conditions."""
    # Arrange
    test_data = create_test_data()

    # Act
    result = function_under_test(test_data)

    # Assert
    assert result.expected_property == expected_value
    assert len(result.collection) == expected_count
```
### Test Data Management
```python
import tempfile
from pathlib import Path

# Use temporary files for database tests
def test_database_operation():
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_file:
        db_path = Path(tmp_file.name)
    try:
        # Test implementation using db_path
        pass
    finally:
        db_path.unlink(missing_ok=True)
```
## Database Development
### Schema Changes
1. **Create Migration**: Document schema changes in ADR format
2. **Backward Compatibility**: Ensure existing databases continue to work
3. **Auto-Migration**: Implement automatic schema updates where possible (sketched below)
4. **Performance**: Add appropriate indexes for new queries
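For auto-migration, one common SQLite pattern is to key migrations off `PRAGMA user_version`. The sketch below is illustrative (the column names are assumed), not the project's actual mechanism:
```python
import sqlite3

SCHEMA_VERSION = 2

def migrate(conn: sqlite3.Connection) -> None:
    """Apply any pending schema migrations, oldest first."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version < 1:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics "
            "(timestamp INTEGER NOT NULL, obi REAL, cvd REAL)"
        )
        # Index the column the time-range queries filter on.
        conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_ts ON metrics(timestamp)")
    if version < 2:
        # Additive change: existing databases keep working (backward compatible).
        conn.execute("ALTER TABLE metrics ADD COLUMN best_bid REAL")
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.commit()
```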
### Query Patterns
```python
# Good: parameterized queries
cursor.execute(
    "SELECT obi, cvd FROM metrics WHERE timestamp >= ? AND timestamp <= ?",
    (start_timestamp, end_timestamp),
)

# Bad: string formatting (SQL injection risk)
query = f"SELECT * FROM metrics WHERE timestamp = {timestamp}"
```
### Performance Guidelines
- **Batch Operations**: Process in batches of 1000 records (as in the sketch after this list)
- **Indexes**: Add indexes for frequently queried columns
- **Transactions**: Use transactions for multi-record operations
- **Connection Management**: Caller manages connection lifecycle
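A minimal sketch combining these guidelines (batching, one transaction per batch, caller-owned connection); the table and column names are illustrative:
```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

BATCH_SIZE = 1000  # matches the guideline above

Row = Tuple[int, float, float]  # (timestamp, obi, cvd) -- illustrative schema

def batches(rows: Iterable[Row], size: int = BATCH_SIZE) -> Iterator[List[Row]]:
    """Yield successive fixed-size batches from an iterable of rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def insert_metrics(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    # One transaction per batch: far cheaper than a commit per row.
    for batch in batches(rows):
        with conn:  # connection as context manager == a single transaction
            conn.executemany(
                "INSERT INTO metrics (timestamp, obi, cvd) VALUES (?, ?, ?)",
                batch,
            )
```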
## Performance Requirements
### Memory Management
- **Target**: >70% memory reduction vs. full snapshot retention
- **Measurement**: Profile memory usage with large datasets (see the sketch below)
- **Optimization**: Stream processing, batch operations, minimal object retention
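One way to measure, sketched with the standard library's `tracemalloc` (the helper name is hypothetical):
```python
import tracemalloc
from typing import Callable

def peak_memory_bytes(run: Callable[[], None]) -> int:
    """Return the peak bytes allocated while run() executes."""
    tracemalloc.start()
    try:
        run()
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()
```
Comparing `peak_memory_bytes` for full snapshot retention against the streaming path gives the reduction percentage to check against the >70% target.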
### Processing Speed
- **Target**: >500 snapshots/second processing rate
- **Measurement**: Benchmark with realistic datasets (see the sketch below)
- **Optimization**: Database batching, efficient algorithms, minimal I/O
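A throughput benchmark might look like this (names are illustrative; prefer realistic datasets over synthetic snapshots):
```python
import time
from typing import Callable, Sequence

def snapshots_per_second(process: Callable, snapshots: Sequence) -> float:
    """Measure the processing rate over a dataset of snapshots."""
    start = time.perf_counter()
    for snapshot in snapshots:
        process(snapshot)
    elapsed = time.perf_counter() - start
    return len(snapshots) / elapsed

# Check against the target, e.g.:
# assert snapshots_per_second(calculate, dataset) > 500
```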
### Storage Efficiency
- **Target**: <25% storage overhead for metrics
- **Measurement**: Compare metrics table size to source data
- **Optimization**: Efficient data types, minimal redundancy
## Submission Process
### Before Submitting
1. **Run Tests**: Ensure all tests pass
   ```bash
   uv run pytest
   ```
2. **Check Type Hints**: Verify type annotations
   ```bash
   uv run mypy .
   ```
3. **Test Coverage**: Ensure adequate test coverage
   ```bash
   uv run pytest --cov=. --cov-report=term-missing
   ```
4. **Documentation**: Update relevant documentation files
### Pull Request Guidelines
- **Description**: Clear description of changes and motivation
- **Testing**: Include tests for new functionality
- **Documentation**: Update docs for API changes
- **Breaking Changes**: Document any breaking changes
- **Performance**: Include performance impact analysis for significant changes
### Code Review Checklist
- [ ] Follows function/file size limits
- [ ] Has comprehensive test coverage
- [ ] Includes proper error handling
- [ ] Uses type annotations consistently
- [ ] Maintains backward compatibility
- [ ] Updates relevant documentation
- [ ] No security vulnerabilities (SQL injection, etc.)
- [ ] Performance impact analyzed
## Documentation Maintenance
### When to Update Documentation
- **API Changes**: Any modification to public interfaces
- **Architecture Changes**: New patterns, data structures, or workflows
- **Performance Changes**: Significant performance improvements or regressions
- **Feature Additions**: New capabilities or metrics
### Documentation Types
- **Code Comments**: Complex algorithms and business logic
- **Docstrings**: All public functions and classes
- **Module Documentation**: Purpose and usage examples
- **Architecture Documentation**: System design and component relationships
## Getting Help
### Resources
- **Architecture Overview**: `docs/architecture.md`
- **API Documentation**: `docs/API.md`
- **Module Documentation**: `docs/modules/`
- **Decision Records**: `docs/decisions/`
### Communication
- **Issues**: Use GitHub issues for bug reports and feature requests
- **Discussions**: Use GitHub discussions for questions and design discussions
- **Code Review**: Comment on pull requests for specific code feedback
---
## Development Workflow
### Feature Development
1. **Create Branch**: Feature-specific branch from main
2. **Develop**: Follow coding standards and test requirements
3. **Test**: Comprehensive testing including edge cases
4. **Document**: Update relevant documentation
5. **Review**: Submit pull request for code review
6. **Merge**: Merge after approval and CI success
### Bug Fixes
1. **Reproduce**: Create test that reproduces the bug
2. **Fix**: Implement minimal fix addressing root cause
3. **Verify**: Ensure fix resolves issue without regressions
4. **Test**: Add regression test to prevent future occurrences
### Performance Improvements
1. **Benchmark**: Establish baseline performance metrics
2. **Optimize**: Implement performance improvements
3. **Measure**: Verify performance gains with benchmarks
4. **Document**: Update performance characteristics in docs
Thank you for contributing to the Orderflow Backtest System! Your contributions help make this a better tool for cryptocurrency trading analysis.