Contributing to Orderflow Backtest System

Development Guidelines

Thank you for your interest in contributing to the Orderflow Backtest System. This document outlines the development process, coding standards, and best practices for maintaining code quality.

Development Environment Setup

Prerequisites

  • Python: 3.12 or higher
  • Package Manager: UV (recommended) or pip
  • Database: SQLite 3.x
  • GUI: Qt5 for visualization (Linux/macOS)

Installation

# Clone the repository
git clone <repository-url>
cd orderflow_backtest

# Install dependencies
uv sync

# Install development dependencies
uv add --dev pytest pytest-cov mypy

# Verify installation
uv run pytest

Development Tools

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=. --cov-report=html

# Run type checking
uv run mypy .

# Run specific test module
uv run pytest tests/test_storage_metrics.py -v

Code Standards

Function and File Size Limits

  • Functions: Maximum 50 lines
  • Files: Maximum 250 lines
  • Classes: Single responsibility, clear purpose
  • Methods: One clear responsibility per method

Naming Conventions

# Good examples
def calculate_order_book_imbalance(snapshot: BookSnapshot) -> float:
def load_metrics_by_timerange(start: int, end: int) -> List[Metric]:
class MetricCalculator:
class SQLiteMetricsRepository:

# Avoid abbreviations except domain terms
# Good: OBI, CVD (standard financial terms)
# Avoid: calc_obi, proc_data, mgr

Type Annotations

# Required for all public interfaces
from typing import Dict, List

def process_trades(trades: List[Trade]) -> Dict[int, float]:
    """Process trades and return volume by timestamp."""
    ...

class Storage:
    def __init__(self, instrument: str) -> None:
        self.instrument = instrument

Documentation Standards

def calculate_metrics(snapshot: BookSnapshot, trades: List[Trade]) -> Metric:
    """
    Calculate OBI and CVD metrics for a snapshot.
    
    Args:
        snapshot: Orderbook state at specific timestamp
        trades: List of trades executed at this timestamp
        
    Returns:
        Metric: Calculated OBI, CVD, and best bid/ask values
        
    Raises:
        ValueError: If snapshot contains invalid data
        
    Example:
        >>> snapshot = BookSnapshot(...)
        >>> trades = [Trade(...), ...]
        >>> metric = calculate_metrics(snapshot, trades)
        >>> print(f"OBI: {metric.obi:.3f}")
        OBI: 0.333
    """

Architecture Principles

Separation of Concerns

  • Storage: Data processing and persistence only
  • Strategy: Trading analysis and signal generation only
  • Visualizer: Chart rendering and display only
  • Main: Application orchestration and flow control
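
The layering above can be sketched with stubs (all class and method names here are hypothetical illustrations, not the project's actual API):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Metric:
    timestamp: int
    obi: float

class Storage:
    """Data processing and persistence only."""
    def load_metrics(self) -> List[Metric]:
        return [Metric(timestamp=1, obi=0.2), Metric(timestamp=2, obi=-0.4)]

class Strategy:
    """Trading analysis and signal generation only."""
    def generate_signals(self, metrics: List[Metric]) -> List[str]:
        return ["BUY" if m.obi > 0 else "SELL" for m in metrics]

class Visualizer:
    """Chart rendering and display only."""
    def render(self, metrics: List[Metric], signals: List[str]) -> str:
        return f"{len(metrics)} metrics, {signals.count('BUY')} buy signals"

def main() -> str:
    """Application orchestration and flow control: wires the layers together."""
    metrics = Storage().load_metrics()
    signals = Strategy().generate_signals(metrics)
    return Visualizer().render(metrics, signals)
```

Each layer only talks to its neighbor through plain data types, so any one of them can be tested in isolation.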

Repository Pattern

# Good: Clean interface
class SQLiteMetricsRepository:
    def load_metrics_by_timerange(self, conn: Connection, start: int, end: int) -> List[Metric]:
        ...  # Implementation details hidden inside the repository

# Avoid: Direct SQL in business logic
def analyze_strategy(db_path: Path):
    # Don't do this
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT * FROM metrics WHERE ...")

Error Handling

# Required pattern
try:
    result = risky_operation()
    return process_result(result)
except SpecificException as e:
    logging.error(f"Operation failed: {e}")
    return default_value
except Exception as e:
    logging.error(f"Unexpected error in operation: {e}")
    raise

Testing Requirements

Test Coverage

  • Unit Tests: All public methods must have unit tests
  • Integration Tests: End-to-end workflow testing required
  • Edge Cases: Handle empty data, boundary conditions, error scenarios
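
Edge-case coverage can look like this sketch of pytest-style test functions (the `calculate_cvd` helper and its sum-of-signed-volumes definition are assumptions for illustration):

```python
from typing import List

def calculate_cvd(signed_volumes: List[float]) -> float:
    """Hypothetical helper: cumulative volume delta as a sum of signed volumes."""
    return sum(signed_volumes)

def test_cvd_empty_data() -> None:
    """Empty input must not raise; it yields a neutral delta."""
    assert calculate_cvd([]) == 0.0

def test_cvd_single_trade_boundary() -> None:
    """Boundary condition: a single trade passes through unchanged."""
    assert calculate_cvd([5.0]) == 5.0

def test_cvd_cancelling_flows() -> None:
    """Opposing flows of equal size cancel to zero."""
    assert calculate_cvd([3.0, -3.0]) == 0.0
```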

Test Structure

def test_feature_description():
    """Test that feature behaves correctly under normal conditions."""
    # Arrange
    test_data = create_test_data()
    
    # Act
    result = function_under_test(test_data)
    
    # Assert
    assert result.expected_property == expected_value
    assert len(result.collection) == expected_count

Test Data Management

# Use temporary files for database tests
import tempfile
from pathlib import Path

def test_database_operation():
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_file:
        db_path = Path(tmp_file.name)

    try:
        # Test implementation
        pass
    finally:
        db_path.unlink(missing_ok=True)

Database Development

Schema Changes

  1. Create Migration: Document schema changes in ADR format
  2. Backward Compatibility: Ensure existing databases continue to work
  3. Auto-Migration: Implement automatic schema updates where possible
  4. Performance: Add appropriate indexes for new queries
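
One way to satisfy points 3 and 4 is to track a schema version in SQLite's `user_version` pragma and apply pending steps on startup. This is a hedged sketch, not the project's actual migration code; the table and index names are made up:

```python
import sqlite3

SCHEMA_VERSION = 2  # assumed current version for this sketch

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any pending migrations; returns the resulting schema version.

    Each step is idempotent, so existing databases at any older version
    continue to work (backward compatibility).
    """
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version < 1:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics ("
            "timestamp INTEGER PRIMARY KEY, obi REAL, cvd REAL)"
        )
    if version < 2:
        # New queries filter on obi, so index it (guideline 4).
        conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_obi ON metrics(obi)")
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.commit()
    return SCHEMA_VERSION
```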

Query Patterns

# Good: Parameterized queries
cursor.execute(
    "SELECT obi, cvd FROM metrics WHERE timestamp >= ? AND timestamp <= ?",
    (start_timestamp, end_timestamp)
)

# Bad: String formatting (security risk)
query = f"SELECT * FROM metrics WHERE timestamp = {timestamp}"

Performance Guidelines

  • Batch Operations: Process in batches of 1000 records
  • Indexes: Add indexes for frequently queried columns
  • Transactions: Use transactions for multi-record operations
  • Connection Management: Caller manages connection lifecycle
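
A minimal sketch combining the batching, transaction, and connection-management guidelines (the `metrics` schema here is assumed; the caller opens and closes the connection):

```python
import sqlite3
from typing import Iterable, List, Tuple

BATCH_SIZE = 1000  # per the batch-operations guideline

def insert_metrics(conn: sqlite3.Connection,
                   rows: Iterable[Tuple[int, float, float]]) -> int:
    """Insert (timestamp, obi, cvd) rows in batches inside one transaction.

    The caller owns the connection lifecycle; this function never closes it.
    """
    inserted = 0
    batch: List[Tuple[int, float, float]] = []
    with conn:  # one transaction for the whole multi-record operation
        for row in rows:
            batch.append(row)
            if len(batch) == BATCH_SIZE:
                conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", batch)
                inserted += len(batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", batch)
            inserted += len(batch)
    return inserted
```

`with conn:` commits on success and rolls back on error, so a failed run never leaves a half-written batch behind.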

Performance Requirements

Memory Management

  • Target: >70% memory reduction vs. full snapshot retention
  • Measurement: Profile memory usage with large datasets
  • Optimization: Stream processing, batch operations, minimal object retention
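
The stream-processing approach can be sketched as a generator that emits one metric per snapshot and retains nothing afterwards; the snapshot field names and the OBI formula shown are assumptions for illustration:

```python
from typing import Dict, Iterable, Iterator, Tuple

def stream_obi(snapshots: Iterable[Dict[str, float]]) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, obi) pairs one at a time.

    No snapshot is kept after its metric is emitted, so memory use stays
    flat regardless of dataset size (vs. full snapshot retention).
    """
    for snap in snapshots:
        bid, ask = snap["bid_qty"], snap["ask_qty"]
        obi = (bid - ask) / (bid + ask)  # assumed order-book imbalance formula
        yield int(snap["timestamp"]), obi
```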

Processing Speed

  • Target: >500 snapshots/second processing rate
  • Measurement: Benchmark with realistic datasets
  • Optimization: Database batching, efficient algorithms, minimal I/O
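
A simple throughput harness for checking the snapshots/second target might look like this sketch (`process` stands in for any per-snapshot callable; real benchmarks should use realistic data and multiple runs):

```python
import time
from typing import Callable, Sequence

def benchmark_throughput(process: Callable[[object], object],
                         snapshots: Sequence[object]) -> float:
    """Return snapshots processed per second for a single run."""
    start = time.perf_counter()
    for snap in snapshots:
        process(snap)
    elapsed = time.perf_counter() - start
    return len(snapshots) / elapsed if elapsed > 0 else float("inf")
```

Compare the returned rate against the 500 snapshots/second target before and after a change.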

Storage Efficiency

  • Target: <25% storage overhead for metrics
  • Measurement: Compare metrics table size to source data
  • Optimization: Efficient data types, minimal redundancy

Submission Process

Before Submitting

  1. Run Tests: Ensure all tests pass

    uv run pytest
    
  2. Check Type Hints: Verify type annotations

    uv run mypy .
    
  3. Test Coverage: Ensure adequate test coverage

    uv run pytest --cov=. --cov-report=term-missing
    
  4. Documentation: Update relevant documentation files

Pull Request Guidelines

  • Description: Clear description of changes and motivation
  • Testing: Include tests for new functionality
  • Documentation: Update docs for API changes
  • Breaking Changes: Document any breaking changes
  • Performance: Include performance impact analysis for significant changes

Code Review Checklist

  • Follows function/file size limits
  • Has comprehensive test coverage
  • Includes proper error handling
  • Uses type annotations consistently
  • Maintains backward compatibility
  • Updates relevant documentation
  • No security vulnerabilities (SQL injection, etc.)
  • Performance impact analyzed

Documentation Maintenance

When to Update Documentation

  • API Changes: Any modification to public interfaces
  • Architecture Changes: New patterns, data structures, or workflows
  • Performance Changes: Significant performance improvements or regressions
  • Feature Additions: New capabilities or metrics

Documentation Types

  • Code Comments: Complex algorithms and business logic
  • Docstrings: All public functions and classes
  • Module Documentation: Purpose and usage examples
  • Architecture Documentation: System design and component relationships

Getting Help

Resources

  • Architecture Overview: docs/architecture.md
  • API Documentation: docs/API.md
  • Module Documentation: docs/modules/
  • Decision Records: docs/decisions/

Communication

  • Issues: Use GitHub issues for bug reports and feature requests
  • Discussions: Use GitHub discussions for questions and design discussions
  • Code Review: Comment on pull requests for specific code feedback

Development Workflow

Feature Development

  1. Create Branch: Feature-specific branch from main
  2. Develop: Follow coding standards and test requirements
  3. Test: Comprehensive testing including edge cases
  4. Document: Update relevant documentation
  5. Review: Submit pull request for code review
  6. Merge: Merge after approval and CI success

Bug Fixes

  1. Reproduce: Create test that reproduces the bug
  2. Fix: Implement minimal fix addressing root cause
  3. Verify: Ensure fix resolves issue without regressions
  4. Test: Add regression test to prevent future occurrences

Performance Improvements

  1. Benchmark: Establish baseline performance metrics
  2. Optimize: Implement performance improvements
  3. Measure: Verify performance gains with benchmarks
  4. Document: Update performance characteristics in docs

Thank you for contributing to the Orderflow Backtest System! Your contributions help make this a better tool for cryptocurrency trading analysis.