Contributing to Orderflow Backtest System

Development Guidelines

Thank you for your interest in contributing to the Orderflow Backtest System. This document outlines the development process, coding standards, and best practices for maintaining code quality.

Development Environment Setup

Prerequisites

  • Python: 3.12 or higher
  • Package Manager: UV (recommended) or pip
  • Database: SQLite 3.x
  • GUI: Qt5 for visualization (Linux/macOS)

Installation

# Clone the repository
git clone <repository-url>
cd orderflow_backtest

# Install dependencies
uv sync

# Install development dependencies
uv add --dev pytest pytest-cov mypy

# Verify installation
uv run pytest

Development Tools

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=. --cov-report=html

# Run type checking
uv run mypy .

# Run specific test module
uv run pytest tests/test_storage_metrics.py -v

Code Standards

Function and File Size Limits

  • Functions: Maximum 50 lines
  • Files: Maximum 250 lines
  • Classes: Single responsibility, clear purpose
  • Methods: One clear responsibility per method

Naming Conventions

# Good examples
def calculate_order_book_imbalance(snapshot: BookSnapshot) -> float:
def load_metrics_by_timerange(start: int, end: int) -> List[Metric]:
class MetricCalculator:
class SQLiteMetricsRepository:

# Avoid abbreviations except domain terms
# Good: OBI, CVD (standard financial terms)
# Avoid: calc_obi, proc_data, mgr

Type Annotations

# Required for all public interfaces
from typing import Dict, List

def process_trades(trades: List[Trade]) -> Dict[int, float]:
    """Process trades and return volume by timestamp."""
    ...

class Storage:
    def __init__(self, instrument: str) -> None:
        self.instrument = instrument

Documentation Standards

def calculate_metrics(snapshot: BookSnapshot, trades: List[Trade]) -> Metric:
    """
    Calculate OBI and CVD metrics for a snapshot.
    
    Args:
        snapshot: Orderbook state at specific timestamp
        trades: List of trades executed at this timestamp
        
    Returns:
        Metric: Calculated OBI, CVD, and best bid/ask values
        
    Raises:
        ValueError: If snapshot contains invalid data
        
    Example:
        >>> snapshot = BookSnapshot(...)
        >>> trades = [Trade(...), ...]
        >>> metric = calculate_metrics(snapshot, trades)
        >>> print(f"OBI: {metric.obi:.3f}")
        OBI: 0.333
    """

Architecture Principles

Separation of Concerns

  • Storage: Data processing and persistence only
  • Strategy: Trading analysis and signal generation only
  • Visualizer: Chart rendering and display only
  • Main: Application orchestration and flow control
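
The layering above can be sketched with stubs (all class and method names here are hypothetical illustrations, not the project's actual API):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Metric:
    timestamp: int
    obi: float

class Storage:
    """Data processing and persistence only."""
    def load_metrics(self) -> List[Metric]:
        return [Metric(timestamp=1, obi=0.2), Metric(timestamp=2, obi=-0.4)]

class Strategy:
    """Trading analysis and signal generation only."""
    def generate_signals(self, metrics: List[Metric]) -> List[str]:
        return ["BUY" if m.obi > 0 else "SELL" for m in metrics]

class Visualizer:
    """Chart rendering and display only."""
    def render(self, metrics: List[Metric], signals: List[str]) -> str:
        return f"{len(metrics)} metrics, {signals.count('BUY')} buy signals"

def main() -> str:
    """Application orchestration and flow control: wires the layers together."""
    metrics = Storage().load_metrics()
    signals = Strategy().generate_signals(metrics)
    return Visualizer().render(metrics, signals)
```

Each layer only talks to its neighbor through plain data types, so any one of them can be tested in isolation.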

Repository Pattern

# Good: Clean interface
class SQLiteMetricsRepository:
    def load_metrics_by_timerange(self, conn: Connection, start: int, end: int) -> List[Metric]:
        ...  # Implementation details hidden inside the repository

# Avoid: Direct SQL in business logic
def analyze_strategy(db_path: Path):
    # Don't do this
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT * FROM metrics WHERE ...")

Error Handling

# Required pattern
try:
    result = risky_operation()
    return process_result(result)
except SpecificException as e:
    logging.error(f"Operation failed: {e}")
    return default_value
except Exception as e:
    logging.error(f"Unexpected error in operation: {e}")
    raise

Testing Requirements

Test Coverage

  • Unit Tests: All public methods must have unit tests
  • Integration Tests: End-to-end workflow testing required
  • Edge Cases: Handle empty data, boundary conditions, error scenarios
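
Edge-case coverage can look like this sketch of pytest-style test functions (the `calculate_cvd` helper and its sum-of-signed-volumes definition are assumptions for illustration):

```python
from typing import List

def calculate_cvd(signed_volumes: List[float]) -> float:
    """Hypothetical helper: cumulative volume delta as a sum of signed volumes."""
    return sum(signed_volumes)

def test_cvd_empty_data() -> None:
    """Empty input must not raise; it yields a neutral delta."""
    assert calculate_cvd([]) == 0.0

def test_cvd_single_trade_boundary() -> None:
    """Boundary condition: a single trade passes through unchanged."""
    assert calculate_cvd([5.0]) == 5.0

def test_cvd_cancelling_flows() -> None:
    """Opposing flows of equal size cancel to zero."""
    assert calculate_cvd([3.0, -3.0]) == 0.0
```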

Test Structure

def test_feature_description():
    """Test that feature behaves correctly under normal conditions."""
    # Arrange
    test_data = create_test_data()
    
    # Act
    result = function_under_test(test_data)
    
    # Assert
    assert result.expected_property == expected_value
    assert len(result.collection) == expected_count

Test Data Management

# Use temporary files for database tests
import tempfile
from pathlib import Path

def test_database_operation():
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_file:
        db_path = Path(tmp_file.name)

    try:
        # Test implementation
        pass
    finally:
        db_path.unlink(missing_ok=True)

Database Development

Schema Changes

  1. Create Migration: Document schema changes in ADR format
  2. Backward Compatibility: Ensure existing databases continue to work
  3. Auto-Migration: Implement automatic schema updates where possible
  4. Performance: Add appropriate indexes for new queries
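
One way to satisfy points 3 and 4 is to track a schema version in SQLite's `user_version` pragma and apply pending steps on startup. This is a hedged sketch, not the project's actual migration code; the table and index names are made up:

```python
import sqlite3

SCHEMA_VERSION = 2  # assumed current version for this sketch

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any pending migrations; returns the resulting schema version.

    Each step is idempotent, so existing databases at any older version
    continue to work (backward compatibility).
    """
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    if version < 1:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics ("
            "timestamp INTEGER PRIMARY KEY, obi REAL, cvd REAL)"
        )
    if version < 2:
        # New queries filter on obi, so index it (guideline 4).
        conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_obi ON metrics(obi)")
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.commit()
    return SCHEMA_VERSION
```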

Query Patterns

# Good: Parameterized queries
cursor.execute(
    "SELECT obi, cvd FROM metrics WHERE timestamp >= ? AND timestamp <= ?",
    (start_timestamp, end_timestamp)
)

# Bad: String formatting (security risk)
query = f"SELECT * FROM metrics WHERE timestamp = {timestamp}"

Performance Guidelines

  • Batch Operations: Process in batches of 1000 records
  • Indexes: Add indexes for frequently queried columns
  • Transactions: Use transactions for multi-record operations
  • Connection Management: Caller manages connection lifecycle
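
A minimal sketch combining the batching, transaction, and connection-management guidelines (the `metrics` schema here is assumed; the caller opens and closes the connection):

```python
import sqlite3
from typing import Iterable, List, Tuple

BATCH_SIZE = 1000  # per the batch-operations guideline

def insert_metrics(conn: sqlite3.Connection,
                   rows: Iterable[Tuple[int, float, float]]) -> int:
    """Insert (timestamp, obi, cvd) rows in batches inside one transaction.

    The caller owns the connection lifecycle; this function never closes it.
    """
    inserted = 0
    batch: List[Tuple[int, float, float]] = []
    with conn:  # one transaction for the whole multi-record operation
        for row in rows:
            batch.append(row)
            if len(batch) == BATCH_SIZE:
                conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", batch)
                inserted += len(batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", batch)
            inserted += len(batch)
    return inserted
```

`with conn:` commits on success and rolls back on error, so a failed run never leaves a half-written batch behind.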

Performance Requirements

Memory Management

  • Target: >70% memory reduction vs. full snapshot retention
  • Measurement: Profile memory usage with large datasets
  • Optimization: Stream processing, batch operations, minimal object retention
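
The stream-processing approach can be sketched as a generator that emits one metric per snapshot and retains nothing afterwards; the snapshot field names and the OBI formula shown are assumptions for illustration:

```python
from typing import Dict, Iterable, Iterator, Tuple

def stream_obi(snapshots: Iterable[Dict[str, float]]) -> Iterator[Tuple[int, float]]:
    """Yield (timestamp, obi) pairs one at a time.

    No snapshot is kept after its metric is emitted, so memory use stays
    flat regardless of dataset size (vs. full snapshot retention).
    """
    for snap in snapshots:
        bid, ask = snap["bid_qty"], snap["ask_qty"]
        obi = (bid - ask) / (bid + ask)  # assumed order-book imbalance formula
        yield int(snap["timestamp"]), obi
```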

Processing Speed

  • Target: >500 snapshots/second processing rate
  • Measurement: Benchmark with realistic datasets
  • Optimization: Database batching, efficient algorithms, minimal I/O
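
A simple throughput harness for checking the snapshots/second target might look like this sketch (`process` stands in for any per-snapshot callable; real benchmarks should use realistic data and multiple runs):

```python
import time
from typing import Callable, Sequence

def benchmark_throughput(process: Callable[[object], object],
                         snapshots: Sequence[object]) -> float:
    """Return snapshots processed per second for a single run."""
    start = time.perf_counter()
    for snap in snapshots:
        process(snap)
    elapsed = time.perf_counter() - start
    return len(snapshots) / elapsed if elapsed > 0 else float("inf")
```

Compare the returned rate against the 500 snapshots/second target before and after a change.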

Storage Efficiency

  • Target: <25% storage overhead for metrics
  • Measurement: Compare metrics table size to source data
  • Optimization: Efficient data types, minimal redundancy

Submission Process

Before Submitting

  1. Run Tests: Ensure all tests pass

    uv run pytest
    
  2. Check Type Hints: Verify type annotations

    uv run mypy .
    
  3. Test Coverage: Ensure adequate test coverage

    uv run pytest --cov=. --cov-report=term-missing
    
  4. Documentation: Update relevant documentation files

Pull Request Guidelines

  • Description: Clear description of changes and motivation
  • Testing: Include tests for new functionality
  • Documentation: Update docs for API changes
  • Breaking Changes: Document any breaking changes
  • Performance: Include performance impact analysis for significant changes

Code Review Checklist

  • Follows function/file size limits
  • Has comprehensive test coverage
  • Includes proper error handling
  • Uses type annotations consistently
  • Maintains backward compatibility
  • Updates relevant documentation
  • No security vulnerabilities (SQL injection, etc.)
  • Performance impact analyzed

Documentation Maintenance

When to Update Documentation

  • API Changes: Any modification to public interfaces
  • Architecture Changes: New patterns, data structures, or workflows
  • Performance Changes: Significant performance improvements or regressions
  • Feature Additions: New capabilities or metrics

Documentation Types

  • Code Comments: Complex algorithms and business logic
  • Docstrings: All public functions and classes
  • Module Documentation: Purpose and usage examples
  • Architecture Documentation: System design and component relationships

Getting Help

Resources

  • Architecture Overview: docs/architecture.md
  • API Documentation: docs/API.md
  • Module Documentation: docs/modules/
  • Decision Records: docs/decisions/

Communication

  • Issues: Use GitHub issues for bug reports and feature requests
  • Discussions: Use GitHub discussions for questions and design discussions
  • Code Review: Comment on pull requests for specific code feedback

Development Workflow

Feature Development

  1. Create Branch: Feature-specific branch from main
  2. Develop: Follow coding standards and test requirements
  3. Test: Comprehensive testing including edge cases
  4. Document: Update relevant documentation
  5. Review: Submit pull request for code review
  6. Merge: Merge after approval and CI success

Bug Fixes

  1. Reproduce: Create test that reproduces the bug
  2. Fix: Implement minimal fix addressing root cause
  3. Verify: Ensure fix resolves issue without regressions
  4. Test: Add regression test to prevent future occurrences

Performance Improvements

  1. Benchmark: Establish baseline performance metrics
  2. Optimize: Implement performance improvements
  3. Measure: Verify performance gains with benchmarks
  4. Document: Update performance characteristics in docs

Thank you for contributing to the Orderflow Backtest System! Your contributions help make this a better tool for cryptocurrency trading analysis.