# Contributing to Orderflow Backtest System
## Development Guidelines
Thank you for your interest in contributing to the Orderflow Backtest System. This document outlines the development process, coding standards, and best practices for maintaining code quality.
## Development Environment Setup
### Prerequisites
- **Python**: 3.12 or higher
- **Package Manager**: UV (recommended) or pip
- **Database**: SQLite 3.x
- **GUI**: Qt5 for visualization (Linux/macOS)
### Installation
```bash
# Clone the repository
git clone <repository-url>
cd orderflow_backtest
# Install dependencies
uv sync
# Install development dependencies
uv add --dev pytest pytest-cov mypy
# Verify installation
uv run pytest
```
### Development Tools
```bash
# Run tests
uv run pytest
# Run tests with coverage
uv run pytest --cov=. --cov-report=html
# Run type checking
uv run mypy .
# Run specific test module
uv run pytest tests/test_storage_metrics.py -v
```
## Code Standards
### Function and File Size Limits
- **Functions**: Maximum 50 lines
- **Files**: Maximum 250 lines
- **Classes**: Single responsibility, clear purpose
- **Methods**: One clear responsibility per method
### Naming Conventions
```python
# Good examples
def calculate_order_book_imbalance(snapshot: BookSnapshot) -> float: ...
def load_metrics_by_timerange(start: int, end: int) -> List[Metric]: ...

class MetricCalculator: ...
class SQLiteMetricsRepository: ...

# Avoid abbreviations, except standard domain terms:
#   Good:  OBI, CVD (standard financial terms)
#   Avoid: calc_obi, proc_data, mgr
```
### Type Annotations
```python
# Required for all public interfaces
def process_trades(trades: List[Trade]) -> Dict[int, float]:
    """Process trades and return volume by timestamp."""
    ...

class Storage:
    def __init__(self, instrument: str) -> None:
        self.instrument = instrument
```
### Documentation Standards
```python
def calculate_metrics(snapshot: BookSnapshot, trades: List[Trade]) -> Metric:
    """
    Calculate OBI and CVD metrics for a snapshot.

    Args:
        snapshot: Orderbook state at a specific timestamp.
        trades: List of trades executed at this timestamp.

    Returns:
        Metric: Calculated OBI, CVD, and best bid/ask values.

    Raises:
        ValueError: If the snapshot contains invalid data.

    Example:
        >>> snapshot = BookSnapshot(...)
        >>> trades = [Trade(...), ...]
        >>> metric = calculate_metrics(snapshot, trades)
        >>> print(f"OBI: {metric.obi:.3f}")
        OBI: 0.333
    """
```
## Architecture Principles
### Separation of Concerns
- **Storage**: Data processing and persistence only
- **Strategy**: Trading analysis and signal generation only
- **Visualizer**: Chart rendering and display only
- **Main**: Application orchestration and flow control
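The sketch below illustrates this separation with minimal stand-in classes; the real `Storage`, `Strategy`, and `Visualizer` interfaces will differ:
```python
# Illustrative stand-ins only -- not the project's actual interfaces.
from pathlib import Path
from typing import List

class Storage:
    """Persistence layer: loads and saves data, no analysis logic."""
    def load_metrics(self, db_path: Path) -> List[dict]:
        return []  # a real implementation would read from SQLite

class Strategy:
    """Analysis layer: consumes metrics, emits signals, performs no I/O."""
    def analyze(self, metrics: List[dict]) -> List[str]:
        return []

class Visualizer:
    """Display layer: renders charts, performs no analysis."""
    def render(self, metrics: List[dict], signals: List[str]) -> None:
        pass

def main(db_path: Path) -> None:
    """Orchestration only: wire the layers together, no business logic."""
    metrics = Storage().load_metrics(db_path)
    signals = Strategy().analyze(metrics)
    Visualizer().render(metrics, signals)
```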
### Repository Pattern
```python
# Good: a clean repository interface; implementation details stay hidden
class SQLiteMetricsRepository:
    def load_metrics_by_timerange(
        self, conn: Connection, start: int, end: int
    ) -> List[Metric]:
        ...

# Avoid: direct SQL in business logic
def analyze_strategy(db_path: Path):
    # Don't do this
    conn = sqlite3.connect(db_path)
    cursor = conn.execute("SELECT * FROM metrics WHERE ...")
```
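Business logic should instead depend on the repository interface. A minimal usage sketch, assuming the `SQLiteMetricsRepository` above and the caller-managed connection lifecycle described under Performance Guidelines:
```python
import sqlite3
from contextlib import closing
from pathlib import Path

def analyze_strategy(db_path: Path, start: int, end: int) -> None:
    # Business logic talks to the repository interface, never raw SQL.
    repo = SQLiteMetricsRepository()
    with closing(sqlite3.connect(db_path)) as conn:  # caller owns the connection
        metrics = repo.load_metrics_by_timerange(conn, start, end)
    # ... run analysis over `metrics` without touching SQL ...
```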
### Error Handling
```python
# Required pattern
try:
    result = risky_operation()
    return process_result(result)
except SpecificException as e:
    logging.error(f"Operation failed: {e}")
    return default_value
except Exception as e:
    logging.error(f"Unexpected error in operation: {e}")
    raise
```
## Testing Requirements
### Test Coverage
- **Unit Tests**: All public methods must have unit tests
- **Integration Tests**: End-to-end workflow testing required
- **Edge Cases**: Handle empty data, boundary conditions, and error scenarios (see the sketch below)
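For instance, a minimal edge-case test over a hypothetical helper (the real project functions and their empty-input behavior may differ):
```python
from typing import List

def cumulative_volume(volumes: List[float]) -> float:
    """Hypothetical helper: total traded volume."""
    return sum(volumes)

def test_cumulative_volume_empty_input():
    """Empty data must yield a well-defined value, not an exception."""
    assert cumulative_volume([]) == 0.0

def test_cumulative_volume_single_trade():
    """Boundary condition: exactly one trade."""
    assert cumulative_volume([1.5]) == 1.5
```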
### Test Structure
```python
def test_feature_description():
    """Test that the feature behaves correctly under normal conditions."""
    # Arrange
    test_data = create_test_data()

    # Act
    result = function_under_test(test_data)

    # Assert
    assert result.expected_property == expected_value
    assert len(result.collection) == expected_count
```
### Test Data Management
```python
import tempfile
from pathlib import Path

# Use temporary files for database tests
def test_database_operation():
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp_file:
        db_path = Path(tmp_file.name)
    try:
        # Test implementation using db_path
        pass
    finally:
        db_path.unlink(missing_ok=True)
```
## Database Development
### Schema Changes
1. **Create Migration**: Document schema changes in ADR format
2. **Backward Compatibility**: Ensure existing databases continue to work
3. **Auto-Migration**: Implement automatic schema updates where possible (sketched below)
4. **Performance**: Add appropriate indexes for new queries
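For auto-migration, one common SQLite pattern is to key migrations off `PRAGMA user_version`. The sketch below is illustrative (the column names are assumed), not the project's actual mechanism:
```python
import sqlite3

SCHEMA_VERSION = 2

def migrate(conn: sqlite3.Connection) -> None:
    """Apply any pending schema migrations, oldest first."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version < 1:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS metrics "
            "(timestamp INTEGER NOT NULL, obi REAL, cvd REAL)"
        )
        # Index the column the time-range queries filter on.
        conn.execute("CREATE INDEX IF NOT EXISTS idx_metrics_ts ON metrics(timestamp)")
    if version < 2:
        # Additive change: existing databases keep working (backward compatible).
        conn.execute("ALTER TABLE metrics ADD COLUMN best_bid REAL")
    conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    conn.commit()
```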
### Query Patterns
```python
# Good: parameterized queries
cursor.execute(
    "SELECT obi, cvd FROM metrics WHERE timestamp >= ? AND timestamp <= ?",
    (start_timestamp, end_timestamp),
)

# Bad: string formatting (SQL injection risk)
query = f"SELECT * FROM metrics WHERE timestamp = {timestamp}"
```
### Performance Guidelines
- **Batch Operations**: Process in batches of 1000 records (as in the sketch after this list)
- **Indexes**: Add indexes for frequently queried columns
- **Transactions**: Use transactions for multi-record operations
- **Connection Management**: Caller manages connection lifecycle
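A minimal sketch combining these guidelines (batching, one transaction per batch, caller-owned connection); the table and column names are illustrative:
```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

BATCH_SIZE = 1000  # matches the guideline above

Row = Tuple[int, float, float]  # (timestamp, obi, cvd) -- illustrative schema

def batches(rows: Iterable[Row], size: int = BATCH_SIZE) -> Iterator[List[Row]]:
    """Yield successive fixed-size batches from an iterable of rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

def insert_metrics(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    # One transaction per batch: far cheaper than a commit per row.
    for batch in batches(rows):
        with conn:  # connection as context manager == a single transaction
            conn.executemany(
                "INSERT INTO metrics (timestamp, obi, cvd) VALUES (?, ?, ?)",
                batch,
            )
```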
## Performance Requirements
### Memory Management
- **Target**: >70% memory reduction vs. full snapshot retention
- **Measurement**: Profile memory usage with large datasets (see the sketch below)
- **Optimization**: Stream processing, batch operations, minimal object retention
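One way to measure, sketched with the standard library's `tracemalloc` (the helper name is hypothetical):
```python
import tracemalloc
from typing import Callable

def peak_memory_bytes(run: Callable[[], None]) -> int:
    """Return the peak bytes allocated while run() executes."""
    tracemalloc.start()
    try:
        run()
        _, peak = tracemalloc.get_traced_memory()
        return peak
    finally:
        tracemalloc.stop()
```
Comparing `peak_memory_bytes` for full snapshot retention against the streaming path gives the reduction percentage to check against the >70% target.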
### Processing Speed
- **Target**: >500 snapshots/second processing rate
- **Measurement**: Benchmark with realistic datasets (see the sketch below)
- **Optimization**: Database batching, efficient algorithms, minimal I/O
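A throughput benchmark might look like this (names are illustrative; prefer realistic datasets over synthetic snapshots):
```python
import time
from typing import Callable, Sequence

def snapshots_per_second(process: Callable, snapshots: Sequence) -> float:
    """Measure the processing rate over a dataset of snapshots."""
    start = time.perf_counter()
    for snapshot in snapshots:
        process(snapshot)
    elapsed = time.perf_counter() - start
    return len(snapshots) / elapsed

# Check against the target, e.g.:
# assert snapshots_per_second(calculate, dataset) > 500
```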
### Storage Efficiency
- **Target**: <25% storage overhead for metrics
- **Measurement**: Compare metrics table size to source data
- **Optimization**: Efficient data types, minimal redundancy
## Submission Process
### Before Submitting
1. **Run Tests**: Ensure all tests pass
   ```bash
   uv run pytest
   ```
2. **Check Type Hints**: Verify type annotations
   ```bash
   uv run mypy .
   ```
3. **Test Coverage**: Ensure adequate test coverage
   ```bash
   uv run pytest --cov=. --cov-report=term-missing
   ```
4. **Documentation**: Update relevant documentation files
### Pull Request Guidelines
- **Description**: Clear description of changes and motivation
- **Testing**: Include tests for new functionality
- **Documentation**: Update docs for API changes
- **Breaking Changes**: Document any breaking changes
- **Performance**: Include performance impact analysis for significant changes
### Code Review Checklist
- [ ] Follows function/file size limits
- [ ] Has comprehensive test coverage
- [ ] Includes proper error handling
- [ ] Uses type annotations consistently
- [ ] Maintains backward compatibility
- [ ] Updates relevant documentation
- [ ] No security vulnerabilities (SQL injection, etc.)
- [ ] Performance impact analyzed
## Documentation Maintenance
### When to Update Documentation
- **API Changes**: Any modification to public interfaces
- **Architecture Changes**: New patterns, data structures, or workflows
- **Performance Changes**: Significant performance improvements or regressions
- **Feature Additions**: New capabilities or metrics
### Documentation Types
- **Code Comments**: Complex algorithms and business logic
- **Docstrings**: All public functions and classes
- **Module Documentation**: Purpose and usage examples
- **Architecture Documentation**: System design and component relationships
## Getting Help
### Resources
- **Architecture Overview**: `docs/architecture.md`
- **API Documentation**: `docs/API.md`
- **Module Documentation**: `docs/modules/`
- **Decision Records**: `docs/decisions/`
### Communication
- **Issues**: Use GitHub issues for bug reports and feature requests
- **Discussions**: Use GitHub discussions for questions and design discussions
- **Code Review**: Comment on pull requests for specific code feedback
---
## Development Workflow
### Feature Development
1. **Create Branch**: Feature-specific branch from main
2. **Develop**: Follow coding standards and test requirements
3. **Test**: Comprehensive testing including edge cases
4. **Document**: Update relevant documentation
5. **Review**: Submit pull request for code review
6. **Merge**: Merge after approval and CI success
### Bug Fixes
1. **Reproduce**: Create test that reproduces the bug
2. **Fix**: Implement minimal fix addressing root cause
3. **Verify**: Ensure fix resolves issue without regressions
4. **Test**: Add regression test to prevent future occurrences
### Performance Improvements
1. **Benchmark**: Establish baseline performance metrics
2. **Optimize**: Implement performance improvements
3. **Measure**: Verify performance gains with benchmarks
4. **Document**: Update performance characteristics in docs
Thank you for contributing to the Orderflow Backtest System! Your contributions help make this a better tool for cryptocurrency trading analysis.