diff --git a/.cursor/rules/always-global.mdc b/.cursor/rules/always-global.mdc deleted file mode 100644 index 1997270..0000000 --- a/.cursor/rules/always-global.mdc +++ /dev/null @@ -1,61 +0,0 @@ ---- -description: Global development standards and AI interaction principles -globs: -alwaysApply: true ---- - -# Rule: Always Apply - Global Development Standards - -## AI Interaction Principles - -### Step-by-Step Development -- **NEVER** generate large blocks of code without explanation -- **ALWAYS** ask "provide your plan in a concise bullet list and wait for my confirmation before proceeding" -- Break complex tasks into smaller, manageable pieces (≤250 lines per file, ≤50 lines per function) -- Explain your reasoning step-by-step before writing code -- Wait for explicit approval before moving to the next sub-task - -### Context Awareness -- **ALWAYS** reference existing code patterns and data structures before suggesting new approaches -- Ask about existing conventions before implementing new functionality -- Preserve established architectural decisions unless explicitly asked to change them -- Maintain consistency with existing naming conventions and code style - -## Code Quality Standards - -### File and Function Limits -- **Maximum file size**: 250 lines -- **Maximum function size**: 50 lines -- **Maximum complexity**: If a function does more than one main thing, break it down -- **Naming**: Use clear, descriptive names that explain purpose - -### Documentation Requirements -- **Every public function** must have a docstring explaining purpose, parameters, and return value -- **Every class** must have a class-level docstring -- **Complex logic** must have inline comments explaining the "why", not just the "what" -- **API endpoints** must be documented with request/response examples - -### Error Handling -- **ALWAYS** include proper error handling for external dependencies -- **NEVER** use bare except clauses -- Provide meaningful error messages that help with debugging -- Log errors appropriately for the application context - -## Security and Best Practices -- **NEVER** hardcode credentials, API keys, or sensitive data -- **ALWAYS** validate user inputs -- Use parameterized queries for database operations -- Follow the principle of least privilege -- Implement proper authentication and authorization - -## Testing Requirements -- **Every implementation** should have corresponding unit tests -- **Every API endpoint** should have integration tests -- Test files should be placed alongside the code they test -- Use descriptive test names that explain what is being tested - -## Response Format -- Be concise and avoid unnecessary repetition -- Focus on actionable information -- Provide examples when explaining complex concepts -- Ask clarifying questions when requirements are ambiguous \ No newline at end of file diff --git a/.cursor/rules/architecture.mdc b/.cursor/rules/architecture.mdc deleted file mode 100644 index 9fbc494..0000000 --- a/.cursor/rules/architecture.mdc +++ /dev/null @@ -1,237 +0,0 @@ ---- -description: Modular design principles and architecture guidelines for scalable development -globs: -alwaysApply: false ---- - -# Rule: Architecture and Modular Design - -## Goal -Maintain a clean, modular architecture that scales effectively and prevents the complexity issues that arise in AI-assisted development. - -## Core Architecture Principles - -### 1. 
Modular Design -- **Single Responsibility**: Each module has one clear purpose -- **Loose Coupling**: Modules depend on interfaces, not implementations -- **High Cohesion**: Related functionality is grouped together -- **Clear Boundaries**: Module interfaces are well-defined and stable - -### 2. Size Constraints -- **Files**: Maximum 250 lines per file -- **Functions**: Maximum 50 lines per function -- **Classes**: Maximum 300 lines per class -- **Modules**: Maximum 10 public functions/classes per module - -### 3. Dependency Management -- **Layer Dependencies**: Higher layers depend on lower layers only -- **No Circular Dependencies**: Modules cannot depend on each other cyclically -- **Interface Segregation**: Depend on specific interfaces, not broad ones -- **Dependency Injection**: Pass dependencies rather than creating them internally - -## Modular Architecture Patterns - -### Layer Structure -``` -src/ -├── presentation/ # UI, API endpoints, CLI interfaces -├── application/ # Business logic, use cases, workflows -├── domain/ # Core business entities and rules -├── infrastructure/ # Database, external APIs, file systems -└── shared/ # Common utilities, constants, types -``` - -### Module Organization -``` -module_name/ -├── __init__.py # Public interface exports -├── core.py # Main module logic -├── types.py # Type definitions and interfaces -├── utils.py # Module-specific utilities -├── tests/ # Module tests -└── README.md # Module documentation -``` - -## Design Patterns for AI Development - -### 1. Repository Pattern -Separate data access from business logic: - -```python -# Domain interface -class UserRepository: - def get_by_id(self, user_id: str) -> User: ... - def save(self, user: User) -> None: ... - -# Infrastructure implementation -class SqlUserRepository(UserRepository): - def get_by_id(self, user_id: str) -> User: - # Database-specific implementation - pass -``` - -### 2. Service Pattern -Encapsulate business logic in focused services: - -```python -class UserService: - def __init__(self, user_repo: UserRepository): - self._user_repo = user_repo - - def create_user(self, data: UserData) -> User: - # Validation and business logic - # Single responsibility: user creation - pass -``` - -### 3. 
Factory Pattern -Create complex objects with clear interfaces: - -```python -class DatabaseFactory: - @staticmethod - def create_connection(config: DatabaseConfig) -> Connection: - # Handle different database types - # Encapsulate connection complexity - pass -``` - -## Architecture Decision Guidelines - -### When to Create New Modules -Create a new module when: -- **Functionality** exceeds size constraints (250 lines) -- **Responsibility** is distinct from existing modules -- **Dependencies** would create circular references -- **Reusability** would benefit other parts of the system -- **Testing** requires isolated test environments - -### When to Split Existing Modules -Split modules when: -- **File size** exceeds 250 lines -- **Multiple responsibilities** are evident -- **Testing** becomes difficult due to complexity -- **Dependencies** become too numerous -- **Change frequency** differs significantly between parts - -### Module Interface Design -```python -# Good: Clear, focused interface -class PaymentProcessor: - def process_payment(self, amount: Money, method: PaymentMethod) -> PaymentResult: - """Process a single payment transaction.""" - pass - -# Bad: Unfocused, kitchen-sink interface -class PaymentManager: - def process_payment(self, ...): pass - def validate_card(self, ...): pass - def send_receipt(self, ...): pass - def update_inventory(self, ...): pass # Wrong responsibility! -``` - -## Architecture Validation - -### Architecture Review Checklist -- [ ] **Dependencies flow in one direction** (no cycles) -- [ ] **Layers are respected** (presentation doesn't call infrastructure directly) -- [ ] **Modules have single responsibility** -- [ ] **Interfaces are stable** and well-defined -- [ ] **Size constraints** are maintained -- [ ] **Testing** is straightforward for each module - -### Red Flags -- **God Objects**: Classes/modules that do too many things -- **Circular Dependencies**: Modules that depend on each other -- **Deep Inheritance**: More than 3 levels of inheritance -- **Large Interfaces**: Interfaces with more than 7 methods -- **Tight Coupling**: Modules that know too much about each other's internals - -## Refactoring Guidelines - -### When to Refactor -- Module exceeds size constraints -- Code duplication across modules -- Difficult to test individual components -- New features require changing multiple unrelated modules -- Performance bottlenecks due to poor separation - -### Refactoring Process -1. **Identify** the specific architectural problem -2. **Design** the target architecture -3. **Create tests** to verify current behavior -4. **Implement changes** incrementally -5. **Validate** that tests still pass -6. **Update documentation** to reflect changes - -### Safe Refactoring Practices -- **One change at a time**: Don't mix refactoring with new features -- **Tests first**: Ensure comprehensive test coverage before refactoring -- **Incremental changes**: Small steps with verification at each stage -- **Backward compatibility**: Maintain existing interfaces during transition -- **Documentation updates**: Keep architecture documentation current - -## Architecture Documentation - -### Architecture Decision Records (ADRs) -Document significant decisions in `./docs/decisions/`: - -```markdown -# ADR-003: Service Layer Architecture - -## Status -Accepted - -## Context -As the application grows, business logic is scattered across controllers and models. - -## Decision -Implement a service layer to encapsulate business logic. 
- -## Consequences -**Positive:** -- Clear separation of concerns -- Easier testing of business logic -- Better reusability across different interfaces - -**Negative:** -- Additional abstraction layer -- More files to maintain -``` - -### Module Documentation Template -```markdown -# Module: [Name] - -## Purpose -What this module does and why it exists. - -## Dependencies -- **Imports from**: List of modules this depends on -- **Used by**: List of modules that depend on this one -- **External**: Third-party dependencies - -## Public Interface -```python -# Key functions and classes exposed by this module -``` - -## Architecture Notes -- Design patterns used -- Important architectural decisions -- Known limitations or constraints -``` - -## Migration Strategies - -### Legacy Code Integration -- **Strangler Fig Pattern**: Gradually replace old code with new modules -- **Adapter Pattern**: Create interfaces to integrate old and new code -- **Facade Pattern**: Simplify complex legacy interfaces - -### Gradual Modernization -1. **Identify boundaries** in existing code -2. **Extract modules** one at a time -3. **Create interfaces** for each extracted module -4. **Test thoroughly** at each step -5. **Update documentation** continuously \ No newline at end of file diff --git a/.cursor/rules/code-review.mdc b/.cursor/rules/code-review.mdc deleted file mode 100644 index 8b0808c..0000000 --- a/.cursor/rules/code-review.mdc +++ /dev/null @@ -1,123 +0,0 @@ ---- -description: AI-generated code review checklist and quality assurance guidelines -globs: -alwaysApply: false ---- - -# Rule: Code Review and Quality Assurance - -## Goal -Establish systematic review processes for AI-generated code to maintain quality, security, and maintainability standards. - -## AI Code Review Checklist - -### Pre-Implementation Review -Before accepting any AI-generated code: - -1. **Understand the Code** - - [ ] Can you explain what the code does in your own words? - - [ ] Do you understand each function and its purpose? - - [ ] Are there any "magic" values or unexplained logic? - - [ ] Does the code solve the actual problem stated? - -2. **Architecture Alignment** - - [ ] Does the code follow established project patterns? - - [ ] Is it consistent with existing data structures? - - [ ] Does it integrate cleanly with existing components? - - [ ] Are new dependencies justified and necessary? - -3. **Code Quality** - - [ ] Are functions smaller than 50 lines? - - [ ] Are files smaller than 250 lines? - - [ ] Are variable and function names descriptive? - - [ ] Is the code DRY (Don't Repeat Yourself)? - -### Security Review -- [ ] **Input Validation**: All user inputs are validated and sanitized -- [ ] **Authentication**: Proper authentication checks are in place -- [ ] **Authorization**: Access controls are implemented correctly -- [ ] **Data Protection**: Sensitive data is handled securely -- [ ] **SQL Injection**: Database queries use parameterized statements -- [ ] **XSS Prevention**: Output is properly escaped -- [ ] **Error Handling**: Errors don't leak sensitive information - -### Integration Review -- [ ] **Existing Functionality**: New code doesn't break existing features -- [ ] **Data Consistency**: Database changes maintain referential integrity -- [ ] **API Compatibility**: Changes don't break existing API contracts -- [ ] **Performance Impact**: New code doesn't introduce performance bottlenecks -- [ ] **Testing Coverage**: Appropriate tests are included - -## Review Process - -### Step 1: Initial Code Analysis -1. 
**Read through the entire generated code** before running it -2. **Identify patterns** that don't match existing codebase -3. **Check dependencies** - are new packages really needed? -4. **Verify logic flow** - does the algorithm make sense? - -### Step 2: Security and Error Handling Review -1. **Trace data flow** from input to output -2. **Identify potential failure points** and verify error handling -3. **Check for security vulnerabilities** using the security checklist -4. **Verify proper logging** and monitoring implementation - -### Step 3: Integration Testing -1. **Test with existing code** to ensure compatibility -2. **Run existing test suite** to verify no regressions -3. **Test edge cases** and error conditions -4. **Verify performance** under realistic conditions - -## Common AI Code Issues to Watch For - -### Overcomplication Patterns -- **Unnecessary abstractions**: AI creating complex patterns for simple tasks -- **Over-engineering**: Solutions that are more complex than needed -- **Redundant code**: AI recreating existing functionality -- **Inappropriate design patterns**: Using patterns that don't fit the use case - -### Context Loss Indicators -- **Inconsistent naming**: Different conventions from existing code -- **Wrong data structures**: Using different patterns than established -- **Ignored existing functions**: Reimplementing existing functionality -- **Architectural misalignment**: Code that doesn't fit the overall design - -### Technical Debt Indicators -- **Magic numbers**: Hardcoded values without explanation -- **Poor error messages**: Generic or unhelpful error handling -- **Missing documentation**: Code without adequate comments -- **Tight coupling**: Components that are too interdependent - -## Quality Gates - -### Mandatory Reviews -All AI-generated code must pass these gates before acceptance: - -1. **Security Review**: No security vulnerabilities detected -2. **Integration Review**: Integrates cleanly with existing code -3. **Performance Review**: Meets performance requirements -4. **Maintainability Review**: Code can be easily modified by team members -5. 
**Documentation Review**: Adequate documentation is provided - -### Acceptance Criteria -- [ ] Code is understandable by any team member -- [ ] Integration requires minimal changes to existing code -- [ ] Security review passes all checks -- [ ] Performance meets established benchmarks -- [ ] Documentation is complete and accurate - -## Rejection Criteria -Reject AI-generated code if: -- Security vulnerabilities are present -- Code is too complex for the problem being solved -- Integration requires major refactoring of existing code -- Code duplicates existing functionality without justification -- Documentation is missing or inadequate - -## Review Documentation -For each review, document: -- Issues found and how they were resolved -- Performance impact assessment -- Security concerns and mitigations -- Integration challenges and solutions -- Recommendations for future similar tasks \ No newline at end of file diff --git a/.cursor/rules/context-management.mdc b/.cursor/rules/context-management.mdc deleted file mode 100644 index 399658a..0000000 --- a/.cursor/rules/context-management.mdc +++ /dev/null @@ -1,93 +0,0 @@ ---- -description: Context management for maintaining codebase awareness and preventing context drift -globs: -alwaysApply: false ---- - -# Rule: Context Management - -## Goal -Maintain comprehensive project context to prevent context drift and ensure AI-generated code integrates seamlessly with existing codebase patterns and architecture. - -## Context Documentation Requirements - -### PRD.md file documentation -1. **Project Overview** - - Business objectives and goals - - Target users and use cases - - Key success metrics - -### CONTEXT.md File Structure -Every project must maintain a `CONTEXT.md` file in the root directory with: - -1. **Architecture Overview** - - High-level system architecture - - Key design patterns used - - Database schema overview - - API structure and conventions - -2. **Technology Stack** - - Programming languages and versions - - Frameworks and libraries - - Database systems - - Development and deployment tools - -3. **Coding Conventions** - - Naming conventions - - File organization patterns - - Code structure preferences - - Import/export patterns - -4. **Current Implementation Status** - - Completed features - - Work in progress - - Known technical debt - - Planned improvements - -## Context Maintenance Protocol - -### Before Every Coding Session -1. **Review CONTEXT.md and PRD.md** to understand current project state -2. **Scan recent changes** in git history to understand latest patterns -3. **Identify existing patterns** for similar functionality before implementing new features -4. **Ask for clarification** if existing patterns are unclear or conflicting - -### During Development -1. **Reference existing code** when explaining implementation approaches -2. **Maintain consistency** with established patterns and conventions -3. **Update CONTEXT.md** when making architectural decisions -4. 
**Document deviations** from established patterns with reasoning - -### Context Preservation Strategies -- **Incremental development**: Build on existing patterns rather than creating new ones -- **Pattern consistency**: Use established data structures and function signatures -- **Integration awareness**: Consider how new code affects existing functionality -- **Dependency management**: Understand existing dependencies before adding new ones - -## Context Prompting Best Practices - -### Effective Context Sharing -- Include relevant sections of CONTEXT.md in prompts for complex tasks -- Reference specific existing files when asking for similar functionality -- Provide examples of existing patterns when requesting new implementations -- Share recent git commit messages to understand latest changes - -### Context Window Optimization -- Prioritize most relevant context for current task -- Use @filename references to include specific files -- Break large contexts into focused, task-specific chunks -- Update context references as project evolves - -## Red Flags - Context Loss Indicators -- AI suggests patterns that conflict with existing code -- New implementations ignore established conventions -- Proposed solutions don't integrate with existing architecture -- Code suggestions require significant refactoring of existing functionality - -## Recovery Protocol -When context loss is detected: -1. **Stop development** and review CONTEXT.md -2. **Analyze existing codebase** for established patterns -3. **Update context documentation** with missing information -4. **Restart task** with proper context provided -5. **Test integration** with existing code before proceeding \ No newline at end of file diff --git a/.cursor/rules/create-prd.mdc b/.cursor/rules/create-prd.mdc deleted file mode 100644 index 046dfa6..0000000 --- a/.cursor/rules/create-prd.mdc +++ /dev/null @@ -1,67 +0,0 @@ ---- -description: Creating PRD for a project or specific task/function -globs: -alwaysApply: false ---- ---- -description: Creating PRD for a project or specific task/function -globs: -alwaysApply: false ---- -# Rule: Generating a Product Requirements Document (PRD) - -## Goal - -To guide an AI assistant in creating a detailed Product Requirements Document (PRD) in Markdown format, based on an initial user prompt. The PRD should be clear, actionable, and suitable for a junior developer to understand and implement the feature. - -## Process - -1. **Receive Initial Prompt:** The user provides a brief description or request for a new feature or functionality. -2. **Ask Clarifying Questions:** Before writing the PRD, the AI *must* ask clarifying questions to gather sufficient detail. The goal is to understand the "what" and "why" of the feature, not necessarily the "how" (which the developer will figure out). -3. **Generate PRD:** Based on the initial prompt and the user's answers to the clarifying questions, generate a PRD using the structure outlined below. -4. **Save PRD:** Save the generated document as `prd-[feature-name].md` inside the `/tasks` directory. - -## Clarifying Questions (Examples) - -The AI should adapt its questions based on the prompt, but here are some common areas to explore: - -* **Problem/Goal:** "What problem does this feature solve for the user?" or "What is the main goal we want to achieve with this feature?" -* **Target User:** "Who is the primary user of this feature?" -* **Core Functionality:** "Can you describe the key actions a user should be able to perform with this feature?" 
-* **User Stories:** "Could you provide a few user stories? (e.g., As a [type of user], I want to [perform an action] so that [benefit].)" -* **Acceptance Criteria:** "How will we know when this feature is successfully implemented? What are the key success criteria?" -* **Scope/Boundaries:** "Are there any specific things this feature *should not* do (non-goals)?" -* **Data Requirements:** "What kind of data does this feature need to display or manipulate?" -* **Design/UI:** "Are there any existing design mockups or UI guidelines to follow?" or "Can you describe the desired look and feel?" -* **Edge Cases:** "Are there any potential edge cases or error conditions we should consider?" - -## PRD Structure - -The generated PRD should include the following sections: - -1. **Introduction/Overview:** Briefly describe the feature and the problem it solves. State the goal. -2. **Goals:** List the specific, measurable objectives for this feature. -3. **User Stories:** Detail the user narratives describing feature usage and benefits. -4. **Functional Requirements:** List the specific functionalities the feature must have. Use clear, concise language (e.g., "The system must allow users to upload a profile picture."). Number these requirements. -5. **Non-Goals (Out of Scope):** Clearly state what this feature will *not* include to manage scope. -6. **Design Considerations (Optional):** Link to mockups, describe UI/UX requirements, or mention relevant components/styles if applicable. -7. **Technical Considerations (Optional):** Mention any known technical constraints, dependencies, or suggestions (e.g., "Should integrate with the existing Auth module"). -8. **Success Metrics:** How will the success of this feature be measured? (e.g., "Increase user engagement by 10%", "Reduce support tickets related to X"). -9. **Open Questions:** List any remaining questions or areas needing further clarification. - -## Target Audience - -Assume the primary reader of the PRD is a **junior developer**. Therefore, requirements should be explicit, unambiguous, and avoid jargon where possible. Provide enough detail for them to understand the feature's purpose and core logic. - -## Output - -* **Format:** Markdown (`.md`) -* **Location:** `/tasks/` -* **Filename:** `prd-[feature-name].md` - -## Final instructions - -1. Do NOT start implmenting the PRD -2. Make sure to ask the user clarifying questions - -3. Take the user's answers to the clarifying questions and improve the PRD \ No newline at end of file diff --git a/.cursor/rules/documentation.mdc b/.cursor/rules/documentation.mdc deleted file mode 100644 index 4388350..0000000 --- a/.cursor/rules/documentation.mdc +++ /dev/null @@ -1,244 +0,0 @@ ---- -description: Documentation standards for code, architecture, and development decisions -globs: -alwaysApply: false ---- - -# Rule: Documentation Standards - -## Goal -Maintain comprehensive, up-to-date documentation that supports development, onboarding, and long-term maintenance of the codebase. - -## Documentation Hierarchy - -### 1. Project Level Documentation (in ./docs/) -- **README.md**: Project overview, setup instructions, basic usage -- **CONTEXT.md**: Current project state, architecture decisions, patterns -- **CHANGELOG.md**: Version history and significant changes -- **CONTRIBUTING.md**: Development guidelines and processes -- **API.md**: API endpoints, request/response formats, authentication - -### 2. 
Module Level Documentation (in ./docs/modules/) -- **[module-name].md**: Purpose, public interfaces, usage examples -- **dependencies.md**: External dependencies and their purposes -- **architecture.md**: Module relationships and data flow - -### 3. Code Level Documentation -- **Docstrings**: Function and class documentation -- **Inline comments**: Complex logic explanations -- **Type hints**: Clear parameter and return types -- **README files**: Directory-specific instructions - -## Documentation Standards - -### Code Documentation -```python -def process_user_data(user_id: str, data: dict) -> UserResult: - """ - Process and validate user data before storage. - - Args: - user_id: Unique identifier for the user - data: Dictionary containing user information to process - - Returns: - UserResult: Processed user data with validation status - - Raises: - ValidationError: When user data fails validation - DatabaseError: When storage operation fails - - Example: - >>> result = process_user_data("123", {"name": "John", "email": "john@example.com"}) - >>> print(result.status) - 'valid' - """ -``` - -### API Documentation Format -```markdown -### POST /api/users - -Create a new user account. - -**Request:** -```json -{ - "name": "string (required)", - "email": "string (required, valid email)", - "age": "number (optional, min: 13)" -} -``` - -**Response (201):** -```json -{ - "id": "uuid", - "name": "string", - "email": "string", - "created_at": "iso_datetime" -} -``` - -**Errors:** -- 400: Invalid input data -- 409: Email already exists -``` - -### Architecture Decision Records (ADRs) -Document significant architecture decisions in `./docs/decisions/`: - -```markdown -# ADR-001: Database Choice - PostgreSQL - -## Status -Accepted - -## Context -We need to choose a database for storing user data and application state. - -## Decision -We will use PostgreSQL as our primary database. 
- -## Consequences -**Positive:** -- ACID compliance ensures data integrity -- Rich query capabilities with SQL -- Good performance for our expected load - -**Negative:** -- More complex setup than simpler alternatives -- Requires SQL knowledge from team members - -## Alternatives Considered -- MongoDB: Rejected due to consistency requirements -- SQLite: Rejected due to scalability needs -``` - -## Documentation Maintenance - -### When to Update Documentation - -#### Always Update: -- **API changes**: Any modification to public interfaces -- **Architecture changes**: New patterns, data structures, or workflows -- **Configuration changes**: Environment variables, deployment settings -- **Dependencies**: Adding, removing, or upgrading packages -- **Business logic changes**: Core functionality modifications - -#### Update Weekly: -- **CONTEXT.md**: Current development status and priorities -- **Known issues**: Bug reports and workarounds -- **Performance notes**: Bottlenecks and optimization opportunities - -#### Update per Release: -- **CHANGELOG.md**: User-facing changes and improvements -- **Version documentation**: Breaking changes and migration guides -- **Examples and tutorials**: Keep sample code current - -### Documentation Quality Checklist - -#### Completeness -- [ ] Purpose and scope clearly explained -- [ ] All public interfaces documented -- [ ] Examples provided for complex usage -- [ ] Error conditions and handling described -- [ ] Dependencies and requirements listed - -#### Accuracy -- [ ] Code examples are tested and working -- [ ] Links point to correct locations -- [ ] Version numbers are current -- [ ] Screenshots reflect current UI - -#### Clarity -- [ ] Written for the intended audience -- [ ] Technical jargon is explained -- [ ] Step-by-step instructions are clear -- [ ] Visual aids used where helpful - -## Documentation Automation - -### Auto-Generated Documentation -- **API docs**: Generate from code annotations -- **Type documentation**: Extract from type hints -- **Module dependencies**: Auto-update from imports -- **Test coverage**: Include coverage reports - -### Documentation Testing -```python -# Test that code examples in documentation work -def test_documentation_examples(): - """Verify code examples in docs actually work.""" - # Test examples from README.md - # Test API examples from docs/API.md - # Test configuration examples -``` - -## Documentation Templates - -### New Module Documentation Template -```markdown -# Module: [Name] - -## Purpose -Brief description of what this module does and why it exists. - -## Public Interface -### Functions -- `function_name(params)`: Description and example - -### Classes -- `ClassName`: Purpose and basic usage - -## Usage Examples -```python -# Basic usage example -``` - -## Dependencies -- Internal: List of internal modules this depends on -- External: List of external packages required - -## Testing -How to run tests for this module. - -## Known Issues -Current limitations or bugs. -``` - -### API Endpoint Template -```markdown -### [METHOD] /api/endpoint - -Brief description of what this endpoint does. 
- -**Authentication:** Required/Optional -**Rate Limiting:** X requests per minute - -**Request:** -- Headers required -- Body schema -- Query parameters - -**Response:** -- Success response format -- Error response format -- Status codes - -**Example:** -Working request/response example -``` - -## Review and Maintenance Process - -### Documentation Review -- Include documentation updates in code reviews -- Verify examples still work with code changes -- Check for broken links and outdated information -- Ensure consistency with current implementation - -### Regular Audits -- Monthly review of documentation accuracy -- Quarterly assessment of documentation completeness -- Annual review of documentation structure and organization \ No newline at end of file diff --git a/.cursor/rules/enhanced-task-list.mdc b/.cursor/rules/enhanced-task-list.mdc deleted file mode 100644 index b2272e8..0000000 --- a/.cursor/rules/enhanced-task-list.mdc +++ /dev/null @@ -1,207 +0,0 @@ ---- -description: Enhanced task list management with quality gates and iterative workflow integration -globs: -alwaysApply: false ---- - -# Rule: Enhanced Task List Management - -## Goal -Manage task lists with integrated quality gates and iterative workflow to prevent context loss and ensure sustainable development. - -## Task Implementation Protocol - -### Pre-Implementation Check -Before starting any sub-task: -- [ ] **Context Review**: Have you reviewed CONTEXT.md and relevant documentation? -- [ ] **Pattern Identification**: Do you understand existing patterns to follow? -- [ ] **Integration Planning**: Do you know how this will integrate with existing code? -- [ ] **Size Validation**: Is this task small enough (≤50 lines, ≤250 lines per file)? - -### Implementation Process -1. **One sub-task at a time**: Do **NOT** start the next sub‑task until you ask the user for permission and they say "yes" or "y" -2. **Step-by-step execution**: - - Plan the approach in bullet points - - Wait for approval - - Implement the specific sub-task - - Test the implementation - - Update documentation if needed -3. **Quality validation**: Run through the code review checklist before marking complete - -### Completion Protocol -When you finish a **sub‑task**: -1. **Immediate marking**: Change `[ ]` to `[x]` -2. **Quality check**: Verify the implementation meets quality standards -3. **Integration test**: Ensure new code works with existing functionality -4. **Documentation update**: Update relevant files if needed -5. **Parent task check**: If **all** subtasks underneath a parent task are now `[x]`, also mark the **parent task** as completed -6. 
**Stop and wait**: Get user approval before proceeding to next sub-task - -## Enhanced Task List Structure - -### Task File Header -```markdown -# Task List: [Feature Name] - -**Source PRD**: `prd-[feature-name].md` -**Status**: In Progress / Complete / Blocked -**Context Last Updated**: [Date] -**Architecture Review**: Required / Complete / N/A - -## Quick Links -- [Context Documentation](./CONTEXT.md) -- [Architecture Guidelines](./docs/architecture.md) -- [Related Files](#relevant-files) -``` - -### Task Format with Quality Gates -```markdown -- [ ] 1.0 Parent Task Title - - **Quality Gate**: Architecture review required - - **Dependencies**: List any dependencies - - [ ] 1.1 [Sub-task description 1.1] - - **Size estimate**: [Small/Medium/Large] - - **Pattern reference**: [Reference to existing pattern] - - **Test requirements**: [Unit/Integration/Both] - - [ ] 1.2 [Sub-task description 1.2] - - **Integration points**: [List affected components] - - **Risk level**: [Low/Medium/High] -``` - -## Relevant Files Management - -### Enhanced File Tracking -```markdown -## Relevant Files - -### Implementation Files -- `path/to/file1.ts` - Brief description of purpose and role - - **Status**: Created / Modified / Needs Review - - **Last Modified**: [Date] - - **Review Status**: Pending / Approved / Needs Changes - -### Test Files -- `path/to/file1.test.ts` - Unit tests for file1.ts - - **Coverage**: [Percentage or status] - - **Last Run**: [Date and result] - -### Documentation Files -- `docs/module-name.md` - Module documentation - - **Status**: Up to date / Needs update / Missing - - **Last Updated**: [Date] - -### Configuration Files -- `config/setting.json` - Configuration changes - - **Environment**: [Dev/Staging/Prod affected] - - **Backup**: [Location of backup] -``` - -## Task List Maintenance - -### During Development -1. **Regular updates**: Update task status after each significant change -2. **File tracking**: Add new files as they are created or modified -3. **Dependency tracking**: Note when new dependencies between tasks emerge -4. **Risk assessment**: Flag tasks that become more complex than anticipated - -### Quality Checkpoints -At 25%, 50%, 75%, and 100% completion: -- [ ] **Architecture alignment**: Code follows established patterns -- [ ] **Performance impact**: No significant performance degradation -- [ ] **Security review**: No security vulnerabilities introduced -- [ ] **Documentation current**: All changes are documented - -### Weekly Review Process -1. **Completion assessment**: What percentage of tasks are actually complete? -2. **Quality assessment**: Are completed tasks meeting quality standards? -3. **Process assessment**: Is the iterative workflow being followed? -4. **Risk assessment**: Are there emerging risks or blockers? 
- -## Task Status Indicators - -### Status Levels -- `[ ]` **Not Started**: Task not yet begun -- `[~]` **In Progress**: Currently being worked on -- `[?]` **Blocked**: Waiting for dependencies or decisions -- `[!]` **Needs Review**: Implementation complete but needs quality review -- `[x]` **Complete**: Finished and quality approved - -### Quality Indicators -- ✅ **Quality Approved**: Passed all quality gates -- ⚠️ **Quality Concerns**: Has issues but functional -- ❌ **Quality Failed**: Needs rework before approval -- 🔄 **Under Review**: Currently being reviewed - -### Integration Status -- 🔗 **Integrated**: Successfully integrated with existing code -- 🔧 **Integration Issues**: Problems with existing code integration -- ⏳ **Integration Pending**: Ready for integration testing - -## Emergency Procedures - -### When Tasks Become Too Complex -If a sub-task grows beyond expected scope: -1. **Stop implementation** immediately -2. **Document current state** and what was discovered -3. **Break down** the task into smaller pieces -4. **Update task list** with new sub-tasks -5. **Get approval** for the new breakdown before proceeding - -### When Context is Lost -If AI seems to lose track of project patterns: -1. **Pause development** -2. **Review CONTEXT.md** and recent changes -3. **Update context documentation** with current state -4. **Restart** with explicit pattern references -5. **Reduce task size** until context is re-established - -### When Quality Gates Fail -If implementation doesn't meet quality standards: -1. **Mark task** with `[!]` status -2. **Document specific issues** found -3. **Create remediation tasks** if needed -4. **Don't proceed** until quality issues are resolved - -## AI Instructions Integration - -### Context Awareness Commands -```markdown -**Before starting any task, run these checks:** -1. @CONTEXT.md - Review current project state -2. @architecture.md - Understand design principles -3. @code-review.md - Know quality standards -4. Look at existing similar code for patterns -``` - -### Quality Validation Commands -```markdown -**After completing any sub-task:** -1. Run code review checklist -2. Test integration with existing code -3. Update documentation if needed -4. Mark task complete only after quality approval -``` - -### Workflow Commands -```markdown -**For each development session:** -1. Review incomplete tasks and their status -2. Identify next logical sub-task to work on -3. Check dependencies and blockers -4. Follow iterative workflow process -5. Update task list with progress and findings -``` - -## Success Metrics - -### Daily Success Indicators -- Tasks are completed according to quality standards -- No sub-tasks are started without completing previous ones -- File tracking remains accurate and current -- Integration issues are caught early - -### Weekly Success Indicators -- Overall task completion rate is sustainable -- Quality issues are decreasing over time -- Context loss incidents are rare -- Team confidence in codebase remains high \ No newline at end of file diff --git a/.cursor/rules/generate-tasks.mdc b/.cursor/rules/generate-tasks.mdc deleted file mode 100644 index ef2f83b..0000000 --- a/.cursor/rules/generate-tasks.mdc +++ /dev/null @@ -1,70 +0,0 @@ ---- -description: Generate a task list or TODO for a user requirement or implementation. 
-globs: -alwaysApply: false ---- ---- -description: -globs: -alwaysApply: false ---- -# Rule: Generating a Task List from a PRD - -## Goal - -To guide an AI assistant in creating a detailed, step-by-step task list in Markdown format based on an existing Product Requirements Document (PRD). The task list should guide a developer through implementation. - -## Output - -- **Format:** Markdown (`.md`) -- **Location:** `/tasks/` -- **Filename:** `tasks-[prd-file-name].md` (e.g., `tasks-prd-user-profile-editing.md`) - -## Process - -1. **Receive PRD Reference:** The user points the AI to a specific PRD file -2. **Analyze PRD:** The AI reads and analyzes the functional requirements, user stories, and other sections of the specified PRD. -3. **Phase 1: Generate Parent Tasks:** Based on the PRD analysis, create the file and generate the main, high-level tasks required to implement the feature. Use your judgement on how many high-level tasks to use. It's likely to be about 5. Present these tasks to the user in the specified format (without sub-tasks yet). Inform the user: "I have generated the high-level tasks based on the PRD. Ready to generate the sub-tasks? Respond with 'Go' to proceed." -4. **Wait for Confirmation:** Pause and wait for the user to respond with "Go". -5. **Phase 2: Generate Sub-Tasks:** Once the user confirms, break down each parent task into smaller, actionable sub-tasks necessary to complete the parent task. Ensure sub-tasks logically follow from the parent task and cover the implementation details implied by the PRD. -6. **Identify Relevant Files:** Based on the tasks and PRD, identify potential files that will need to be created or modified. List these under the `Relevant Files` section, including corresponding test files if applicable. -7. **Generate Final Output:** Combine the parent tasks, sub-tasks, relevant files, and notes into the final Markdown structure. -8. **Save Task List:** Save the generated document in the `/tasks/` directory with the filename `tasks-[prd-file-name].md`, where `[prd-file-name]` matches the base name of the input PRD file (e.g., if the input was `prd-user-profile-editing.md`, the output is `tasks-prd-user-profile-editing.md`). - -## Output Format - -The generated task list _must_ follow this structure: - -```markdown -## Relevant Files - -- `path/to/potential/file1.ts` - Brief description of why this file is relevant (e.g., Contains the main component for this feature). -- `path/to/file1.test.ts` - Unit tests for `file1.ts`. -- `path/to/another/file.tsx` - Brief description (e.g., API route handler for data submission). -- `path/to/another/file.test.tsx` - Unit tests for `another/file.tsx`. -- `lib/utils/helpers.ts` - Brief description (e.g., Utility functions needed for calculations). -- `lib/utils/helpers.test.ts` - Unit tests for `helpers.ts`. - -### Notes - -- Unit tests should typically be placed alongside the code files they are testing (e.g., `MyComponent.tsx` and `MyComponent.test.tsx` in the same directory). -- Use `npx jest [optional/path/to/test/file]` to run tests. Running without a path executes all tests found by the Jest configuration. 
- -## Tasks - -- [ ] 1.0 Parent Task Title - - [ ] 1.1 [Sub-task description 1.1] - - [ ] 1.2 [Sub-task description 1.2] -- [ ] 2.0 Parent Task Title - - [ ] 2.1 [Sub-task description 2.1] -- [ ] 3.0 Parent Task Title (may not require sub-tasks if purely structural or configuration) -``` - -## Interaction Model - -The process explicitly requires a pause after generating parent tasks to get user confirmation ("Go") before proceeding to generate the detailed sub-tasks. This ensures the high-level plan aligns with user expectations before diving into details. - -## Target Audience - - -Assume the primary reader of the task list is a **junior developer** who will implement the feature. \ No newline at end of file diff --git a/.cursor/rules/iterative-workflow.mdc b/.cursor/rules/iterative-workflow.mdc deleted file mode 100644 index 65681ca..0000000 --- a/.cursor/rules/iterative-workflow.mdc +++ /dev/null @@ -1,236 +0,0 @@ ---- -description: Iterative development workflow for AI-assisted coding -globs: -alwaysApply: false ---- - -# Rule: Iterative Development Workflow - -## Goal -Establish a structured, iterative development process that prevents the chaos and complexity that can arise from uncontrolled AI-assisted development. - -## Development Phases - -### Phase 1: Planning and Design -**Before writing any code:** - -1. **Understand the Requirement** - - Break down the task into specific, measurable objectives - - Identify existing code patterns that should be followed - - List dependencies and integration points - - Define acceptance criteria - -2. **Design Review** - - Propose approach in bullet points - - Wait for explicit approval before proceeding - - Consider how the solution fits existing architecture - - Identify potential risks and mitigation strategies - -### Phase 2: Incremental Implementation -**One small piece at a time:** - -1. **Micro-Tasks** (≤ 50 lines each) - - Implement one function or small class at a time - - Test immediately after implementation - - Ensure integration with existing code - - Document decisions and patterns used - -2. **Validation Checkpoints** - - After each micro-task, verify it works correctly - - Check that it follows established patterns - - Confirm it integrates cleanly with existing code - - Get approval before moving to next micro-task - -### Phase 3: Integration and Testing -**Ensuring system coherence:** - -1. **Integration Testing** - - Test new code with existing functionality - - Verify no regressions in existing features - - Check performance impact - - Validate error handling - -2. **Documentation Update** - - Update relevant documentation - - Record any new patterns or decisions - - Update context files if architecture changed - -## Iterative Prompting Strategy - -### Step 1: Context Setting -``` -Before implementing [feature], help me understand: -1. What existing patterns should I follow? -2. What existing functions/classes are relevant? -3. How should this integrate with [specific existing component]? -4. What are the potential architectural impacts? -``` - -### Step 2: Plan Creation -``` -Based on the context, create a detailed plan for implementing [feature]: -1. Break it into micro-tasks (≤50 lines each) -2. Identify dependencies and order of implementation -3. Specify integration points with existing code -4. List potential risks and mitigation strategies - -Wait for my approval before implementing. 
-``` - -### Step 3: Incremental Implementation -``` -Implement only the first micro-task: [specific task] -- Use existing patterns from [reference file/function] -- Keep it under 50 lines -- Include error handling -- Add appropriate tests -- Explain your implementation choices - -Stop after this task and wait for approval. -``` - -## Quality Gates - -### Before Each Implementation -- [ ] **Purpose is clear**: Can explain what this piece does and why -- [ ] **Pattern is established**: Following existing code patterns -- [ ] **Size is manageable**: Implementation is small enough to understand completely -- [ ] **Integration is planned**: Know how it connects to existing code - -### After Each Implementation -- [ ] **Code is understood**: Can explain every line of implemented code -- [ ] **Tests pass**: All existing and new tests are passing -- [ ] **Integration works**: New code works with existing functionality -- [ ] **Documentation updated**: Changes are reflected in relevant documentation - -### Before Moving to Next Task -- [ ] **Current task complete**: All acceptance criteria met -- [ ] **No regressions**: Existing functionality still works -- [ ] **Clean state**: No temporary code or debugging artifacts -- [ ] **Approval received**: Explicit go-ahead for next task -- [ ] **Documentaion updated**: If relevant changes to module was made. - -## Anti-Patterns to Avoid - -### Large Block Implementation -**Don't:** -``` -Implement the entire user management system with authentication, -CRUD operations, and email notifications. -``` - -**Do:** -``` -First, implement just the User model with basic fields. -Stop there and let me review before continuing. -``` - -### Context Loss -**Don't:** -``` -Create a new authentication system. -``` - -**Do:** -``` -Looking at the existing auth patterns in auth.py, implement -password validation following the same structure as the -existing email validation function. -``` - -### Over-Engineering -**Don't:** -``` -Build a flexible, extensible user management framework that -can handle any future requirements. -``` - -**Do:** -``` -Implement user creation functionality that matches the existing -pattern in customer.py, focusing only on the current requirements. -``` - -## Progress Tracking - -### Task Status Indicators -- 🔄 **In Planning**: Requirements gathering and design -- ⏳ **In Progress**: Currently implementing -- ✅ **Complete**: Implemented, tested, and integrated -- 🚫 **Blocked**: Waiting for decisions or dependencies -- 🔧 **Needs Refactor**: Working but needs improvement - -### Weekly Review Process -1. **Progress Assessment** - - What was completed this week? - - What challenges were encountered? - - How well did the iterative process work? - -2. **Process Adjustment** - - Were task sizes appropriate? - - Did context management work effectively? - - What improvements can be made? - -3. **Architecture Review** - - Is the code remaining maintainable? - - Are patterns staying consistent? - - Is technical debt accumulating? - -## Emergency Procedures - -### When Things Go Wrong -If development becomes chaotic or problematic: - -1. **Stop Development** - - Don't continue adding to the problem - - Take time to assess the situation - - Don't rush to "fix" with more AI-generated code - -2. **Assess the Situation** - - What specific problems exist? - - How far has the code diverged from established patterns? - - What parts are still working correctly? - -3. 
**Recovery Process** - - Roll back to last known good state - - Update context documentation with lessons learned - - Restart with smaller, more focused tasks - - Get explicit approval for each step of recovery - -### Context Recovery -When AI seems to lose track of project patterns: - -1. **Context Refresh** - - Review and update CONTEXT.md - - Include examples of current code patterns - - Clarify architectural decisions - -2. **Pattern Re-establishment** - - Show AI examples of existing, working code - - Explicitly state patterns to follow - - Start with very small, pattern-matching tasks - -3. **Gradual Re-engagement** - - Begin with simple, low-risk tasks - - Verify pattern adherence at each step - - Gradually increase task complexity as consistency returns - -## Success Metrics - -### Short-term (Daily) -- Code is understandable and well-integrated -- No major regressions introduced -- Development velocity feels sustainable -- Team confidence in codebase remains high - -### Medium-term (Weekly) -- Technical debt is not accumulating -- New features integrate cleanly -- Development patterns remain consistent -- Documentation stays current - -### Long-term (Monthly) -- Codebase remains maintainable as it grows -- New team members can understand and contribute -- AI assistance enhances rather than hinders development -- Architecture remains clean and purposeful \ No newline at end of file diff --git a/.cursor/rules/project.mdc b/.cursor/rules/project.mdc deleted file mode 100644 index c5f004d..0000000 --- a/.cursor/rules/project.mdc +++ /dev/null @@ -1,24 +0,0 @@ ---- -description: -globs: -alwaysApply: true ---- -# Rule: Project specific rules - -## Goal -Unify the project structure and interraction with tools and console - -### System tools -- **ALWAYS** use UV for package management -- **ALWAYS** use Arch linux compatible command for terminal - -### Coding patterns -- **ALWYAS** check the arguments and methods before use to avoid errors with wrong parameters or names -- If in doubt, check [CONTEXT.md](mdc:CONTEXT.md) file and [architecture.md](mdc:docs/architecture.md) -- **PREFER** ORM pattern for databases with SQLAclhemy. -- **DO NOT USE** emoji in code and comments - -### Testing -- Use UV for test in format *uv run pytest [filename]* - - diff --git a/.cursor/rules/refactoring.mdc b/.cursor/rules/refactoring.mdc deleted file mode 100644 index c141666..0000000 --- a/.cursor/rules/refactoring.mdc +++ /dev/null @@ -1,237 +0,0 @@ ---- -description: Code refactoring and technical debt management for AI-assisted development -globs: -alwaysApply: false ---- - -# Rule: Code Refactoring and Technical Debt Management - -## Goal -Guide AI in systematic code refactoring to improve maintainability, reduce complexity, and prevent technical debt accumulation in AI-assisted development projects. - -## When to Apply This Rule -- Code complexity has increased beyond manageable levels -- Duplicate code patterns are detected -- Performance issues are identified -- New features are difficult to integrate -- Code review reveals maintainability concerns -- Weekly technical debt assessment indicates refactoring needs - -## Pre-Refactoring Assessment - -Before starting any refactoring, the AI MUST: - -1. **Context Analysis:** - - Review existing `CONTEXT.md` for architectural decisions - - Analyze current code patterns and conventions - - Identify all files that will be affected (search the codebase for use) - - Check for existing tests that verify current behavior - -2. 
**Scope Definition:** - - Clearly define what will and will not be changed - - Identify the specific refactoring pattern to apply - - Estimate the blast radius of changes - - Plan rollback strategy if needed - -3. **Documentation Review:** - - Check `./docs/` for relevant module documentation - - Review any existing architectural diagrams - - Identify dependencies and integration points - - Note any known constraints or limitations - -## Refactoring Process - -### Phase 1: Planning and Safety -1. **Create Refactoring Plan:** - - Document the current state and desired end state - - Break refactoring into small, atomic steps - - Identify tests that must pass throughout the process - - Plan verification steps for each change - -2. **Establish Safety Net:** - - Ensure comprehensive test coverage exists - - If tests are missing, create them BEFORE refactoring - - Document current behavior that must be preserved - - Create backup of current implementation approach - -3. **Get Approval:** - - Present the refactoring plan to the user - - Wait for explicit "Go" or "Proceed" confirmation - - Do NOT start refactoring without approval - -### Phase 2: Incremental Implementation -4. **One Change at a Time:** - - Implement ONE refactoring step per iteration - - Run tests after each step to ensure nothing breaks - - Update documentation if interfaces change - - Mark progress in the refactoring plan - -5. **Verification Protocol:** - - Run all relevant tests after each change - - Verify functionality works as expected - - Check performance hasn't degraded - - Ensure no new linting or type errors - -6. **User Checkpoint:** - - After each significant step, pause for user review - - Present what was changed and current status - - Wait for approval before continuing - - Address any concerns before proceeding - -### Phase 3: Completion and Documentation -7. **Final Verification:** - - Run full test suite to ensure nothing is broken - - Verify all original functionality is preserved - - Check that new code follows project conventions - - Confirm performance is maintained or improved - -8. **Documentation Update:** - - Update `CONTEXT.md` with new patterns/decisions - - Update module documentation in `./docs/` - - Document any new conventions established - - Note lessons learned for future refactoring - -## Common Refactoring Patterns - -### Extract Method/Function -``` -WHEN: Functions/methods exceed 50 lines or have multiple responsibilities -HOW: -1. Identify logical groupings within the function -2. Extract each group into a well-named helper function -3. Ensure each function has a single responsibility -4. Verify tests still pass -``` - -### Extract Module/Class -``` -WHEN: Files exceed 250 lines or handle multiple concerns -HOW: -1. Identify cohesive functionality groups -2. Create new files for each group -3. Move related functions/classes together -4. Update imports and dependencies -5. Verify module boundaries are clean -``` - -### Eliminate Duplication -``` -WHEN: Similar code appears in multiple places -HOW: -1. Identify the common pattern or functionality -2. Extract to a shared utility function or module -3. Update all usage sites to use the shared code -4. Ensure the abstraction is not over-engineered -``` - -### Improve Data Structures -``` -WHEN: Complex nested objects or unclear data flow -HOW: -1. Define clear interfaces/types for data structures -2. Create transformation functions between different representations -3. Ensure data flow is unidirectional where possible -4. 
Add validation at boundaries -``` - -### Reduce Coupling -``` -WHEN: Modules are tightly interconnected -HOW: -1. Identify dependencies between modules -2. Extract interfaces for external dependencies -3. Use dependency injection where appropriate -4. Ensure modules can be tested in isolation -``` - -## Quality Gates - -Every refactoring must pass these gates: - -### Technical Quality -- [ ] All existing tests pass -- [ ] No new linting errors introduced -- [ ] Code follows established project conventions -- [ ] No performance regression detected -- [ ] File sizes remain under 250 lines -- [ ] Function sizes remain under 50 lines - -### Maintainability -- [ ] Code is more readable than before -- [ ] Duplicated code has been reduced -- [ ] Module responsibilities are clearer -- [ ] Dependencies are explicit and minimal -- [ ] Error handling is consistent - -### Documentation -- [ ] Public interfaces are documented -- [ ] Complex logic has explanatory comments -- [ ] Architectural decisions are recorded -- [ ] Examples are provided where helpful - -## AI Instructions for Refactoring - -1. **Always ask for permission** before starting any refactoring work -2. **Start with tests** - ensure comprehensive coverage before changing code -3. **Work incrementally** - make small changes and verify each step -4. **Preserve behavior** - functionality must remain exactly the same -5. **Update documentation** - keep all docs current with changes -6. **Follow conventions** - maintain consistency with existing codebase -7. **Stop and ask** if any step fails or produces unexpected results -8. **Explain changes** - clearly communicate what was changed and why - -## Anti-Patterns to Avoid - -### Over-Engineering -- Don't create abstractions for code that isn't duplicated -- Avoid complex inheritance hierarchies -- Don't optimize prematurely - -### Breaking Changes -- Never change public APIs without explicit approval -- Don't remove functionality, even if it seems unused -- Avoid changing behavior "while we're here" - -### Scope Creep -- Stick to the defined refactoring scope -- Don't add new features during refactoring -- Resist the urge to "improve" unrelated code - -## Success Metrics - -Track these metrics to ensure refactoring effectiveness: - -### Code Quality -- Reduced cyclomatic complexity -- Lower code duplication percentage -- Improved test coverage -- Fewer linting violations - -### Developer Experience -- Faster time to understand code -- Easier integration of new features -- Reduced bug introduction rate -- Higher developer confidence in changes - -### Maintainability -- Clearer module boundaries -- More predictable behavior -- Easier debugging and troubleshooting -- Better performance characteristics - -## Output Files - -When refactoring is complete, update: -- `refactoring-log-[date].md` - Document what was changed and why -- `CONTEXT.md` - Update with new patterns and decisions -- `./docs/` - Update relevant module documentation -- Task lists - Mark refactoring tasks as complete - -## Final Verification - -Before marking refactoring complete: -1. Run full test suite and verify all tests pass -2. Check that code follows all project conventions -3. Verify documentation is up to date -4. Confirm user is satisfied with the results -5. 
Record lessons learned for future refactoring efforts diff --git a/.cursor/rules/task-list.mdc b/.cursor/rules/task-list.mdc deleted file mode 100644 index 939a9f1..0000000 --- a/.cursor/rules/task-list.mdc +++ /dev/null @@ -1,44 +0,0 @@ ---- -description: TODO list task implementation -globs: -alwaysApply: false ---- ---- -description: -globs: -alwaysApply: false ---- -# Task List Management - -Guidelines for managing task lists in markdown files to track progress on completing a PRD - -## Task Implementation -- **One sub-task at a time:** Do **NOT** start the next sub‑task until you ask the user for permission and they say “yes” or "y" -- **Completion protocol:** - 1. When you finish a **sub‑task**, immediately mark it as completed by changing `[ ]` to `[x]`. - 2. If **all** subtasks underneath a parent task are now `[x]`, also mark the **parent task** as completed. -- Stop after each sub‑task and wait for the user’s go‑ahead. - -## Task List Maintenance - -1. **Update the task list as you work:** - - Mark tasks and subtasks as completed (`[x]`) per the protocol above. - - Add new tasks as they emerge. - -2. **Maintain the “Relevant Files” section:** - - List every file created or modified. - - Give each file a one‑line description of its purpose. - -## AI Instructions - -When working with task lists, the AI must: - -1. Regularly update the task list file after finishing any significant work. -2. Follow the completion protocol: - - Mark each finished **sub‑task** `[x]`. - - Mark the **parent task** `[x]` once **all** its subtasks are `[x]`. -3. Add newly discovered tasks. -4. Keep “Relevant Files” accurate and up to date. -5. Before starting work, check which sub‑task is next. - -6. After implementing a sub‑task, update the file and then pause for user approval. \ No newline at end of file diff --git a/.vscode/launch.json b/.vscode/launch.json index 10ca33b..30490f0 100644 --- a/.vscode/launch.json +++ b/.vscode/launch.json @@ -16,7 +16,10 @@ "args": [ "BTC-USDT", "2025-08-26", - "2025-08-30" + // "2025-08-30" + "2025-09-22", + "--timeframe-minutes", "15", + "--no-ui" ] } ] diff --git a/.vscode/settings.json b/.vscode/settings.json index 3e99ede..aa48d2a 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -3,5 +3,6 @@ "." 
], "python.testing.unittestEnabled": false, - "python.testing.pytestEnabled": true + "python.testing.pytestEnabled": true, + "python.languageServer": "None" } \ No newline at end of file diff --git a/charts/matplotlib_viz_figure_1.png b/charts/matplotlib_viz_figure_1.png deleted file mode 100644 index 5ed3a68..0000000 Binary files a/charts/matplotlib_viz_figure_1.png and /dev/null differ diff --git a/desktop_app.py b/desktop_app.py index 3a2977c..cca81f3 100644 --- a/desktop_app.py +++ b/desktop_app.py @@ -116,7 +116,7 @@ class MainWindow(QMainWindow): super().__init__() self.ohlc_data: List[Dict[str, Any]] = [] self.metrics_data: List[Any] = [] # adapt shape to your MetricsCalculator - self.depth_data: Dict[str, List[List[float]]] = {"bids": [], "asks": []} + # self.depth_data: Dict[str, List[List[float]]] = {"bids": [], "asks": []} self._init_ui() @@ -132,8 +132,8 @@ class MainWindow(QMainWindow): charts_widget = QWidget() charts_layout = QVBoxLayout(charts_widget) - depth_widget = QWidget() - depth_layout = QVBoxLayout(depth_widget) + # depth_widget = QWidget() + # depth_layout = QVBoxLayout(depth_widget) # PG appearance pg.setConfigOptions(antialias=True, background="k", foreground="w") @@ -158,10 +158,10 @@ class MainWindow(QMainWindow): self.cvd_plot.showGrid(x=True, y=True, alpha=0.3) # Depth (not time x-axis) - self.depth_plot = pg.PlotWidget(title="Order Book Depth") - self.depth_plot.setLabel("left", "Price", units="USD") - self.depth_plot.setLabel("bottom", "Cumulative Volume") - self.depth_plot.showGrid(x=True, y=True, alpha=0.3) + # self.depth_plot = pg.PlotWidget(title="Order Book Depth") + # self.depth_plot.setLabel("left", "Price", units="USD") + # self.depth_plot.setLabel("bottom", "Cumulative Volume") + # self.depth_plot.showGrid(x=True, y=True, alpha=0.3) # Link x-axes self.volume_plot.setXLink(self.ohlc_plot) @@ -173,15 +173,15 @@ class MainWindow(QMainWindow): self._setup_double_click_autorange() # Layout weights - charts_layout.addWidget(self.ohlc_plot, 3) + charts_layout.addWidget(self.ohlc_plot, 1) charts_layout.addWidget(self.volume_plot, 1) charts_layout.addWidget(self.obi_plot, 1) charts_layout.addWidget(self.cvd_plot, 1) - depth_layout.addWidget(self.depth_plot) + # depth_layout.addWidget(self.depth_plot) main_layout.addWidget(charts_widget, 3) - main_layout.addWidget(depth_widget, 1) + # main_layout.addWidget(depth_widget, 1) logging.info("UI setup completed") @@ -232,17 +232,19 @@ class MainWindow(QMainWindow): self.ohlc_plot.addItem(self.data_label) # ----- Data ingestion from processor ----- - def update_data(self, data_processor: OHLCProcessor): + def update_data(self, data_processor: OHLCProcessor, nb_bars: int): """ Pull latest bars/metrics from the processor and refresh plots. Call this whenever you've added trade/book data + processor.flush(). 
""" - self.ohlc_data = data_processor.bars or [] - # Optional: if your MetricsCalculator exposes a series - try: - self.metrics_data = data_processor.get_metrics_series() or [] - except Exception: - self.metrics_data = [] + self.ohlc_data = data_processor.bars[-nb_bars:] or [] + self.metrics_data = data_processor.get_metrics_series() + + self.metrics_data = { + "cvd": self.metrics_data["cvd"][-nb_bars:], + "obi": self.metrics_data["obi"][-nb_bars:] + } + self._update_all_plots() # ----- Plot updates ----- @@ -251,7 +253,7 @@ class MainWindow(QMainWindow): self._update_volume_plot() self._update_obi_plot() self._update_cvd_plot() - self._update_depth_plot() + # self._update_depth_plot() def _clear_plot_items_but_crosshair(self, plot: pg.PlotWidget): protected = (pg.InfiniteLine, pg.LabelItem) @@ -295,13 +297,6 @@ class MainWindow(QMainWindow): vol = float(bar["volume"]) self.volume_plot.addItem(pg.BarGraphItem(x=[x_center], height=[vol], width=bar_w, brush=color)) - def _update_obi_plot(self): - """ - Expecting metrics_data entries shaped like: - [ts_start_ms, ts_end_ms, obi_o, obi_h, obi_l, obi_c, ...] - Adapt if your MetricsCalculator differs. - """ - def _update_obi_plot(self): """ Update OBI panel with candlesticks from metrics_data. @@ -313,7 +308,7 @@ class MainWindow(QMainWindow): # Convert metrics_data rows to candle dicts candlesticks = [] - for row in self.metrics_data: + for row in self.metrics_data['obi']: if len(row) >= 6: ts0, ts1, o, h, l, c = row[:6] candlesticks.append({ @@ -338,20 +333,48 @@ class MainWindow(QMainWindow): """ Plot CVD as a line if present at index 6, else skip. """ + # if not self.metrics_data: + # return + # self._clear_plot_items_but_crosshair(self.cvd_plot) + + # xs = [] + # ys = [] + # for row in self.metrics_data['cvd']: + # if len(row) >= 7: + # ts0 = row[0] / 1000.0 + # cvd_val = float(row[6]) + # xs.append(ts0) + # ys.append(cvd_val) + # if xs: + # self.cvd_plot.plot(xs, ys, pen=pg.mkPen(color="#ffff00", width=2), name="CVD") + + if not self.metrics_data: return self._clear_plot_items_but_crosshair(self.cvd_plot) - xs = [] - ys = [] - for row in self.metrics_data: - if len(row) >= 7: - ts0 = row[0] / 1000.0 - cvd_val = float(row[6]) - xs.append(ts0) - ys.append(cvd_val) - if xs: - self.cvd_plot.plot(xs, ys, pen=pg.mkPen(color="#ffff00", width=2), name="CVD") + # Convert metrics_data rows to candle dicts + candlesticks = [] + for row in self.metrics_data['cvd']: + if len(row) >= 6: + ts0, ts1, o, h, l, c = row[:6] + candlesticks.append({ + "timestamp_start": int(ts0), + "timestamp_end": int(ts1), + "open": float(o), + "high": float(h), + "low": float(l), + "close": float(c), + "volume": 0.0, + }) + + if candlesticks: + self.cvd_plot.addItem(OBIItem(candlesticks, body_ratio=0.8)) + + # Also set Y range explicitly + lows = [c["low"] for c in candlesticks] + highs = [c["high"] for c in candlesticks] + self.cvd_plot.setYRange(min(lows), max(highs)) # ----- Depth chart ----- @staticmethod @@ -456,6 +479,7 @@ class MainWindow(QMainWindow): if self.metrics_data: # Optional: display OBI/CVD values if present + return #fix access to dict row = self._closest_metrics_row(self.metrics_data, x_seconds) if row and len(row) >= 6: _, _, o, h, l, c = row[:6] diff --git a/docs/API.md b/docs/API.md deleted file mode 100644 index 344e592..0000000 --- a/docs/API.md +++ /dev/null @@ -1,151 +0,0 @@ -# API Documentation (Current Implementation) - -## Overview - -This document describes the public interfaces of the current system: SQLite streaming, OHLC/depth aggregation, 
JSON-based IPC, and the Dash visualizer. Metrics (OBI/CVD), repository/storage layers, and strategy APIs are not part of the current implementation. - -## Input Database Schema (Required) - -### book table -```sql -CREATE TABLE book ( - id INTEGER PRIMARY KEY, - instrument TEXT, - bids TEXT NOT NULL, -- Python-literal: [[price, size, ...], ...] - asks TEXT NOT NULL, -- Python-literal: [[price, size, ...], ...] - timestamp TEXT NOT NULL -); -``` - -### trades table -```sql -CREATE TABLE trades ( - id INTEGER PRIMARY KEY, - instrument TEXT, - trade_id TEXT, - price REAL NOT NULL, - size REAL NOT NULL, - side TEXT NOT NULL, -- "buy" or "sell" - timestamp TEXT NOT NULL -); -``` - -## Data Access: db_interpreter.py - -### Classes -- `OrderbookLevel` (dataclass): represents a price level. -- `OrderbookUpdate`: windowed book update with `bids`, `asks`, `timestamp`, `end_timestamp`. - -### DBInterpreter -```python -class DBInterpreter: - def __init__(self, db_path: Path): ... - - def stream(self) -> Iterator[tuple[OrderbookUpdate, list[tuple]]]: - """ - Stream orderbook rows with one-row lookahead and trades in timestamp order. - Yields pairs of (OrderbookUpdate, trades_in_window), where each trade tuple is: - (id, trade_id, price, size, side, timestamp_ms) and timestamp_ms ∈ [timestamp, end_timestamp). - """ -``` - -- Read-only SQLite connection with PRAGMA tuning (immutable, query_only, mmap, cache). -- Batch sizes: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`. - -## Processing: ohlc_processor.py - -### OHLCProcessor -```python -class OHLCProcessor: - def __init__(self, window_seconds: int = 60, depth_levels_per_side: int = 50): ... - - def process_trades(self, trades: list[tuple]) -> None: - """Aggregate trades into OHLC bars per window; throttled upserts for UI responsiveness.""" - - def update_orderbook(self, ob_update: OrderbookUpdate) -> None: - """Maintain in-memory price→size maps, apply partial updates, and emit top-N depth snapshots periodically.""" - - def finalize(self) -> None: - """Emit the last OHLC bar if present.""" -``` - -- Internal helpers for parsing levels from JSON or Python-literal strings and for applying deletions (size==0). - -## Inter-Process Communication: viz_io.py - -### Files -- `ohlc_data.json`: rolling array of OHLC bars (max 1000). -- `depth_data.json`: latest depth snapshot (bids/asks), top-N per side. -- `metrics_data.json`: rolling array of OBI OHLC bars (max 1000). - -### Functions -```python -def add_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ... - -def upsert_ohlc_bar(timestamp: int, open_price: float, high_price: float, low_price: float, close_price: float, volume: float = 0.0) -> None: ... - -def clear_data() -> None: ... - -def add_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ... - -def upsert_metric_bar(timestamp: int, obi_open: float, obi_high: float, obi_low: float, obi_close: float) -> None: ... - -def clear_metrics() -> None: ... -``` - -- Atomic writes via temp file replace to prevent partial reads. - -## Visualization: app.py (Dash) - -- Three visuals: OHLC+Volume and Depth (cumulative) with Plotly dark theme, plus an OBI candlestick subplot beneath Volume. -- Polling interval: 500 ms. Tolerates JSON decode races using cached last values. 
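The decode-race tolerance mentioned above can be pictured as a small read helper that falls back to the last successfully parsed payload. This is only a sketch — the helper name and cache dictionary are illustrative, not the actual `app.py` code:

```python
import json
import logging
from pathlib import Path
from typing import Any

_LAST_GOOD: dict[Path, Any] = {}  # last successfully parsed payload per file (illustrative)


def read_json_tolerant(path: Path, fallback: Any) -> Any:
    """Read a JSON IPC file, reusing the last good value if the writer is
    mid-replace or the file does not exist yet."""
    try:
        payload = json.loads(path.read_text())
        _LAST_GOOD[path] = payload
        return payload
    except (FileNotFoundError, json.JSONDecodeError) as exc:
        logging.warning("Using cached data for %s: %s", path, exc)
        return _LAST_GOOD.get(path, fallback)
```

Because the writer replaces files atomically, the fallback branch should only trigger on rare timing races or before the first bar has been written.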
- -### Callback Contract -```python -@app.callback( - [Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')], - [Input('interval-update', 'n_intervals')] -) -``` -- Reads `ohlc_data.json` (list of `[ts, open, high, low, close, volume]`). -- Reads `depth_data.json` (`{"bids": [[price, size], ...], "asks": [[price, size], ...]}`). -- Reads `metrics_data.json` (list of `[ts, obi_o, obi_h, obi_l, obi_c]`). - -## CLI Orchestration: main.py - -### Typer Entry Point -```python -def main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None: - """Stream DBs, process OHLC/depth, and launch Dash visualizer in a separate process.""" -``` - -- Discovers databases under `../data/OKX` matching the instrument and date range. -- Launches UI: `uv run python app.py`. - -## Usage Examples - -### Run processing + UI -```bash -uv run python main.py BTC-USDT 2025-07-01 2025-08-01 --window-seconds 60 -# Open http://localhost:8050 -``` - -### Process trades and update depth in a loop (conceptual) -```python -from db_interpreter import DBInterpreter -from ohlc_processor import OHLCProcessor - -processor = OHLCProcessor(window_seconds=60) -for ob_update, trades in DBInterpreter(db_path).stream(): - processor.process_trades(trades) - processor.update_orderbook(ob_update) -processor.finalize() -``` - -## Error Handling -- Reader/Writer coordination via atomic JSON prevents partial reads. -- Visualizer caches last valid data if JSON decoding fails mid-write; logs warnings. -- Visualizer start failures do not stop processing; logs error and continues. - -## Notes -- Metrics computation includes simplified OBI (Order Book Imbalance) calculated as bid_total - ask_total. Repository/storage layers and strategy APIs are intentionally kept minimal. diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md deleted file mode 100644 index 34995b8..0000000 --- a/docs/CHANGELOG.md +++ /dev/null @@ -1,152 +0,0 @@ -# Changelog - -All notable changes to the Orderflow Backtest System are documented in this file. - -The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), -and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
- -## [Unreleased] - -### Added -- Comprehensive documentation structure with module-specific guides -- Architecture Decision Records (ADRs) for major technical decisions -- CONTRIBUTING.md with development guidelines and standards -- Enhanced module documentation in `docs/modules/` directory -- Dependency documentation with security and performance considerations - -### Changed -- Documentation structure reorganized to follow documentation standards -- Improved code documentation requirements with examples -- Enhanced testing guidelines with coverage requirements - -## [2.0.0] - 2024-12-Present - -### Added -- **Simplified Pipeline Architecture**: Streamlined SQLite → OHLC/Depth → JSON → Dash pipeline -- **JSON-based IPC**: Atomic file-based communication between processor and visualizer -- **Real-time Visualization**: Dash web application with 500ms polling updates -- **OHLC Aggregation**: Configurable time window aggregation with throttled updates -- **Orderbook Depth**: Real-time depth snapshots with top-N level management -- **OBI Metrics**: Order Book Imbalance calculation with candlestick visualization -- **Atomic JSON Operations**: Race-condition-free data exchange via temp files -- **CLI Orchestration**: Typer-based command interface with process management -- **Performance Optimizations**: Batch reading with optimized SQLite PRAGMA settings - -### Changed -- **Architecture Simplification**: Removed complex repository/storage layers -- **Data Flow**: Direct streaming from database to visualization via JSON -- **Error Handling**: Graceful degradation with cached data fallbacks -- **Process Management**: Separate visualization process launched automatically -- **Memory Efficiency**: Bounded datasets prevent unlimited memory growth - -### Technical Details -- **Database Access**: Read-only SQLite with immutable mode and mmap optimization -- **Batch Sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal performance -- **JSON Formats**: Standardized schemas for OHLC, depth, and metrics data -- **Chart Architecture**: Multi-subplot layout with shared time axis -- **IPC Files**: `ohlc_data.json`, `depth_data.json`, `metrics_data.json` - -### Removed -- Complex metrics storage and repository patterns -- Strategy framework components -- In-memory snapshot retention -- Multi-database orchestration complexity - -## [1.0.0] - Previous Version - -### Features -- **Orderbook Reconstruction**: Build complete orderbooks from SQLite database files -- **Data Models**: Core structures for `OrderbookLevel`, `Trade`, `BookSnapshot`, `Book` -- **SQLite Repository**: Read-only data access for orderbook and trades data -- **Orderbook Parser**: Text parsing with price caching optimization -- **Storage Orchestration**: High-level facade for book building -- **Basic Visualization**: OHLC candlestick charts with Qt5Agg backend -- **Strategy Framework**: Basic strategy pattern with `DefaultStrategy` -- **CLI Interface**: Command-line application for date range processing -- **Test Suite**: Unit and integration tests - -### Architecture -- **Repository Pattern**: Clean separation of data access logic -- **Dataclass Models**: Lightweight, type-safe data structures -- **Parser Optimization**: Price caching for performance -- **Modular Design**: Clear separation between components - ---- - -## Migration Guide - -### Upgrading from v1.0.0 to v2.0.0 - -#### Code Changes Required - -1. 
**Strategy Constructor** - ```python - # Before (v1.0.0) - strategy = DefaultStrategy("BTC-USDT", enable_visualization=True) - - # After (v2.0.0) - strategy = DefaultStrategy("BTC-USDT") - visualizer = Visualizer(window_seconds=60, max_bars=500) - ``` - -2. **Main Application Flow** - ```python - # Before (v1.0.0) - strategy = DefaultStrategy(instrument, enable_visualization=True) - storage.build_booktick_from_db(db_path, db_date) - strategy.on_booktick(storage.book) - - # After (v2.0.0) - strategy = DefaultStrategy(instrument) - visualizer = Visualizer(window_seconds=60, max_bars=500) - - strategy.set_db_path(db_path) - visualizer.set_db_path(db_path) - storage.build_booktick_from_db(db_path, db_date) - strategy.on_booktick(storage.book) - visualizer.update_from_book(storage.book) - ``` - -#### Database Migration -- **Automatic**: Metrics table created automatically on first run -- **No Data Loss**: Existing orderbook and trades data unchanged -- **Schema Addition**: New `metrics` table with indexes added to existing databases - -#### Benefits of Upgrading -- **Memory Efficiency**: >70% reduction in memory usage -- **Performance**: Faster processing through persistent metrics storage -- **Enhanced Analysis**: Access to OBI and CVD financial indicators -- **Better Visualization**: Multi-chart display with synchronized time axis -- **Improved Architecture**: Cleaner separation of concerns - -#### Testing Migration -```bash -# Verify upgrade compatibility -uv run pytest tests/test_main_integration.py -v - -# Test new metrics functionality -uv run pytest tests/test_storage_metrics.py -v - -# Validate visualization separation -uv run pytest tests/test_main_visualization.py -v -``` - ---- - -## Development Notes - -### Performance Improvements -- **v2.0.0**: >70% memory reduction, batch processing, persistent storage -- **v1.0.0**: In-memory processing, real-time calculations - -### Architecture Evolution -- **v2.0.0**: Streaming processing with metrics storage, separated visualization -- **v1.0.0**: Full snapshot retention, integrated visualization in strategies - -### Testing Coverage -- **v2.0.0**: 27 tests across 6 files, integration and unit coverage -- **v1.0.0**: Basic unit tests for core components - ---- - -*For detailed technical documentation, see [docs/](../docs/) directory.* diff --git a/docs/CONTEXT.md b/docs/CONTEXT.md deleted file mode 100644 index b1b43b9..0000000 --- a/docs/CONTEXT.md +++ /dev/null @@ -1,53 +0,0 @@ -# Project Context - -## Current State - -The project implements a modular, efficient orderflow processing pipeline: -- Stream orderflow from SQLite (`DBInterpreter.stream`). -- Process trades and orderbook updates through modular `OHLCProcessor` architecture. -- Exchange data with the UI via atomic JSON files (`viz_io`). -- Render OHLC+Volume, Depth, and Metrics charts with a Dash app (`app.py`). - -The system features a clean composition-based architecture with specialized modules for different concerns, providing OBI/CVD metrics alongside OHLC data. 
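As a rough illustration of that composition (a sketch only: the collaborator classes below are inline stubs mirroring the documented responsibilities of `orderbook_manager.py` and `metrics_calculator.py`, and the method name `on_trades` is a placeholder rather than the real API):

```python
class OrderbookManager:
    """Stub of orderbook_manager: in-memory price -> size maps per side."""
    def __init__(self) -> None:
        self.bids: dict[float, float] = {}
        self.asks: dict[float, float] = {}


class MetricsCalculator:
    """Stub of metrics_calculator: CVD from trade flow (buy vs sell volume delta)."""
    def __init__(self) -> None:
        self.cvd = 0.0

    def on_trades(self, trades: list[tuple]) -> None:  # placeholder name
        for _, _, _, size, side, _ in trades:  # (id, trade_id, price, size, side, ts_ms)
            self.cvd += size if side == "buy" else -size


class OHLCProcessor:
    """Coordinator built by composition: it owns focused collaborators instead of inheriting."""
    def __init__(self, window_seconds: int = 60, depth_levels_per_side: int = 50) -> None:
        self.window_seconds = window_seconds
        self.depth_levels_per_side = depth_levels_per_side
        self.books = OrderbookManager()
        self.metrics = MetricsCalculator()

    def process_trades(self, trades: list[tuple]) -> None:
        # OHLC bucketing elided; CVD is delegated to the metrics collaborator.
        self.metrics.on_trades(trades)

    @property
    def cvd_cumulative(self) -> float:
        return self.metrics.cvd
```

Each concern stays small and testable in isolation, which is what brought the coordinator back under the 250-line file limit.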
- -## Recent Work - -- **Modular Refactoring**: Extracted `ohlc_processor.py` into focused modules: - - `level_parser.py`: Orderbook level parsing utilities (85 lines) - - `orderbook_manager.py`: In-memory orderbook state management (90 lines) - - `metrics_calculator.py`: OBI and CVD metrics calculation (112 lines) -- **Architecture Compliance**: Reduced main processor from 440 to 248 lines (250-line target achieved) -- Maintained full backward compatibility and functionality -- Implemented read-only, batched SQLite streaming with PRAGMA tuning. -- Added robust JSON IPC with atomic writes and tolerant UI reads. -- Built a responsive Dash visualization polling at 500ms. -- Unified CLI using Typer, with UV for process management. - -## Conventions - -- Python 3.12+, UV for dependency and command execution. -- **Modular Architecture**: Composition over inheritance, single-responsibility modules -- **File Size Limits**: ≤250 lines per file, ≤50 lines per function (enforced) -- Type hints throughout; concise, focused functions and classes. -- Error handling with meaningful logs; avoid bare exceptions. -- Prefer explicit JSON structures for IPC; keep payloads small and bounded. - -## Priorities - -- Improve configurability: database path discovery, CLI flags for paths and UI options. -- Add tests for `DBInterpreter.stream` and `OHLCProcessor` (run with `uv run pytest`). -- Performance tuning for large DBs while keeping UI responsive. -- Documentation kept in sync with code; architecture reflects current design. - -## Roadmap (Future Work) - -- Enhance OBI metrics with additional derived calculations (e.g., normalized OBI). -- Optional repository layer abstraction and a storage orchestrator. -- Extend visualization with additional subplots and interactivity. -- Strategy module for analytics and alerting on derived metrics. - -## Tooling - -- Package management and commands: UV (e.g., `uv sync`, `uv run ...`). -- Visualization server: Dash on `http://localhost:8050`. -- Linting/testing: Pytest (e.g., `uv run pytest`). diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index c822028..0000000 --- a/docs/README.md +++ /dev/null @@ -1,26 +0,0 @@ -# Orderflow Backtest System Documentation - -## Overview - -This directory contains documentation for the current Orderflow Backtest System, which streams historical orderflow from SQLite, aggregates OHLC bars, maintains a lightweight depth snapshot, and renders charts via a Dash web application. - -## Documentation Structure - -- `architecture.md`: System architecture, component relationships, and data flow (SQLite → Streaming → OHLC/Depth → JSON → Dash) -- `API.md`: Public interfaces for DB streaming, OHLC/depth processing, JSON IPC, Dash visualization, and CLI -- `CONTEXT.md`: Project state, conventions, and development priorities -- `decisions/`: Architecture decision records - -## Quick Navigation - -| Topic | Documentation | -|-------|---------------| -| Getting Started | See the usage examples in `API.md` | -| System Architecture | `architecture.md` | -| Database Schema | `API.md#input-database-schema-required` | -| Development Setup | Project root `README` and `pyproject.toml` | - -## Notes - -- Metrics (OBI/CVD), repository/storage layers, and strategy components have been removed from the current codebase and are planned as future enhancements. -- Use UV for package management and running commands. Example: `uv run python main.py ...`. 
diff --git a/docs/architecture.md b/docs/architecture.md deleted file mode 100644 index 11a3a9f..0000000 --- a/docs/architecture.md +++ /dev/null @@ -1,156 +0,0 @@ -# System Architecture - -## Overview - -The current system is a streamlined, high-performance pipeline that streams orderflow from SQLite databases, aggregates trades into OHLC bars, maintains a lightweight depth snapshot, and serves visuals via a Dash web application. Inter-process communication (IPC) between the processor and visualizer uses atomic JSON files for simplicity and robustness. - -## High-Level Architecture - -``` -┌─────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ ┌──────────────────┐ -│ SQLite Files │ → │ DB Interpreter │ → │ OHLC/Depth │ → │ Dash Visualizer │ -│ (book,trades) │ │ (stream rows) │ │ Processor │ │ (app.py) │ -└─────────────────┘ └─────────────────────┘ └─────────┬────────┘ └────────────▲─────┘ - │ │ - │ Atomic JSON (IPC) │ - ▼ │ - ohlc_data.json, depth_data.json │ - metrics_data.json │ - │ - Browser UI -``` - -## Components - -### Data Access (`db_interpreter.py`) - -- `OrderbookLevel`: dataclass representing one price level. -- `OrderbookUpdate`: container for a book row window with `bids`, `asks`, `timestamp`, and `end_timestamp`. -- `DBInterpreter`: - - `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]` streams the book table with lookahead and the trades table in timestamp order. - - Efficient read-only connection with PRAGMA tuning: immutable mode, query_only, temp_store=MEMORY, mmap_size, cache_size. - - Batching constants: `BOOK_BATCH = 2048`, `TRADE_BATCH = 4096`. - - Each yielded `trades` element is a tuple `(id, trade_id, price, size, side, timestamp_ms)` that falls within `[book.timestamp, next_book.timestamp)`. - -### Processing (Modular Architecture) - -#### Main Coordinator (`ohlc_processor.py`) -- `OHLCProcessor(window_seconds=60, depth_levels_per_side=50)`: Orchestrates trade processing using composition - - `process_trades(trades)`: aggregates trades into OHLC bars and delegates CVD updates - - `update_orderbook(ob_update)`: coordinates orderbook updates and OBI metric calculation - - `finalize()`: finalizes both OHLC bars and metrics data - - `cvd_cumulative` (property): provides access to cumulative volume delta - -#### Orderbook Management (`orderbook_manager.py`) -- `OrderbookManager`: Handles in-memory orderbook state with partial updates - - Maintains separate bid/ask price→size dictionaries - - Supports deletions via zero-size updates - - Provides sorted top-N level extraction for visualization - -#### Metrics Calculation (`metrics_calculator.py`) -- `MetricsCalculator`: Manages OBI and CVD metrics with windowed aggregation - - Tracks CVD from trade flow (buy vs sell volume delta) - - Calculates OBI from orderbook volume imbalance - - Provides throttled updates and OHLC-style metric bars - -#### Level Parsing (`level_parser.py`) -- Utility functions for normalizing orderbook level data: - - `normalize_levels()`: parses levels, filtering zero/negative sizes - - `parse_levels_including_zeros()`: preserves zeros for deletion operations - - Supports JSON and Python literal formats with robust error handling - -### Inter-Process Communication (`viz_io.py`) - -- File paths (relative to project root): - - `ohlc_data.json`: rolling list of OHLC bars (max 1000). - - `depth_data.json`: latest depth snapshot (bids/asks). - - `metrics_data.json`: rolling list of OBI/TOT OHLC bars (max 1000). -- Atomic writes via temp files prevent partial reads by the Dash app. 
-- API: - - `add_ohlc_bar(...)`: append a new bar; trim to last 1000. - - `upsert_ohlc_bar(...)`: replace last bar if timestamp matches; else append; trim. - - `clear_data()`: reset OHLC data to an empty list. - -### Visualization (`app.py`) - -- Dash application with two graphs plus OBI subplot: - - OHLC + Volume subplot with shared x-axis. - - OBI candlestick subplot (blue tones) sharing x-axis. - - Depth (cumulative) chart for bids and asks. -- Polling interval (500 ms) callback reads JSON files and updates figures resilently: - - Caches last good values to tolerate in-flight writes/decoding errors. - - Builds figures with Plotly dark theme. -- Exposed on `http://localhost:8050` by default (`host=0.0.0.0`). - -### CLI Orchestration (`main.py`) - -- Typer CLI entrypoint: - - Arguments: `instrument`, `start_date`, `end_date` (UTC, `YYYY-MM-DD`), options: `--window-seconds`. - - Discovers SQLite files under `../data/OKX` matching the instrument. - - Launches Dash visualizer as a separate process: `uv run python app.py`. - - Streams databases sequentially: for each book row, processes trades and updates orderbook. - -## Data Flow - -1. Discover and open SQLite database(s) for the requested instrument. -2. Stream `book` rows with one-row lookahead to form time windows. -3. Stream `trades` in timestamp order and bucket into the active window. -4. For each window: - - Aggregate trades into OHLC using `OHLCProcessor.process_trades`. - - Apply partial depth updates via `OHLCProcessor.update_orderbook` and emit periodic snapshots. -5. Persist current OHLC bar(s) and depth snapshots to JSON via atomic writes. -6. Dash app polls JSON and renders charts. - -## IPC JSON Schemas - -- OHLC (`ohlc_data.json`): array of bars; each bar is `[ts, open, high, low, close, volume]`. - -- Depth (`depth_data.json`): object with bids/asks arrays: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}`. - -- Metrics (`metrics_data.json`): array of bars; each bar is `[ts, obi_open, obi_high, obi_low, obi_close, tot_open, tot_high, tot_low, tot_close]`. - -## Configuration - -- `OHLCProcessor(window_seconds, depth_levels_per_side)` controls aggregation granularity and depth snapshot size. -- Visualizer interval (`500 ms`) balances UI responsiveness and CPU usage. -- Paths: JSON files (`ohlc_data.json`, `depth_data.json`) are colocated with the code and written atomically. -- CLI parameters select instrument and time range; databases expected under `../data/OKX`. - -## Performance Characteristics - -- Read-only SQLite tuned for fast sequential scans: immutable URI, query_only, large mmap and cache. -- Batching minimizes cursor churn and Python overhead. -- JSON IPC uses atomic replace to avoid contention; OHLC list is bounded to 1000 entries. -- Processor throttles intra-window OHLC upserts and depth emissions to reduce I/O. - -## Error Handling - -- Visualizer tolerates JSON decode races by reusing last good values and logging warnings. -- Processor guards depth parsing and writes; logs at debug/info levels. -- Visualizer startup is wrapped; if it fails, processing continues without UI. - -## Security Considerations - -- SQLite connections are read-only and immutable; no write queries executed. -- File writes are confined to project directory; no paths derived from untrusted input. -- Logs avoid sensitive data; only operational metadata. - -## Testing Guidance - -- Unit tests (run with `uv run pytest`): - - `OHLCProcessor`: window boundary handling, high/low tracking, volume accumulation, upsert behavior. 
- - Depth maintenance: deletions (size==0), top-N sorting, throttling. - - `DBInterpreter.stream`: correct trade-window assignment, end-of-stream handling. -- Integration: end-to-end generation of JSON from a tiny fixture DB and basic figure construction without launching a server. - -## Roadmap (Optional Enhancements) - -- Metrics: add OBI/CVD computation and persist metrics to a dedicated table. -- Repository Pattern: extract DB access into a repository module with typed methods. -- Orchestrator: introduce a `Storage` pipeline module coordinating batch processing and persistence. -- Strategy Layer: compute signals/alerts on stored metrics. -- Visualization: add OBI/CVD subplots and richer interactions. - ---- - -This document reflects the current implementation centered on SQLite streaming, JSON-based IPC, and a Dash visualizer, providing a clear foundation for incremental enhancements. diff --git a/docs/decisions/ADR-001-sqlite-database-choice.md b/docs/decisions/ADR-001-sqlite-database-choice.md deleted file mode 100644 index d506204..0000000 --- a/docs/decisions/ADR-001-sqlite-database-choice.md +++ /dev/null @@ -1,122 +0,0 @@ -# ADR-001: SQLite Database Choice - -## Status -Accepted - -## Context -The orderflow backtest system needs to efficiently store and stream large volumes of historical orderbook and trade data. Key requirements include: - -- Fast sequential read access for time-series data -- Minimal setup and maintenance overhead -- Support for concurrent reads from visualization layer -- Ability to handle databases ranging from 100MB to 10GB+ -- No network dependencies for data access - -## Decision -We will use SQLite as the primary database for storing historical orderbook and trade data. - -## Consequences - -### Positive -- **Zero configuration**: No database server setup or administration required -- **Excellent read performance**: Optimized for sequential scans with proper PRAGMA settings -- **Built-in Python support**: No external dependencies or connection libraries needed -- **File portability**: Database files can be easily shared and archived -- **ACID compliance**: Ensures data integrity during writes (for data ingestion) -- **Small footprint**: Minimal memory and storage overhead -- **Fast startup**: No connection pooling or server initialization delays - -### Negative -- **Single writer limitation**: Cannot handle concurrent writes (acceptable for read-only backtest) -- **Limited scalability**: Not suitable for high-concurrency production trading systems -- **No network access**: Cannot query databases remotely (acceptable for local analysis) -- **File locking**: Potential issues with file system sharing (mitigated by read-only access) - -## Implementation Details - -### Schema Design -```sql --- Orderbook snapshots with timestamp windows -CREATE TABLE book ( - id INTEGER PRIMARY KEY, - instrument TEXT, - bids TEXT NOT NULL, -- JSON array of [price, size] pairs - asks TEXT NOT NULL, -- JSON array of [price, size] pairs - timestamp TEXT NOT NULL -); - --- Individual trade records -CREATE TABLE trades ( - id INTEGER PRIMARY KEY, - instrument TEXT, - trade_id TEXT, - price REAL NOT NULL, - size REAL NOT NULL, - side TEXT NOT NULL, -- "buy" or "sell" - timestamp TEXT NOT NULL -); - --- Indexes for efficient time-based queries -CREATE INDEX idx_book_timestamp ON book(timestamp); -CREATE INDEX idx_trades_timestamp ON trades(timestamp); -``` - -### Performance Optimizations -```python -# Read-only connection with optimized PRAGMA settings -connection_uri = 
f"file:{db_path}?immutable=1&mode=ro" -conn = sqlite3.connect(connection_uri, uri=True) -conn.execute("PRAGMA query_only = 1") -conn.execute("PRAGMA temp_store = MEMORY") -conn.execute("PRAGMA mmap_size = 268435456") # 256MB -conn.execute("PRAGMA cache_size = 10000") -``` - -## Alternatives Considered - -### PostgreSQL -- **Rejected**: Requires server setup and maintenance -- **Pros**: Better concurrent access, richer query features -- **Cons**: Overkill for read-only use case, deployment complexity - -### Parquet Files -- **Rejected**: Limited query capabilities for time-series data -- **Pros**: Excellent compression, columnar format -- **Cons**: No indexes, complex range queries, requires additional libraries - -### MongoDB -- **Rejected**: Document structure not optimal for time-series data -- **Pros**: Flexible schema, good aggregation pipeline -- **Cons**: Requires server, higher memory usage, learning curve - -### CSV Files -- **Rejected**: Poor query performance for large datasets -- **Pros**: Simple format, universal compatibility -- **Cons**: No indexing, slow filtering, type conversion overhead - -### InfluxDB -- **Rejected**: Overkill for historical data analysis -- **Pros**: Optimized for time-series, good compression -- **Cons**: Additional service dependency, learning curve - -## Migration Path -If scalability becomes an issue in the future: - -1. **Phase 1**: Implement database abstraction layer in `db_interpreter` -2. **Phase 2**: Add PostgreSQL adapter for production workloads -3. **Phase 3**: Implement data partitioning for very large datasets -4. **Phase 4**: Consider distributed storage for multi-terabyte datasets - -## Monitoring -Track the following metrics to validate this decision: -- Database file sizes and growth rates -- Query performance for different date ranges -- Memory usage during streaming operations -- Time to process complete backtests - -## Review Date -This decision should be reviewed if: -- Database files consistently exceed 50GB -- Query performance degrades below 1000 rows/second -- Concurrent access requirements change -- Network-based data sharing becomes necessary diff --git a/docs/decisions/ADR-002-json-ipc-communication.md b/docs/decisions/ADR-002-json-ipc-communication.md deleted file mode 100644 index 208856d..0000000 --- a/docs/decisions/ADR-002-json-ipc-communication.md +++ /dev/null @@ -1,162 +0,0 @@ -# ADR-002: JSON File-Based Inter-Process Communication - -## Status -Accepted - -## Context -The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include: - -- Real-time data updates from processor to visualization -- Tolerance for timing mismatches between writer and reader -- Simple implementation without external dependencies -- Support for different update frequencies (OHLC bars vs. orderbook depth) -- Graceful handling of process crashes or restarts - -## Decision -We will use JSON files with atomic write operations for inter-process communication between the data processor and Dash visualization frontend. 
- -## Consequences - -### Positive -- **Simplicity**: No message queues, sockets, or complex protocols -- **Fault tolerance**: File-based communication survives process restarts -- **Debugging friendly**: Data files can be inspected manually -- **No dependencies**: Built-in JSON support, no external libraries -- **Atomic operations**: Temp file + rename prevents partial reads -- **Language agnostic**: Any process can read/write JSON files -- **Bounded memory**: Rolling data windows prevent unlimited growth - -### Negative -- **File I/O overhead**: Disk writes may be slower than in-memory communication -- **Polling required**: Reader must poll for updates (500ms interval) -- **Limited throughput**: Not suitable for high-frequency (microsecond) updates -- **No acknowledgments**: Writer cannot confirm reader has processed data -- **File system dependency**: Performance varies by storage type - -## Implementation Details - -### File Structure -``` -ohlc_data.json # Rolling array of OHLC bars (max 1000) -depth_data.json # Current orderbook depth snapshot -metrics_data.json # Rolling array of OBI/CVD metrics (max 1000) -``` - -### Atomic Write Pattern -```python -def atomic_write(file_path: Path, data: Any) -> None: - """Write data atomically to prevent partial reads.""" - temp_path = file_path.with_suffix('.tmp') - with open(temp_path, 'w') as f: - json.dump(data, f) - f.flush() - os.fsync(f.fileno()) - temp_path.replace(file_path) # Atomic on POSIX systems -``` - -### Data Formats -```python -# OHLC format: [timestamp_ms, open, high, low, close, volume] -ohlc_data = [ - [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5], - [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3] -] - -# Depth format: top-N levels per side -depth_data = { - "bids": [[49990.0, 1.5], [49985.0, 2.1]], - "asks": [[50010.0, 1.2], [50015.0, 1.8]] -} - -# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close] -metrics_data = [ - [1640995200000, 0.15, 0.22, 0.08, 0.18], - [1640995260000, 0.18, 0.25, 0.12, 0.20] -] -``` - -### Error Handling -```python -# Reader pattern with graceful fallback -try: - with open(data_file) as f: - new_data = json.load(f) - _LAST_DATA = new_data # Cache successful read -except (FileNotFoundError, json.JSONDecodeError) as e: - logging.warning(f"Using cached data: {e}") - new_data = _LAST_DATA # Use cached data -``` - -## Performance Characteristics - -### Write Performance -- **Small files**: < 1MB typical, writes complete in < 10ms -- **Atomic operations**: Add ~2-5ms overhead for temp file creation -- **Throttling**: Updates limited to prevent excessive I/O - -### Read Performance -- **Parse time**: < 5ms for typical JSON file sizes -- **Polling overhead**: 500ms interval balances responsiveness and CPU usage -- **Error recovery**: Cached data eliminates visual glitches - -### Memory Usage -- **Bounded datasets**: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file -- **JSON overhead**: ~2x memory during parsing -- **Total footprint**: < 500KB for all IPC data - -## Alternatives Considered - -### Redis Pub/Sub -- **Rejected**: Additional service dependency, overkill for simple use case -- **Pros**: True real-time updates, built-in data structures -- **Cons**: External dependency, memory overhead, configuration complexity - -### ZeroMQ -- **Rejected**: Additional library dependency, more complex than needed -- **Pros**: High performance, flexible patterns -- **Cons**: Learning curve, binary dependency, networking complexity - -### Named Pipes/Unix Sockets -- 
**Rejected**: Platform-specific, more complex error handling -- **Pros**: Better performance, no file I/O -- **Cons**: Platform limitations, harder debugging, process lifetime coupling - -### SQLite as Message Queue -- **Rejected**: Overkill for simple data exchange -- **Pros**: ACID transactions, complex queries possible -- **Cons**: Schema management, locking considerations, overhead - -### HTTP API -- **Rejected**: Too much overhead for local communication -- **Pros**: Standard protocol, language agnostic -- **Cons**: Network stack overhead, port management, authentication - -## Future Considerations - -### Scalability Limits -Current approach suitable for: -- Update frequencies: 1-10 Hz -- Data volumes: < 10MB total -- Process counts: 1 writer, few readers - -### Migration Path -If performance becomes insufficient: -1. **Phase 1**: Add compression (gzip) to reduce I/O -2. **Phase 2**: Implement shared memory for high-frequency data -3. **Phase 3**: Consider message queue for complex routing -4. **Phase 4**: Migrate to streaming protocol for real-time requirements - -## Monitoring -Track these metrics to validate the approach: -- File write latency and frequency -- JSON parse times in visualization -- Error rates for partial reads -- Memory usage growth over time - -## Review Triggers -Reconsider this decision if: -- Update frequency requirements exceed 10 Hz -- File I/O becomes a performance bottleneck -- Multiple visualization clients need the same data -- Complex message routing becomes necessary -- Platform portability becomes a concern diff --git a/docs/decisions/ADR-003-dash-visualization-framework.md b/docs/decisions/ADR-003-dash-visualization-framework.md deleted file mode 100644 index 3e801cd..0000000 --- a/docs/decisions/ADR-003-dash-visualization-framework.md +++ /dev/null @@ -1,204 +0,0 @@ -# ADR-003: Dash Web Framework for Visualization - -## Status -Accepted - -## Context -The orderflow backtest system requires a user interface for visualizing OHLC candlestick charts, volume data, orderbook depth, and derived metrics. Key requirements include: - -- Real-time chart updates with minimal latency -- Professional financial data visualization capabilities -- Support for multiple chart types (candlesticks, bars, line charts) -- Interactive features (zooming, panning, hover details) -- Dark theme suitable for trading applications -- Python-native solution to avoid JavaScript development - -## Decision -We will use Dash (by Plotly) as the web framework for building the visualization frontend, with Plotly.js for chart rendering. 
- -## Consequences - -### Positive -- **Python-native**: No JavaScript development required -- **Plotly integration**: Best-in-class financial charting capabilities -- **Reactive architecture**: Automatic UI updates via callback system -- **Professional appearance**: High-quality charts suitable for trading applications -- **Interactive features**: Built-in zooming, panning, hover tooltips -- **Responsive design**: Bootstrap integration for modern layouts -- **Development speed**: Rapid prototyping and iteration -- **WebGL acceleration**: Smooth performance for large datasets - -### Negative -- **Performance overhead**: Heavier than custom JavaScript solutions -- **Limited customization**: Constrained by Dash component ecosystem -- **Single-page limitation**: Not suitable for complex multi-page applications -- **Memory usage**: Can be heavy for resource-constrained environments -- **Learning curve**: Callback patterns require understanding of reactive programming - -## Implementation Details - -### Application Structure -```python -# Main application with Bootstrap theme -app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY]) - -# Responsive layout with 9:3 ratio for charts:depth -app.layout = dbc.Container([ - dbc.Row([ - dbc.Col([ # OHLC + Volume + Metrics - dcc.Graph(id='ohlc-chart', style={'height': '100vh'}) - ], width=9), - dbc.Col([ # Orderbook Depth - dcc.Graph(id='depth-chart', style={'height': '100vh'}) - ], width=3) - ]), - dcc.Interval(id='interval-update', interval=500, n_intervals=0) -]) -``` - -### Chart Architecture -```python -# Multi-subplot chart with shared x-axis -fig = make_subplots( - rows=3, cols=1, - row_heights=[0.6, 0.2, 0.2], # OHLC, Volume, Metrics - vertical_spacing=0.02, - shared_xaxes=True, - subplot_titles=['Price', 'Volume', 'OBI Metrics'] -) - -# Candlestick chart with dark theme -fig.add_trace(go.Candlestick( - x=timestamps, open=opens, high=highs, low=lows, close=closes, - increasing_line_color='#00ff00', decreasing_line_color='#ff0000' -), row=1, col=1) -``` - -### Real-time Updates -```python -@app.callback( - [Output('ohlc-chart', 'figure'), Output('depth-chart', 'figure')], - [Input('interval-update', 'n_intervals')] -) -def update_charts(n_intervals): - # Read data from JSON files with error handling - # Build and return updated figures - return ohlc_fig, depth_fig -``` - -## Performance Characteristics - -### Update Latency -- **Polling interval**: 500ms for near real-time updates -- **Chart render time**: 50-200ms depending on data size -- **Memory usage**: ~100MB for typical chart configurations -- **Browser requirements**: Modern browser with WebGL support - -### Scalability Limits -- **Data points**: Up to 10,000 candlesticks without performance issues -- **Update frequency**: Optimal at 1-2 Hz, maximum ~10 Hz -- **Concurrent users**: Single user design (development server) -- **Memory growth**: Linear with data history size - -## Alternatives Considered - -### Streamlit -- **Rejected**: Less interactive, slower updates, limited charting -- **Pros**: Simpler programming model, good for prototypes -- **Cons**: Poor real-time performance, limited financial chart types - -### Flask + Custom JavaScript -- **Rejected**: Requires JavaScript development, more complex -- **Pros**: Complete control, potentially better performance -- **Cons**: Significant development overhead, maintenance burden - -### Jupyter Notebooks -- **Rejected**: Not suitable for production deployment -- **Pros**: Great for exploration and analysis -- **Cons**: No 
real-time updates, not web-deployable - -### Bokeh -- **Rejected**: Less mature ecosystem, fewer financial chart types -- **Pros**: Good performance, Python-native -- **Cons**: Smaller community, limited examples for financial data - -### Custom React Application -- **Rejected**: Requires separate frontend team, complex deployment -- **Pros**: Maximum flexibility, best performance potential -- **Cons**: High development cost, maintenance overhead - -### Desktop GUI (Tkinter/PyQt) -- **Rejected**: Not web-accessible, limited styling options -- **Pros**: No browser dependency, good performance -- **Cons**: Deployment complexity, poor mobile support - -## Configuration Options - -### Theme and Styling -```python -# Dark theme configuration -dark_theme = { - 'plot_bgcolor': '#000000', - 'paper_bgcolor': '#000000', - 'font_color': '#ffffff', - 'grid_color': '#333333' -} -``` - -### Chart Types -- **Candlestick charts**: OHLC price data with volume -- **Bar charts**: Volume and metrics visualization -- **Line charts**: Cumulative depth and trend analysis -- **Scatter plots**: Trade-by-trade analysis (future) - -### Interactive Features -- **Zoom and pan**: Time-based navigation -- **Hover tooltips**: Detailed data on mouse over -- **Crosshairs**: Precise value reading -- **Range selector**: Quick time period selection - -## Future Enhancements - -### Short-term (1-3 months) -- Add range selector for time navigation -- Implement chart annotation for significant events -- Add export functionality for charts and data - -### Medium-term (3-6 months) -- Multi-instrument support with tabs -- Advanced indicators and overlays -- User preference persistence - -### Long-term (6+ months) -- Real-time alerts and notifications -- Strategy backtesting visualization -- Portfolio-level analytics - -## Monitoring and Metrics - -### Performance Monitoring -- Chart render times and update frequencies -- Memory usage growth over time -- Browser compatibility and error rates -- User interaction patterns - -### Quality Metrics -- Chart accuracy compared to source data -- Visual responsiveness during heavy updates -- Error recovery from data corruption - -## Review Triggers -Reconsider this decision if: -- Update frequency requirements exceed 10 Hz consistently -- Memory usage becomes prohibitive (> 1GB) -- Custom visualization requirements cannot be met -- Multi-user deployment becomes necessary -- Mobile responsiveness becomes a priority -- Integration with external charting libraries is needed - -## Migration Path -If replacement becomes necessary: -1. **Phase 1**: Abstract chart building logic from Dash specifics -2. **Phase 2**: Implement alternative frontend while maintaining data formats -3. **Phase 3**: A/B test performance and usability -4. **Phase 4**: Complete migration with feature parity diff --git a/docs/modules/app.md b/docs/modules/app.md deleted file mode 100644 index b947e08..0000000 --- a/docs/modules/app.md +++ /dev/null @@ -1,165 +0,0 @@ -# Module: app - -## Purpose -The `app` module provides a real-time Dash web application for visualizing OHLC candlestick charts, volume data, Order Book Imbalance (OBI) metrics, and orderbook depth. It implements a polling-based architecture that reads JSON data files and renders interactive charts with a dark theme. 
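Conceptually, the polling loop ties the 500 ms `dcc.Interval` to a callback that re-reads the three JSON files and rebuilds both figures. A minimal sketch of that wiring inside `app.py` (assuming the `viz_io` file constants are `pathlib.Path` objects; the `_read` helper stands in for the cached-fallback logic described under Error Recovery below):

```python
import json

from dash import Input, Output

from viz_io import DATA_FILE, DEPTH_FILE, METRICS_FILE


def _read(path, fallback):
    """Stand-in for the cached-fallback read; see Error Recovery below."""
    try:
        return json.loads(path.read_text())
    except Exception:
        return fallback


@app.callback(
    [Output("ohlc-chart", "figure"), Output("depth-chart", "figure")],
    [Input("interval-update", "n_intervals")],  # fires every 500 ms
)
def refresh_charts(_n_intervals):
    ohlc = _read(DATA_FILE, [])                          # [[ts, o, h, l, c, v], ...]
    metrics = _read(METRICS_FILE, [])                    # [[ts, obi_o, obi_h, obi_l, obi_c], ...]
    depth = _read(DEPTH_FILE, {"bids": [], "asks": []})  # {"bids": [...], "asks": [...]}
    return build_ohlc_fig(ohlc, metrics), build_depth_fig(depth)
```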
- -## Public Interface - -### Functions -- `build_empty_ohlc_fig() -> go.Figure`: Create empty OHLC chart with proper styling -- `build_empty_depth_fig() -> go.Figure`: Create empty depth chart with proper styling -- `build_ohlc_fig(data: List[list], metrics: List[list]) -> go.Figure`: Build complete OHLC+Volume+OBI chart -- `build_depth_fig(depth_data: dict) -> go.Figure`: Build orderbook depth visualization - -### Global Variables -- `_LAST_DATA`: Cached OHLC data for error recovery -- `_LAST_DEPTH`: Cached depth data for error recovery -- `_LAST_METRICS`: Cached metrics data for error recovery - -### Dash Application -- `app`: Main Dash application instance with Bootstrap theme -- Layout with responsive grid (9:3 ratio for OHLC:Depth charts) -- 500ms polling interval for real-time updates - -## Usage Examples - -### Running the Application -```bash -# Start the Dash server -uv run python app.py - -# Access the web interface -# Open http://localhost:8050 in your browser -``` - -### Programmatic Usage -```python -from app import build_ohlc_fig, build_depth_fig - -# Build charts with sample data -ohlc_data = [[1640995200000, 50000, 50100, 49900, 50050, 125.5]] -metrics_data = [[1640995200000, 0.15, 0.22, 0.08, 0.18]] -depth_data = { - "bids": [[49990, 1.5], [49985, 2.1]], - "asks": [[50010, 1.2], [50015, 1.8]] -} - -ohlc_fig = build_ohlc_fig(ohlc_data, metrics_data) -depth_fig = build_depth_fig(depth_data) -``` - -## Dependencies - -### Internal -- `viz_io`: Data file paths and JSON reading -- `viz_io.DATA_FILE`: OHLC data source -- `viz_io.DEPTH_FILE`: Depth data source -- `viz_io.METRICS_FILE`: Metrics data source - -### External -- `dash`: Web application framework -- `dash.html`, `dash.dcc`: HTML and core components -- `dash_bootstrap_components`: Bootstrap styling -- `plotly.graph_objs`: Chart objects -- `plotly.subplots`: Multiple subplot support -- `pandas`: Data manipulation (minimal usage) -- `json`: JSON file parsing -- `logging`: Error and debug logging -- `pathlib`: File path handling - -## Chart Architecture - -### OHLC Chart (Left Panel, 9/12 width) -- **Main subplot**: Candlestick chart with OHLC data -- **Volume subplot**: Bar chart sharing x-axis with main chart -- **OBI subplot**: Order Book Imbalance candlestick chart in blue tones -- **Shared x-axis**: Synchronized zooming and panning across subplots - -### Depth Chart (Right Panel, 3/12 width) -- **Cumulative depth**: Stepped line chart showing bid/ask liquidity -- **Color coding**: Green for bids, red for asks -- **Real-time updates**: Reflects current orderbook state - -## Styling and Theme - -### Dark Theme Configuration -- Background: Black (`#000000`) -- Text: White (`#ffffff`) -- Grid: Dark gray with transparency -- Candlesticks: Green (up) / Red (down) -- Volume: Gray bars -- OBI: Blue tones for candlesticks -- Depth: Green (bids) / Red (asks) - -### Responsive Design -- Bootstrap grid system for layout -- Fluid container for full-width usage -- 100vh height for full viewport coverage -- Configurable chart display modes - -## Data Polling and Error Handling - -### Polling Strategy -- **Interval**: 500ms for near real-time updates -- **Graceful degradation**: Uses cached data on JSON read errors -- **Atomic reads**: Tolerates partial writes during file updates -- **Logging**: Warnings for data inconsistencies - -### Error Recovery -```python -# Pseudocode for error handling pattern -try: - with open(data_file) as f: - new_data = json.load(f) - _LAST_DATA = new_data # Cache successful read -except 
(FileNotFoundError, json.JSONDecodeError): - logging.warning("Using cached data due to read error") - new_data = _LAST_DATA # Use cached data -``` - -## Performance Characteristics - -- **Client-side rendering**: Plotly.js handles chart rendering -- **Efficient updates**: Only redraws when data changes -- **Memory bounded**: Limited by max bars in data files (1000) -- **Network efficient**: Local file polling (no external API calls) - -## Testing - -Run application tests: -```bash -uv run pytest test_app.py -v -``` - -Test coverage includes: -- Chart building functions -- Data loading and caching -- Error handling scenarios -- Layout rendering -- Callback functionality - -## Configuration Options - -### Server Configuration -- **Host**: `0.0.0.0` (accessible from network) -- **Port**: `8050` (default Dash port) -- **Debug mode**: Disabled in production - -### Chart Configuration -- **Update interval**: 500ms (configurable via dcc.Interval) -- **Display mode bar**: Enabled for user interaction -- **Logo display**: Disabled for clean interface - -## Known Issues - -- High CPU usage during rapid data updates -- Memory usage grows with chart history -- No authentication or access control -- Limited mobile responsiveness for complex charts - -## Development Notes - -- Uses Flask development server (not suitable for production) -- Callback exceptions suppressed for partial data scenarios -- Bootstrap CSS loaded from CDN -- Chart configurations optimized for financial data visualization diff --git a/docs/modules/db_interpreter.md b/docs/modules/db_interpreter.md deleted file mode 100644 index 84035da..0000000 --- a/docs/modules/db_interpreter.md +++ /dev/null @@ -1,83 +0,0 @@ -# Module: db_interpreter - -## Purpose -The `db_interpreter` module provides efficient streaming access to SQLite databases containing orderbook and trade data. It handles batch reading, temporal windowing, and data structure normalization for downstream processing. 
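The temporal windowing can be pictured as pairing each `book` row with the trades whose `timestamp_ms` falls in `[timestamp, end_timestamp)`, where `end_timestamp` is taken from the next book row. The sketch below uses in-memory lists for clarity and assumes the millisecond timestamp is the last field of each row; the real `DBInterpreter` streams in batches (2048 book rows, 4096 trades) with a one-row lookahead instead of indexing:

```python
from typing import Iterator, Sequence


def window_trades(
    book_rows: Sequence[tuple],   # time-ordered; timestamp_ms assumed last
    trades: Sequence[tuple],      # time-ordered; (id, trade_id, price, size, side, timestamp_ms)
) -> Iterator[tuple[tuple, list[tuple]]]:
    """Yield (book_row, trades_in_window) pairs for half-open time windows."""
    trade_idx = 0
    for i, row in enumerate(book_rows):
        last_row = i + 1 >= len(book_rows)
        end_ts = float("inf") if last_row else book_rows[i + 1][-1]
        bucket: list[tuple] = []
        while trade_idx < len(trades) and trades[trade_idx][-1] < end_ts:
            bucket.append(trades[trade_idx])
            trade_idx += 1
        yield row, bucket
```

Because both tables are read in timestamp order, a single forward pass over the trades suffices; no trade is scanned twice.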
- -## Public Interface - -### Classes -- `OrderbookLevel(price: float, size: float)`: Dataclass representing a single price level in the orderbook -- `OrderbookUpdate`: Container for windowed orderbook data with bids, asks, timestamp, and end_timestamp - -### Functions -- `DBInterpreter(db_path: Path)`: Constructor that initializes read-only SQLite connection with optimized PRAGMA settings - -### Methods -- `stream() -> Iterator[tuple[OrderbookUpdate, list[tuple]]]`: Primary streaming interface that yields orderbook updates with associated trades in temporal windows - -## Usage Examples - -```python -from pathlib import Path -from db_interpreter import DBInterpreter - -# Initialize interpreter -db_path = Path("data/BTC-USDT-2025-01-01.db") -interpreter = DBInterpreter(db_path) - -# Stream orderbook and trade data -for ob_update, trades in interpreter.stream(): - # Process orderbook update - print(f"Book update: {len(ob_update.bids)} bids, {len(ob_update.asks)} asks") - print(f"Time window: {ob_update.timestamp} - {ob_update.end_timestamp}") - - # Process trades in this window - for trade in trades: - trade_id, price, size, side, timestamp_ms = trade[1:6] - print(f"Trade: {side} {size} @ {price}") -``` - -## Dependencies - -### Internal -- None (standalone module) - -### External -- `sqlite3`: Database connectivity -- `pathlib`: Path handling -- `dataclasses`: Data structure definitions -- `typing`: Type annotations -- `logging`: Debug and error logging - -## Performance Characteristics - -- **Batch sizes**: BOOK_BATCH=2048, TRADE_BATCH=4096 for optimal memory usage -- **SQLite optimizations**: Read-only, immutable mode, large mmap and cache sizes -- **Memory efficient**: Streaming iterator pattern prevents loading entire dataset -- **Temporal windowing**: One-row lookahead for precise time boundary calculation - -## Testing - -Run module tests: -```bash -uv run pytest test_db_interpreter.py -v -``` - -Test coverage includes: -- Batch reading correctness -- Temporal window boundary handling -- Trade-to-window assignment accuracy -- End-of-stream behavior -- Error handling for malformed data - -## Known Issues - -- Requires specific database schema (book and trades tables) -- Python-literal string parsing assumes well-formed input -- Large databases may require memory monitoring during streaming - -## Configuration - -- `BOOK_BATCH`: Number of orderbook rows to fetch per query (default: 2048) -- `TRADE_BATCH`: Number of trade rows to fetch per query (default: 4096) -- SQLite PRAGMA settings optimized for read-only sequential access diff --git a/docs/modules/dependencies.md b/docs/modules/dependencies.md deleted file mode 100644 index a07929e..0000000 --- a/docs/modules/dependencies.md +++ /dev/null @@ -1,162 +0,0 @@ -# External Dependencies - -## Overview -This document describes all external dependencies used in the orderflow backtest system, their purposes, versions, and justifications for inclusion. 
- -## Production Dependencies - -### Core Framework Dependencies - -#### Dash (^2.18.2) -- **Purpose**: Web application framework for interactive visualizations -- **Usage**: Real-time chart rendering and user interface -- **Justification**: Mature Python-based framework with excellent Plotly integration -- **Key Features**: Reactive components, built-in server, callback system - -#### Dash Bootstrap Components (^1.6.0) -- **Purpose**: Bootstrap CSS framework integration for Dash -- **Usage**: Responsive layout grid and modern UI styling -- **Justification**: Provides professional appearance with minimal custom CSS - -#### Plotly (^5.24.1) -- **Purpose**: Interactive charting and visualization library -- **Usage**: OHLC candlesticks, volume bars, depth charts, OBI metrics -- **Justification**: Industry standard for financial data visualization -- **Key Features**: WebGL acceleration, zooming/panning, dark themes - -### Data Processing Dependencies - -#### Pandas (^2.2.3) -- **Purpose**: Data manipulation and analysis library -- **Usage**: Minimal usage for data structure conversions in visualization -- **Justification**: Standard tool for financial data handling -- **Note**: Usage kept minimal to maintain performance - -#### Typer (^0.13.1) -- **Purpose**: Modern CLI framework -- **Usage**: Command-line argument parsing and help generation -- **Justification**: Type-safe, auto-generated help, better UX than argparse -- **Key Features**: Type hints integration, automatic validation - -### Data Storage Dependencies - -#### SQLite3 (Built-in) -- **Purpose**: Database connectivity for historical data -- **Usage**: Read-only access to orderbook and trade data -- **Justification**: Built into Python, no external dependencies, excellent performance -- **Configuration**: Optimized with immutable mode and mmap - -## Development and Testing Dependencies - -#### Pytest (^8.3.4) -- **Purpose**: Testing framework -- **Usage**: Unit tests, integration tests, test discovery -- **Justification**: Standard Python testing tool with excellent plugin ecosystem - -#### Coverage (^7.6.9) -- **Purpose**: Code coverage measurement -- **Usage**: Test coverage reporting and quality metrics -- **Justification**: Essential for maintaining code quality - -## Build and Package Management - -#### UV (Package Manager) -- **Purpose**: Fast Python package manager and task runner -- **Usage**: Dependency management, virtual environments, script execution -- **Justification**: Significantly faster than pip/poetry, better lock file format -- **Commands**: `uv sync`, `uv run`, `uv add` - -## Python Standard Library Usage - -### Core Libraries -- **sqlite3**: Database connectivity -- **json**: JSON serialization for IPC -- **pathlib**: Modern file path handling -- **subprocess**: Process management for visualization -- **logging**: Structured logging throughout application -- **datetime**: Date/time parsing and manipulation -- **dataclasses**: Structured data types -- **typing**: Type annotations and hints -- **tempfile**: Atomic file operations -- **ast**: Safe evaluation of Python literals - -### Performance Libraries -- **itertools**: Efficient iteration patterns -- **functools**: Function decoration and caching -- **collections**: Specialized data structures - -## Dependency Justifications - -### Why Dash Over Alternatives? -- **vs. Streamlit**: Better real-time updates, more control over layout -- **vs. Flask + Custom JS**: Integrated Plotly support, faster development -- **vs. 
Jupyter**: Better for production deployment, process isolation - -### Why SQLite Over Alternatives? -- **vs. PostgreSQL**: No server setup required, excellent read performance -- **vs. Parquet**: Better for time-series queries, built-in indexing -- **vs. CSV**: Proper data types, much faster queries, atomic transactions - -### Why UV Over Poetry/Pip? -- **vs. Poetry**: Significantly faster dependency resolution and installation -- **vs. Pip**: Better dependency locking, integrated task runner -- **vs. Pipenv**: More active development, better performance - -## Version Pinning Strategy - -### Patch Version Pinning -- Core dependencies (Dash, Plotly) pinned to patch versions -- Prevents breaking changes while allowing security updates - -### Range Pinning -- Development tools use caret (^) ranges for flexibility -- Testing tools can update more freely - -### Lock File Management -- `uv.lock` ensures reproducible builds across environments -- Regular updates scheduled monthly for security patches - -## Security Considerations - -### Dependency Scanning -- Regular audit of dependencies for known vulnerabilities -- Automated updates for security patches -- Minimal dependency tree to reduce attack surface - -### Data Isolation -- Read-only database access prevents data modification -- No external network connections required for core functionality -- All file operations contained within project directory - -## Performance Impact - -### Bundle Size -- Core runtime: ~50MB with all dependencies -- Dash frontend: Additional ~10MB for JavaScript assets -- SQLite: Zero overhead (built-in) - -### Startup Time -- Cold start: ~2-3 seconds for full application -- UV virtual environment activation: ~100ms -- Database connection: ~50ms per file - -### Memory Usage -- Base application: ~100MB -- Per 1000 OHLC bars: ~5MB additional -- Plotly charts: ~20MB for complex visualizations - -## Maintenance Schedule - -### Monthly -- Security update review and application -- Dependency version bump evaluation - -### Quarterly -- Major version update consideration -- Performance impact assessment -- Alternative technology evaluation - -### Annually -- Complete dependency audit -- Technology stack review -- Migration planning for deprecated packages diff --git a/docs/modules/level_parser.md b/docs/modules/level_parser.md deleted file mode 100644 index c49c1a2..0000000 --- a/docs/modules/level_parser.md +++ /dev/null @@ -1,101 +0,0 @@ -# Module: level_parser - -## Purpose -The `level_parser` module provides utilities for parsing and normalizing orderbook level data from various string formats. It handles JSON and Python literal representations, converting them into standardized numeric tuples for processing. - -## Public Interface - -### Functions -- `normalize_levels(levels: Any) -> List[List[float]]`: Parse levels into [[price, size], ...] 
format, filtering out zero/negative sizes -- `parse_levels_including_zeros(levels: Any) -> List[Tuple[float, float]]`: Parse levels preserving zero sizes for deletion operations - -### Private Functions -- `_parse_string_to_list(levels: Any) -> List[Any]`: Core parsing logic trying JSON first, then literal_eval -- `_extract_price_size(item: Any) -> Tuple[Any, Any]`: Extract price/size from dict or list/tuple formats - -## Usage Examples - -```python -from level_parser import normalize_levels, parse_levels_including_zeros - -# Parse standard levels (filters zeros) -levels = normalize_levels('[[50000.0, 1.5], [49999.0, 2.0]]') -# Returns: [[50000.0, 1.5], [49999.0, 2.0]] - -# Parse with zero sizes preserved (for deletions) -updates = parse_levels_including_zeros('[[50000.0, 0.0], [49999.0, 1.5]]') -# Returns: [(50000.0, 0.0), (49999.0, 1.5)] - -# Supports dict format -dict_levels = normalize_levels('[{"price": 50000.0, "size": 1.5}]') -# Returns: [[50000.0, 1.5]] - -# Short key format -short_levels = normalize_levels('[{"p": 50000.0, "s": 1.5}]') -# Returns: [[50000.0, 1.5]] -``` - -## Dependencies - -### External -- `json`: Primary parsing method for level data -- `ast.literal_eval`: Fallback parsing for Python literal formats -- `logging`: Debug logging for parsing issues -- `typing`: Type annotations - -## Input Formats Supported - -### JSON Array Format -```json -[[50000.0, 1.5], [49999.0, 2.0]] -``` - -### Dict Format (Full Keys) -```json -[{"price": 50000.0, "size": 1.5}, {"price": 49999.0, "size": 2.0}] -``` - -### Dict Format (Short Keys) -```json -[{"p": 50000.0, "s": 1.5}, {"p": 49999.0, "s": 2.0}] -``` - -### Python Literal Format -```python -"[(50000.0, 1.5), (49999.0, 2.0)]" -``` - -## Error Handling - -- **Graceful Degradation**: Returns empty list on parse failures -- **Data Validation**: Filters out invalid price/size pairs -- **Type Safety**: Converts all values to float before processing -- **Debug Logging**: Logs warnings for malformed input without crashing - -## Performance Characteristics - -- **Fast Path**: JSON parsing prioritized for performance -- **Fallback Support**: ast.literal_eval as backup for edge cases -- **Memory Efficient**: Processes items iteratively, not loading entire dataset -- **Validation**: Minimal overhead with early filtering of invalid data - -## Testing - -```bash -uv run pytest test_level_parser.py -v -``` - -Test coverage includes: -- JSON format parsing accuracy -- Dict format (both key styles) parsing -- Python literal fallback parsing -- Zero size preservation vs filtering -- Error handling for malformed input -- Type conversion edge cases - -## Known Limitations - -- Assumes well-formed numeric data (price/size as numbers) -- Does not validate economic constraints (e.g., positive prices) -- Limited to list/dict input formats -- No support for streaming/incremental parsing diff --git a/docs/modules/main.md b/docs/modules/main.md deleted file mode 100644 index 8b92e39..0000000 --- a/docs/modules/main.md +++ /dev/null @@ -1,168 +0,0 @@ -# Module: main - -## Purpose -The `main` module provides the command-line interface (CLI) orchestration for the orderflow backtest system. It handles database discovery, process management, and coordinates the streaming pipeline with the visualization frontend using Typer for argument parsing. 
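For orientation, a minimal Typer entrypoint of this shape might look like the sketch below (argument names follow the CLI arguments documented under Public Interface; the real signature lives in `main.py`):

```python
# Illustrative Typer entrypoint sketch; not the actual main.py implementation.
import typer

app = typer.Typer()

@app.command()
def main(
    instrument: str = typer.Argument(..., help="Trading pair, e.g. BTC-USDT"),
    start_date: str = typer.Argument(..., help="Start date (YYYY-MM-DD, UTC)"),
    end_date: str = typer.Argument(..., help="End date (YYYY-MM-DD, UTC)"),
    window_seconds: int = typer.Option(60, help="OHLC aggregation window in seconds"),
) -> None:
    ...  # discover databases, launch the visualizer, stream and process

if __name__ == "__main__":
    app()
```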
- -## Public Interface - -### Functions -- `main(instrument: str, start_date: str, end_date: str, window_seconds: int = 60) -> None`: Primary CLI entrypoint -- `discover_databases(instrument: str, start_date: str, end_date: str) -> list[Path]`: Find matching database files -- `launch_visualizer() -> subprocess.Popen | None`: Start Dash application in separate process - -### CLI Arguments -- `instrument`: Trading pair identifier (e.g., "BTC-USDT") -- `start_date`: Start date in YYYY-MM-DD format (UTC) -- `end_date`: End date in YYYY-MM-DD format (UTC) -- `--window-seconds`: OHLC aggregation window size (default: 60) - -## Usage Examples - -### Command Line Usage -```bash -# Basic usage with default 60-second windows -uv run python main.py BTC-USDT 2025-01-01 2025-01-31 - -# Custom window size -uv run python main.py ETH-USDT 2025-02-01 2025-02-28 --window-seconds 30 - -# Single day processing -uv run python main.py SOL-USDT 2025-03-15 2025-03-15 -``` - -### Programmatic Usage -```python -from main import main, discover_databases - -# Run processing pipeline -main("BTC-USDT", "2025-01-01", "2025-01-31", window_seconds=120) - -# Discover available databases -db_files = discover_databases("ETH-USDT", "2025-02-01", "2025-02-28") -print(f"Found {len(db_files)} database files") -``` - -## Dependencies - -### Internal -- `db_interpreter.DBInterpreter`: Database streaming -- `ohlc_processor.OHLCProcessor`: Trade aggregation and orderbook processing -- `viz_io`: Data clearing functions - -### External -- `typer`: CLI framework and argument parsing -- `subprocess`: Process management for visualization -- `pathlib`: File and directory operations -- `datetime`: Date parsing and validation -- `logging`: Operational logging -- `sys`: Exit code management - -## Database Discovery Logic - -### File Pattern Matching -```python -# Expected directory structure -../data/OKX/{instrument}/{date}/ - -# Example paths -../data/OKX/BTC-USDT/2025-01-01/trades.db -../data/OKX/ETH-USDT/2025-02-15/trades.db -``` - -### Discovery Algorithm -1. Parse start and end dates to datetime objects -2. Iterate through date range (inclusive) -3. Construct expected path for each date -4. Verify file existence and readability -5. Return sorted list of valid database paths - -## Process Orchestration - -### Visualization Process Management -```python -# Launch Dash app in separate process -viz_process = subprocess.Popen([ - "uv", "run", "python", "app.py" -], cwd=project_root) - -# Process management -try: - # Main processing loop - process_databases(db_files) -finally: - # Cleanup visualization process - if viz_process: - viz_process.terminate() - viz_process.wait(timeout=5) -``` - -### Data Processing Pipeline -1. **Initialize**: Clear existing data files -2. **Launch**: Start visualization process -3. **Stream**: Process each database sequentially -4. **Aggregate**: Generate OHLC bars and depth snapshots -5. 
**Cleanup**: Terminate visualization and finalize - -## Error Handling - -### Database Access Errors -- **File not found**: Log warning and skip missing databases -- **Permission denied**: Log error and exit with status code 1 -- **Corruption**: Log error for specific database and continue with next - -### Process Management Errors -- **Visualization startup failure**: Log error but continue processing -- **Process termination**: Graceful shutdown with timeout -- **Resource cleanup**: Ensure child processes are terminated - -### Date Validation -- **Invalid format**: Clear error message with expected format -- **Invalid range**: End date must be >= start date -- **Future dates**: Warning for dates beyond data availability - -## Performance Characteristics - -- **Sequential processing**: Databases processed one at a time -- **Memory efficient**: Streaming approach prevents loading entire datasets -- **Process isolation**: Visualization runs independently -- **Resource cleanup**: Automatic process termination on exit - -## Testing - -Run module tests: -```bash -uv run pytest test_main.py -v -``` - -Test coverage includes: -- Database discovery logic -- Date parsing and validation -- Process management -- Error handling scenarios -- CLI argument validation - -## Configuration - -### Default Settings -- **Data directory**: `../data/OKX` (relative to project root) -- **Visualization command**: `uv run python app.py` -- **Window size**: 60 seconds -- **Process timeout**: 5 seconds for termination - -### Environment Variables -- **DATA_PATH**: Override default data directory -- **VISUALIZATION_PORT**: Override Dash port (requires app.py modification) - -## Known Issues - -- Assumes specific directory structure under `../data/OKX` -- No validation of database schema compatibility -- Limited error recovery for process management -- No progress indication for large datasets - -## Development Notes - -- Uses Typer for modern CLI interface -- Subprocess management compatible with Unix/Windows -- Logging configured for both development and production use -- Exit codes follow Unix conventions (0=success, 1=error) diff --git a/docs/modules/metrics_calculator.md b/docs/modules/metrics_calculator.md deleted file mode 100644 index 632a30f..0000000 --- a/docs/modules/metrics_calculator.md +++ /dev/null @@ -1,147 +0,0 @@ -# Module: metrics_calculator - -## Purpose -The `metrics_calculator` module handles calculation and management of trading metrics including Order Book Imbalance (OBI) and Cumulative Volume Delta (CVD). It provides windowed aggregation with throttled updates for real-time visualization. 
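For a concrete feel for the two metrics defined below, a toy computation (standalone illustration, not the module's implementation):

```python
# Toy computation of CVD and OBI as defined below; not the module's code.
def cvd_delta(side: str, size: float) -> float:
    """Signed volume contribution of one trade: buys add, sells subtract."""
    return size if side == "buy" else -size if side == "sell" else 0.0

trades = [("buy", 1.5), ("sell", 1.0), ("buy", 0.25)]
cvd = sum(cvd_delta(side, size) for side, size in trades)  # 0.75 (net buying)

total_bids, total_asks = 150.0, 120.0
obi = total_bids - total_asks  # 30.0 (more bid liquidity than ask liquidity)
```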
- -## Public Interface - -### Classes -- `MetricsCalculator(window_seconds: int = 60, emit_every_n_updates: int = 25)`: Main metrics calculation engine - -### Methods -- `update_cvd_from_trade(side: str, size: float) -> None`: Update CVD from individual trade data -- `update_obi_metrics(timestamp: str, total_bids: float, total_asks: float) -> None`: Update OBI metrics from orderbook volumes -- `finalize_metrics() -> None`: Emit final metrics bar at processing end - -### Properties -- `cvd_cumulative: float`: Current cumulative volume delta value - -### Private Methods -- `_emit_metrics_bar() -> None`: Emit current metrics to visualization layer - -## Usage Examples - -```python -from metrics_calculator import MetricsCalculator - -# Initialize calculator -calc = MetricsCalculator(window_seconds=60, emit_every_n_updates=25) - -# Update CVD from trades -calc.update_cvd_from_trade("buy", 1.5) # +1.5 CVD -calc.update_cvd_from_trade("sell", 1.0) # -1.0 CVD, net +0.5 - -# Update OBI from orderbook -total_bids, total_asks = 150.0, 120.0 -calc.update_obi_metrics("1640995200000", total_bids, total_asks) - -# Access current CVD -current_cvd = calc.cvd_cumulative # 0.5 - -# Finalize at end of processing -calc.finalize_metrics() -``` - -## Metrics Definitions - -### Cumulative Volume Delta (CVD) -- **Formula**: CVD = Σ(buy_volume - sell_volume) -- **Interpretation**: Positive = more buying pressure, Negative = more selling pressure -- **Accumulation**: Running total across all processed trades -- **Update Frequency**: Every trade - -### Order Book Imbalance (OBI) -- **Formula**: OBI = total_bid_volume - total_ask_volume -- **Interpretation**: Positive = more bid liquidity, Negative = more ask liquidity -- **Aggregation**: OHLC-style bars per time window (open, high, low, close) -- **Update Frequency**: Throttled per orderbook update - -## Dependencies - -### Internal -- `viz_io.upsert_metric_bar`: Output interface for visualization - -### External -- `logging`: Warning messages for unknown trade sides -- `typing`: Type annotations - -## Windowed Aggregation - -### OBI Windows -- **Window Size**: Configurable via `window_seconds` (default: 60) -- **Window Alignment**: Aligned to epoch time boundaries -- **OHLC Tracking**: Maintains open, high, low, close values per window -- **Rollover**: Automatic window transitions with final bar emission - -### Throttling Mechanism -- **Purpose**: Reduce I/O overhead during high-frequency updates -- **Trigger**: Every N updates (configurable via `emit_every_n_updates`) -- **Behavior**: Emits intermediate updates for real-time visualization -- **Final Emission**: Guaranteed on window rollover and finalization - -## State Management - -### CVD State -- `cvd_cumulative: float`: Running total across all trades -- **Persistence**: Maintained throughout processor lifetime -- **Updates**: Incremental addition/subtraction per trade - -### OBI State -- `metrics_window_start: int`: Current window start timestamp -- `metrics_bar: dict`: Current OBI OHLC values -- `_metrics_since_last_emit: int`: Throttling counter - -## Output Format - -### Metrics Bar Structure -```python -{ - 'obi_open': float, # First OBI value in window - 'obi_high': float, # Maximum OBI in window - 'obi_low': float, # Minimum OBI in window - 'obi_close': float, # Latest OBI value -} -``` - -### Visualization Integration -- Emitted via `viz_io.upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value)` -- Compatible with existing OHLC visualization infrastructure -- Real-time updates 
during active processing - -## Performance Characteristics - -- **Low Memory**: Maintains only current window state -- **Throttled I/O**: Configurable update frequency prevents excessive writes -- **Efficient Updates**: O(1) operations for trade and OBI updates -- **Window Management**: Automatic transitions without manual intervention - -## Configuration - -### Constructor Parameters -- `window_seconds: int`: Time window for OBI aggregation (default: 60) -- `emit_every_n_updates: int`: Throttling factor for intermediate updates (default: 25) - -### Tuning Guidelines -- **Higher throttling**: Reduces I/O load, delays real-time updates -- **Lower throttling**: More responsive visualization, higher I/O overhead -- **Window size**: Affects granularity of OBI trends (shorter = more detail) - -## Testing - -```bash -uv run pytest test_metrics_calculator.py -v -``` - -Test coverage includes: -- CVD accumulation accuracy across multiple trades -- OBI window rollover and OHLC tracking -- Throttling behavior verification -- Edge cases (unknown trade sides, empty windows) -- Integration with visualization output - -## Known Limitations - -- CVD calculation assumes binary buy/sell classification -- No support for partial fills or complex order types -- OBI calculation treats all liquidity equally (no price weighting) -- Window boundaries aligned to absolute timestamps (no sliding windows) diff --git a/docs/modules/ohlc_processor.md b/docs/modules/ohlc_processor.md deleted file mode 100644 index 0bca237..0000000 --- a/docs/modules/ohlc_processor.md +++ /dev/null @@ -1,122 +0,0 @@ -# Module: ohlc_processor - -## Purpose -The `ohlc_processor` module serves as the main coordinator for trade data processing, orchestrating OHLC aggregation, orderbook management, and metrics calculation. It has been refactored into a modular architecture using composition with specialized helper modules. 
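One detail worth seeing concretely is the epoch-aligned windowing that the processor's bars and metrics buckets share; a small worked example with an arbitrary timestamp:

```python
# Worked example of epoch-aligned bucketing (arbitrary timestamp, 60-second windows).
window_seconds = 60
bucket_ms = window_seconds * 1000

timestamp_ms = 1_640_995_223_500           # a trade timestamp in milliseconds
bucket_index = timestamp_ms // bucket_ms   # 27_349_920
bucket_start = bucket_index * bucket_ms    # 1_640_995_200_000 (window open)
bucket_end = bucket_start + bucket_ms      # 1_640_995_260_000 (window close)
```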
- -## Public Interface - -### Classes -- `OHLCProcessor(window_seconds: int = 60, depth_levels_per_side: int = 50)`: Main orchestrator class that coordinates trade processing using composition - -### Methods -- `process_trades(trades: list[tuple]) -> None`: Aggregate trades into OHLC bars and update CVD metrics -- `update_orderbook(ob_update: OrderbookUpdate) -> None`: Apply orderbook updates and calculate OBI metrics -- `finalize() -> None`: Emit final OHLC bar and metrics data -- `cvd_cumulative` (property): Access to cumulative volume delta value - -### Composed Modules -- `OrderbookManager`: Handles in-memory orderbook state and depth snapshots -- `MetricsCalculator`: Manages OBI and CVD metric calculations -- `level_parser` functions: Parse and normalize orderbook level data - -## Usage Examples - -```python -from ohlc_processor import OHLCProcessor -from db_interpreter import DBInterpreter - -# Initialize processor with 1-minute windows and 50 depth levels -processor = OHLCProcessor(window_seconds=60, depth_levels_per_side=50) - -# Process streaming data -for ob_update, trades in DBInterpreter(db_path).stream(): - # Aggregate trades into OHLC bars - processor.process_trades(trades) - - # Update orderbook and emit depth snapshots - processor.update_orderbook(ob_update) - -# Finalize processing -processor.finalize() -``` - -### Advanced Configuration -```python -# Custom window size and depth levels -processor = OHLCProcessor( - window_seconds=30, # 30-second bars - depth_levels_per_side=25 # Top 25 levels per side -) -``` - -## Dependencies - -### Internal Modules -- `orderbook_manager.OrderbookManager`: In-memory orderbook state management -- `metrics_calculator.MetricsCalculator`: OBI and CVD metrics calculation -- `level_parser`: Orderbook level parsing utilities -- `viz_io`: JSON output for visualization -- `db_interpreter.OrderbookUpdate`: Input data structures - -### External -- `typing`: Type annotations -- `logging`: Debug and operational logging - -## Modular Architecture - -The processor now follows a clean composition pattern: - -1. **Main Coordinator** (`OHLCProcessor`): - - Orchestrates trade and orderbook processing - - Maintains OHLC bar state and window management - - Delegates specialized tasks to composed modules - -2. **Orderbook Management** (`OrderbookManager`): - - Maintains in-memory price→size mappings - - Applies partial updates and handles deletions - - Provides sorted top-N level extraction - -3. **Metrics Calculation** (`MetricsCalculator`): - - Tracks CVD from trade flow (buy/sell volume delta) - - Calculates OBI from orderbook volume imbalance - - Manages windowed metrics aggregation with throttling - -4. 
**Level Parsing** (`level_parser` module): - - Normalizes JSON and Python literal level representations - - Handles zero-size levels for orderbook deletions - - Provides robust error handling for malformed data - -## Performance Characteristics - -- **Throttled Updates**: Prevents excessive I/O during high-frequency periods -- **Memory Efficient**: Maintains only current window and top-N depth levels -- **Incremental Processing**: Applies only changed orderbook levels -- **Atomic Operations**: Thread-safe updates to shared data structures - -## Testing - -Run module tests: -```bash -uv run pytest test_ohlc_processor.py -v -``` - -Test coverage includes: -- OHLC calculation accuracy across window boundaries -- Volume accumulation correctness -- High/low price tracking -- Orderbook update application -- Depth snapshot generation -- OBI metric calculation - -## Known Issues - -- Orderbook level parsing assumes well-formed JSON or Python literals -- Memory usage scales with number of active price levels -- Clock skew between trades and orderbook updates not handled - -## Configuration Options - -- `window_seconds`: Time window size for OHLC aggregation (default: 60) -- `depth_levels_per_side`: Number of top price levels to maintain (default: 50) -- `UPSERT_THROTTLE_MS`: Minimum interval between upsert operations (internal) -- `DEPTH_EMIT_THROTTLE_MS`: Minimum interval between depth emissions (internal) diff --git a/docs/modules/orderbook_manager.md b/docs/modules/orderbook_manager.md deleted file mode 100644 index 595670b..0000000 --- a/docs/modules/orderbook_manager.md +++ /dev/null @@ -1,121 +0,0 @@ -# Module: orderbook_manager - -## Purpose -The `orderbook_manager` module provides in-memory orderbook state management with partial update capabilities. It maintains separate bid and ask sides and supports efficient top-level extraction for visualization. 
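A minimal sketch of the price→size mapping and the update semantics documented below (size 0 deletes a level, positive size upserts it; illustrative, not the module's code):

```python
# Minimal sketch of the documented update semantics; not the module's implementation.
from typing import Dict, List, Tuple

def apply_side(book: Dict[float, float], updates: List[Tuple[float, float]]) -> None:
    for price, size in updates:
        if size == 0:
            book.pop(price, None)   # size 0 -> delete the price level
        elif size > 0:
            book[price] = size      # size > 0 -> upsert; negative sizes are ignored

bids: Dict[float, float] = {}
apply_side(bids, [(50000.0, 1.5), (49999.0, 2.0)])
apply_side(bids, [(50000.0, 0.0)])                  # removes the 50000.0 level
top_bids = sorted(bids.items(), reverse=True)[:50]  # bids sorted highest price first
```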
- -## Public Interface - -### Classes -- `OrderbookManager(depth_levels_per_side: int = 50)`: Main orderbook state manager - -### Methods -- `apply_updates(bids_updates: List[Tuple[float, float]], asks_updates: List[Tuple[float, float]]) -> None`: Apply partial updates to both sides -- `get_total_volume() -> Tuple[float, float]`: Get total bid and ask volumes -- `get_top_levels() -> Tuple[List[List[float]], List[List[float]]]`: Get sorted top levels for both sides - -### Private Methods -- `_apply_partial_updates(side_map: Dict[float, float], updates: List[Tuple[float, float]]) -> None`: Apply updates to one side -- `_build_top_levels(side_map: Dict[float, float], limit: int, reverse: bool) -> List[List[float]]`: Extract sorted top levels - -## Usage Examples - -```python -from orderbook_manager import OrderbookManager - -# Initialize manager -manager = OrderbookManager(depth_levels_per_side=25) - -# Apply orderbook updates -bids = [(50000.0, 1.5), (49999.0, 2.0)] -asks = [(50001.0, 1.2), (50002.0, 0.8)] -manager.apply_updates(bids, asks) - -# Get volume totals for OBI calculation -total_bids, total_asks = manager.get_total_volume() -obi = total_bids - total_asks - -# Get top levels for depth visualization -bids_sorted, asks_sorted = manager.get_top_levels() - -# Handle deletions (size = 0) -deletions = [(50000.0, 0.0)] # Remove price level -manager.apply_updates(deletions, []) -``` - -## Dependencies - -### External -- `typing`: Type annotations for Dict, List, Tuple - -## State Management - -### Internal State -- `_book_bids: Dict[float, float]`: Price → size mapping for bid side -- `_book_asks: Dict[float, float]`: Price → size mapping for ask side -- `depth_levels_per_side: int`: Configuration for top-N extraction - -### Update Semantics -- **Size = 0**: Remove price level (deletion) -- **Size > 0**: Upsert price level with new size -- **Size < 0**: Ignored (invalid update) - -### Sorting Behavior -- **Bids**: Descending by price (highest price first) -- **Asks**: Ascending by price (lowest price first) -- **Top-N**: Limited by `depth_levels_per_side` parameter - -## Performance Characteristics - -- **Memory Efficient**: Only stores non-zero price levels -- **Fast Updates**: O(1) upsert/delete operations using dict -- **Efficient Sorting**: Only sorts when extracting top levels -- **Bounded Output**: Limits result size for visualization performance - -## Use Cases - -### OBI Calculation -```python -total_bids, total_asks = manager.get_total_volume() -order_book_imbalance = total_bids - total_asks -``` - -### Depth Visualization -```python -bids, asks = manager.get_top_levels() -depth_payload = {"bids": bids, "asks": asks} -``` - -### Incremental Updates -```python -# Typical orderbook update cycle -updates = parse_orderbook_changes(raw_data) -manager.apply_updates(updates['bids'], updates['asks']) -``` - -## Testing - -```bash -uv run pytest test_orderbook_manager.py -v -``` - -Test coverage includes: -- Partial update application correctness -- Deletion handling (size = 0) -- Volume calculation accuracy -- Top-level sorting and limiting -- Edge cases (empty books, single levels) -- Performance with large orderbooks - -## Configuration - -- `depth_levels_per_side`: Controls output size for visualization (default: 50) -- Affects memory usage and sorting performance -- Higher values provide more market depth detail -- Lower values improve processing speed - -## Known Limitations - -- No built-in validation of price/size values -- Memory usage scales with number of unique price levels -- 
No historical state tracking (current snapshot only) -- No support for spread calculation or market data statistics diff --git a/docs/modules/viz_io.md b/docs/modules/viz_io.md deleted file mode 100644 index e6138e0..0000000 --- a/docs/modules/viz_io.md +++ /dev/null @@ -1,155 +0,0 @@ -# Module: viz_io - -## Purpose -The `viz_io` module provides atomic inter-process communication (IPC) between the data processing pipeline and the visualization frontend. It manages JSON file-based data exchange with atomic writes to prevent race conditions and data corruption. - -## Public Interface - -### Functions -- `add_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Append new OHLC bar to rolling dataset -- `upsert_ohlc_bar(timestamp, open_price, high_price, low_price, close_price, volume)`: Update existing bar or append new one -- `clear_data()`: Reset OHLC dataset to empty state -- `add_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Append OBI metric bar -- `upsert_metric_bar(timestamp, obi_open, obi_high, obi_low, obi_close)`: Update existing OBI bar or append new one -- `clear_metrics()`: Reset metrics dataset to empty state -- `set_depth_data(bids, asks)`: Update current orderbook depth snapshot - -### Constants -- `DATA_FILE`: Path to OHLC data JSON file -- `DEPTH_FILE`: Path to depth data JSON file -- `METRICS_FILE`: Path to metrics data JSON file -- `MAX_BARS`: Maximum number of bars to retain (1000) - -## Usage Examples - -### Basic OHLC Operations -```python -import viz_io - -# Add a new OHLC bar -viz_io.add_ohlc_bar( - timestamp=1640995200000, # Unix timestamp in milliseconds - open_price=50000.0, - high_price=50100.0, - low_price=49900.0, - close_price=50050.0, - volume=125.5 -) - -# Update the current bar (if timestamp matches) or add new one -viz_io.upsert_ohlc_bar( - timestamp=1640995200000, - open_price=50000.0, - high_price=50150.0, # Updated high - low_price=49850.0, # Updated low - close_price=50075.0, # Updated close - volume=130.2 # Updated volume -) -``` - -### Orderbook Depth Management -```python -# Set current depth snapshot -bids = [[49990.0, 1.5], [49985.0, 2.1], [49980.0, 0.8]] -asks = [[50010.0, 1.2], [50015.0, 1.8], [50020.0, 2.5]] - -viz_io.set_depth_data(bids, asks) -``` - -### Metrics Operations -```python -# Add Order Book Imbalance metrics -viz_io.add_metric_bar( - timestamp=1640995200000, - obi_open=0.15, - obi_high=0.22, - obi_low=0.08, - obi_close=0.18 -) -``` - -## Dependencies - -### Internal -- None (standalone utility module) - -### External -- `json`: JSON serialization/deserialization -- `pathlib`: File path handling -- `typing`: Type annotations -- `tempfile`: Atomic write operations - -## Data Formats - -### OHLC Data (`ohlc_data.json`) -```json -[ - [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5], - [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3] -] -``` -Format: `[timestamp, open, high, low, close, volume]` - -### Depth Data (`depth_data.json`) -```json -{ - "bids": [[49990.0, 1.5], [49985.0, 2.1]], - "asks": [[50010.0, 1.2], [50015.0, 1.8]] -} -``` -Format: `{"bids": [[price, size], ...], "asks": [[price, size], ...]}` - -### Metrics Data (`metrics_data.json`) -```json -[ - [1640995200000, 0.15, 0.22, 0.08, 0.18], - [1640995260000, 0.18, 0.25, 0.12, 0.20] -] -``` -Format: `[timestamp, obi_open, obi_high, obi_low, obi_close]` - -## Atomic Write Operations - -All write operations use atomic file replacement to prevent partial reads: - -1. Write data to temporary file -2. 
Flush and sync to disk -3. Atomically rename temporary file to target file - -This ensures the visualization frontend always reads complete, valid JSON data. - -## Performance Characteristics - -- **Bounded Memory**: OHLC and metrics datasets limited to 1000 bars max -- **Atomic Operations**: No partial reads possible during writes -- **Rolling Window**: Automatic trimming of old data maintains constant memory usage -- **Fast Lookups**: Timestamp-based upsert operations use list scanning (acceptable for 1000 items) - -## Testing - -Run module tests: -```bash -uv run pytest test_viz_io.py -v -``` - -Test coverage includes: -- Atomic write operations -- Data format validation -- Rolling window behavior -- Upsert logic correctness -- File corruption prevention -- Concurrent read/write scenarios - -## Known Issues - -- File I/O may block briefly during atomic writes -- JSON parsing errors not propagated to callers -- Limited to 1000 bars maximum (configurable via MAX_BARS) -- No compression for large datasets - -## Thread Safety - -All operations are thread-safe for single writer, multiple reader scenarios: -- Writer: Data processing pipeline (single thread) -- Readers: Visualization frontend (polling) -- Atomic file operations prevent corruption during concurrent access diff --git a/main.py b/main.py index c1f93cc..bf9a405 100644 --- a/main.py +++ b/main.py @@ -1,22 +1,24 @@ +import json import logging import typer from pathlib import Path from datetime import datetime, timezone -import subprocess -import time import threading from db_interpreter import DBInterpreter from ohlc_processor import OHLCProcessor +from strategy import Strategy from desktop_app import MainWindow import sys from PySide6.QtWidgets import QApplication -from PySide6.QtCore import Signal, QTimer +from PySide6.QtCore import QTimer def main(instrument: str = typer.Argument(..., help="Instrument to backtest, e.g. BTC-USDT"), start_date: str = typer.Argument(..., help="Start date, e.g. 2025-07-01"), - end_date: str = typer.Argument(..., help="End date, e.g. 2025-08-01")): - logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s") + end_date: str = typer.Argument(..., help="End date, e.g. 
2025-08-01"), + timeframe_minutes: int = typer.Option(15, "--timeframe-minutes", help="Timeframe in minutes"), + ui: bool = typer.Option(False, "--ui/--no-ui", help="Enable UI")): + logging.basicConfig(filename="strategy.log", level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s") start_date = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=timezone.utc) end_date = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=timezone.utc) @@ -36,52 +38,68 @@ def main(instrument: str = typer.Argument(..., help="Instrument to backtest, e.g logging.info(f"Found {len(db_paths)} database files: {[p.name for p in db_paths]}") - processor = OHLCProcessor(aggregate_window_seconds=60 * 60) - - app = QApplication(sys.argv) - desktop_app = MainWindow() - desktop_app.show() - - timer = QTimer() - timer.timeout.connect(lambda: desktop_app.update_data(processor)) - timer.start(1000) + processor = OHLCProcessor(aggregate_window_seconds=60 * timeframe_minutes) + strategy = Strategy( + lookback=30, + min_volume_factor=1.0, + confirm_break_signal_high=True, # or False to disable + debug=True, + debug_level=2, # 1=key events, 2=per-bar detail + debug_every_n_bars=100 + ) def process_data(): + db_to_process = [] + + for db_path in db_paths: + db_name_parts = db_path.name.split(".")[0].split("-") + if len(db_name_parts) < 5: + logging.warning(f"Unexpected filename format: {db_path.name}") + continue + + db_name = db_name_parts[2:5] + db_date = datetime.strptime("".join(db_name), "%y%m%d").replace(tzinfo=timezone.utc) + + if db_date < start_date or db_date >= end_date: + continue + db_to_process.append(db_path) + + for i, db_path in enumerate(db_to_process): + print(f"{i}/{len(db_to_process)}") + + db_interpreter = DBInterpreter(db_path) + + for orderbook_update, trades in db_interpreter.stream(): + processor.update_orderbook(orderbook_update) + processor.process_trades(trades) + strategy.process(processor) + + processor.flush() + strategy.process(processor) + try: - for db_path in db_paths: - db_name_parts = db_path.name.split(".")[0].split("-") - if len(db_name_parts) < 5: - logging.warning(f"Unexpected filename format: {db_path.name}") - continue - - db_name = db_name_parts[2:5] - db_date = datetime.strptime("".join(db_name), "%y%m%d").replace(tzinfo=timezone.utc) + strategy.on_finish(processor) # optional: flat at last close + except Exception: + pass + + print(json.dumps(strategy.get_stats(), indent=2)) - if db_date < start_date or db_date >= end_date: - logging.info(f"Skipping {db_path.name} - outside date range") - continue + if ui: + data_thread = threading.Thread(target=process_data, daemon=True) + data_thread.start() - logging.info(f"Processing database: {db_path.name}") - db_interpreter = DBInterpreter(db_path) - - batch_count = 0 - for orderbook_update, trades in db_interpreter.stream(): - batch_count += 1 - - processor.update_orderbook(orderbook_update) - processor.process_trades(trades) - # desktop_app.update_data(processor) - - - logging.info("Data processing completed") - except Exception as e: - logging.error(f"Error in data processing: {e}") + app = QApplication(sys.argv) + desktop_app = MainWindow() + desktop_app.show() - data_thread = threading.Thread(target=process_data, daemon=True) - data_thread.start() - - app.exec() + timer = QTimer() + timer.timeout.connect(lambda: desktop_app.update_data(processor, 30)) + timer.start(1000) + + app.exec() + else: + process_data() if __name__ == "__main__": diff --git a/metrics_calculator.py b/metrics_calculator.py index 
c61957f..11a4ec8 100644 --- a/metrics_calculator.py +++ b/metrics_calculator.py @@ -1,44 +1,74 @@ +# metrics_calculator.py import logging -from typing import Optional, List +from typing import List, Optional class MetricsCalculator: def __init__(self): - self.cvd_cumulative = 0.0 - self.obi_value = 0.0 + # ----- OBI (value at close) ----- + self._obi_value = 0.0 - # --- per-bucket state --- + # ----- CVD (bucket-local delta, TradingView-style) ----- + self._bucket_buy = 0.0 + self._bucket_sell = 0.0 + self._bucket_net = 0.0 # buy - sell within current bucket + + # --- per-bucket lifecycle state --- self._b_ts_start: Optional[int] = None self._b_ts_end: Optional[int] = None - self._obi_o: Optional[float] = None - self._obi_h: Optional[float] = None - self._obi_l: Optional[float] = None - self._obi_c: Optional[float] = None - # final series rows: [ts_start, ts_end, obi_o, obi_h, obi_l, obi_c, cvd] - self._series: List[List[float]] = [] + self._obi_o = self._obi_h = self._obi_l = self._obi_c = None + self._cvd_o = self._cvd_h = self._cvd_l = self._cvd_c = None + + # final series rows: + # [ts_start, ts_end, o, h, l, c, value_at_close] + self._series_obi: List[List[float]] = [] + self._series_cvd: List[List[float]] = [] + + # ----- ATR(14) ----- + self._atr_period: int = 14 + self._prev_close: Optional[float] = None + self._tr_window: List[float] = [] # last N TR values + self._series_atr: List[float] = [] # one value per finalized bucket # ------------------------------ - # CVD + # CVD (bucket-local delta) # ------------------------------ def update_cvd_from_trade(self, side: str, size: float) -> None: + """Accumulate buy/sell within the *current* bucket (TV-style volume delta).""" + if self._b_ts_start is None: + # bucket not open yet; processor should call begin_bucket first + return + + s = float(size) if side == "buy": - volume_delta = float(size) + self._bucket_buy += s elif side == "sell": - volume_delta = -float(size) + self._bucket_sell += s else: - logging.warning(f"Unknown trade side '{side}', treating as neutral") - volume_delta = 0.0 - self.cvd_cumulative += volume_delta + logging.warning(f"Unknown trade side '{side}', ignoring") + return + + self._bucket_net = self._bucket_buy - self._bucket_sell + v = self._bucket_net + + if self._cvd_o is None: + self._cvd_o = 0.0 + self._cvd_h = v + self._cvd_l = v + self._cvd_c = v + else: + self._cvd_h = max(self._cvd_h, v) + self._cvd_l = min(self._cvd_l, v) + self._cvd_c = v # ------------------------------ # OBI # ------------------------------ def update_obi_from_book(self, total_bids: float, total_asks: float) -> None: - self.obi_value = float(total_bids - total_asks) - # update H/L/C if a bucket is open + self._obi_value = float(total_bids - total_asks) if self._b_ts_start is not None: - v = self.obi_value + v = self._obi_value if self._obi_o is None: self._obi_o = self._obi_h = self._obi_l = self._obi_c = v else: @@ -46,38 +76,88 @@ class MetricsCalculator: self._obi_l = min(self._obi_l, v) self._obi_c = v + # ------------------------------ + # ATR helpers + # ------------------------------ + def _update_atr_from_bar(self, high: float, low: float, close: float) -> None: + if self._prev_close is None: + tr = float(high) - float(low) + else: + tr = max( + float(high) - float(low), + abs(float(high) - float(self._prev_close)), + abs(float(low) - float(self._prev_close)), + ) + self._tr_window.append(tr) + if len(self._tr_window) > self._atr_period: + self._tr_window.pop(0) + atr = (sum(self._tr_window) / len(self._tr_window)) if 
self._tr_window else 0.0 + self._series_atr.append(atr) + self._prev_close = float(close) + # ------------------------------ # Bucket lifecycle # ------------------------------ def begin_bucket(self, ts_start_ms: int, ts_end_ms: int) -> None: self._b_ts_start = int(ts_start_ms) self._b_ts_end = int(ts_end_ms) - v = float(self.obi_value) - self._obi_o = self._obi_h = self._obi_l = self._obi_c = v - def finalize_bucket(self) -> None: + # OBI opens at current value + self._obi_o = self._obi_h = self._obi_l = self._obi_c = self._obi_value + + # CVD resets each bucket + self._bucket_buy = 0.0 + self._bucket_sell = 0.0 + self._bucket_net = 0.0 + self._cvd_o = 0.0 + self._cvd_h = 0.0 + self._cvd_l = 0.0 + self._cvd_c = 0.0 + + def finalize_bucket(self, bar: Optional[dict] = None) -> None: if self._b_ts_start is None or self._b_ts_end is None: return - o = float(self._obi_o if self._obi_o is not None else self.obi_value) - h = float(self._obi_h if self._obi_h is not None else self.obi_value) - l = float(self._obi_l if self._obi_l is not None else self.obi_value) - c = float(self._obi_c if self._obi_c is not None else self.obi_value) - self._series.append([ - self._b_ts_start, self._b_ts_end, o, h, l, c, float(self.cvd_cumulative) - ]) - # reset + + # OBI row + o = float(self._obi_o if self._obi_o is not None else self._obi_value) + h = float(self._obi_h if self._obi_h is not None else self._obi_value) + l = float(self._obi_l if self._obi_l is not None else self._obi_value) + c = float(self._obi_c if self._obi_c is not None else self._obi_value) + self._series_obi.append([self._b_ts_start, self._b_ts_end, o, h, l, c, float(self._obi_value)]) + + # CVD row (bucket-local delta) + o = float(self._cvd_o if self._cvd_o is not None else 0.0) + h = float(self._cvd_h if self._cvd_h is not None else 0.0) + l = float(self._cvd_l if self._cvd_l is not None else 0.0) + c = float(self._cvd_c if self._cvd_c is not None else 0.0) + self._series_cvd.append([self._b_ts_start, self._b_ts_end, o, h, l, c, float(self._bucket_net)]) + + # ATR from the finalized OHLC bar + if bar is not None: + try: + self._update_atr_from_bar(bar["high"], bar["low"], bar["close"]) + except Exception as e: + logging.debug(f"ATR update error (ignored): {e}") + + # reset state self._b_ts_start = self._b_ts_end = None self._obi_o = self._obi_h = self._obi_l = self._obi_c = None + self._cvd_o = self._cvd_h = self._cvd_l = self._cvd_c = None def add_flat_bucket(self, ts_start_ms: int, ts_end_ms: int) -> None: - v = float(self.obi_value) - self._series.append([ - int(ts_start_ms), int(ts_end_ms), - v, v, v, v, float(self.cvd_cumulative) - ]) + # OBI flat + v_obi = float(self._obi_value) + self._series_obi.append([int(ts_start_ms), int(ts_end_ms), v_obi, v_obi, v_obi, v_obi, v_obi]) + + # CVD flat at zero (no trades in this bucket) + self._series_cvd.append([int(ts_start_ms), int(ts_end_ms), 0.0, 0.0, 0.0, 0.0, 0.0]) + + # ATR: carry last ATR forward if any, else 0.0 + last_atr = self._series_atr[-1] if self._series_atr else 0.0 + self._series_atr.append(float(last_atr)) # ------------------------------ # Output # ------------------------------ def get_series(self): - return self._series + return {'cvd': self._series_cvd, 'obi': self._series_obi, 'atr': self._series_atr} diff --git a/ohlc_processor.py b/ohlc_processor.py index ca1f5f4..44e3925 100644 --- a/ohlc_processor.py +++ b/ohlc_processor.py @@ -18,12 +18,10 @@ class OHLCProcessor: `carry_forward_open`). 
""" - def __init__(self, aggregate_window_seconds: int, carry_forward_open: bool = True) -> None: + def __init__(self, aggregate_window_seconds: int) -> None: self.aggregate_window_seconds = int(aggregate_window_seconds) self._bucket_ms = self.aggregate_window_seconds * 1000 - self.carry_forward_open = carry_forward_open - self.current_bar: Optional[Dict[str, Any]] = None self._current_bucket_index: Optional[int] = None self._last_close: Optional[float] = None @@ -31,8 +29,8 @@ class OHLCProcessor: self.trades_processed = 0 self.bars: List[Dict[str, Any]] = [] - self._orderbook = OrderbookManager() - self._metrics = MetricsCalculator() + self.orderbook = OrderbookManager() + self.metrics = MetricsCalculator() # ----------------------- # Internal helpers @@ -62,9 +60,9 @@ class OHLCProcessor: gap_bar = self._new_bar(start_ms, self._last_close) self.bars.append(gap_bar) - # metrics: add a flat bucket to keep OBI/CVD time-continuous + # metrics: add a flat bucket to keep OBI/CVD/ATR time-continuous try: - self._metrics.add_flat_bucket(start_ms, start_ms + self._bucket_ms) + self.metrics.add_flat_bucket(start_ms, start_ms + self._bucket_ms) except Exception as e: logging.debug(f"metrics add_flat_bucket error (ignored): {e}") @@ -89,59 +87,33 @@ class OHLCProcessor: timestamp_ms = int(timestamp_ms) self.trades_processed += 1 - # Metrics driven by trades: update CVD - try: - self._metrics.update_cvd_from_trade(side, size) - except Exception as e: - logging.debug(f"CVD update error (ignored): {e}") - # Determine this trade's bucket bucket_index = timestamp_ms // self._bucket_ms bucket_start = bucket_index * self._bucket_ms # New bucket? if self._current_bucket_index is None or bucket_index != self._current_bucket_index: - # finalize prior bar + # finalize prior bar (also finalizes metrics incl. 
ATR) if self.current_bar is not None: self.bars.append(self.current_bar) self._last_close = self.current_bar["close"] - # finalize metrics for the prior bucket window - try: - self._metrics.finalize_bucket() - except Exception as e: - logging.debug(f"metrics finalize_bucket error (ignored): {e}") - + self.metrics.finalize_bucket(self.current_bar) # <— pass bar for ATR # handle gaps if self._current_bucket_index is not None and bucket_index > self._current_bucket_index + 1: self._emit_gap_bars(self._current_bucket_index, bucket_index) - # pick open price policy - if self.carry_forward_open and self._last_close is not None: - open_for_new = self._last_close - else: - open_for_new = price + open_for_new = self._last_close if self._last_close is not None else price self.current_bar = self._new_bar(bucket_start, open_for_new) self._current_bucket_index = bucket_index - # begin a new metrics bucket aligned to this bar window - try: - self._metrics.begin_bucket(bucket_start, bucket_start + self._bucket_ms) - except Exception as e: - logging.debug(f"metrics begin_bucket error (ignored): {e}") + self.metrics.begin_bucket(bucket_start, bucket_start + self._bucket_ms) + + # Metrics driven by trades: update CVD + self.metrics.update_cvd_from_trade(side, float(size)) # Update current bucket with this trade b = self.current_bar - if b is None: - # Should not happen, but guard anyway - b = self._new_bar(bucket_start, price) - self.current_bar = b - self._current_bucket_index = bucket_index - try: - self._metrics.begin_bucket(bucket_start, bucket_start + self._bucket_ms) - except Exception as e: - logging.debug(f"metrics begin_bucket (guard) error (ignored): {e}") - b["high"] = max(b["high"], price) b["low"] = min(b["low"], price) b["close"] = price @@ -154,12 +126,16 @@ class OHLCProcessor: if self.current_bar is not None: self.bars.append(self.current_bar) self._last_close = self.current_bar["close"] + try: + self.metrics.finalize_bucket(self.current_bar) # <— pass bar for ATR + except Exception as e: + logging.debug(f"metrics finalize_bucket on flush error (ignored): {e}") self.current_bar = None - # finalize any open metrics bucket - try: - self._metrics.finalize_bucket() - except Exception as e: - logging.debug(f"metrics finalize_bucket on flush error (ignored): {e}") + else: + try: + self.metrics.finalize_bucket(None) + except Exception as e: + logging.debug(f"metrics finalize_bucket on flush error (ignored): {e}") def update_orderbook(self, ob_update: OrderbookUpdate) -> None: """ @@ -169,11 +145,11 @@ class OHLCProcessor: bids_updates = parse_levels_including_zeros(ob_update.bids) asks_updates = parse_levels_including_zeros(ob_update.asks) - self._orderbook.apply_updates(bids_updates, asks_updates) + self.orderbook.apply_updates(bids_updates, asks_updates) - total_bids, total_asks = self._orderbook.get_total_volume() + total_bids, total_asks = self.orderbook.get_total_volume() try: - self._metrics.update_obi_from_book(total_bids, total_asks) + self.metrics.update_obi_from_book(total_bids, total_asks) except Exception as e: logging.debug(f"OBI update error (ignored): {e}") @@ -182,10 +158,14 @@ class OHLCProcessor: # ----------------------- def get_metrics_series(self): """ - Returns a list of rows: - [timestamp_start_ms, timestamp_end_ms, obi_open, obi_high, obi_low, obi_close, cvd_value] + Returns: + { + 'cvd': [[ts_start, ts_end, o, h, l, c, value_at_close], ...], + 'obi': [[...], ...], + 'atr': [atr_value_per_bar, ...] 
+ } """ try: - return self._metrics.get_series() + return self.metrics.get_series() except Exception: - return [] + return {} diff --git a/strategy.py b/strategy.py new file mode 100644 index 0000000..6abc1d9 --- /dev/null +++ b/strategy.py @@ -0,0 +1,386 @@ +# strategy.py +import logging +from typing import List, Dict, Optional +from statistics import median +from datetime import datetime, timezone + + +class Strategy: + """ + Long-only CVD Divergence with ATR-based execution, fee-aware PnL, cooldown, + adaptive CVD strength, optional confirmation entry, and debug logging. + + Configure logging in main.py, for example: + logging.basicConfig( + filename="strategy.log", + level=logging.DEBUG, + format="%(asctime)s %(levelname)s %(message)s" + ) + """ + + def __init__( + self, + # Core signal windows + lookback: int = 30, + min_volume_factor: float = 1.0, + + # ATR & execution + atr_period: int = 14, + atr_mult_init: float = 2.0, + atr_mult_trail: float = 3.0, + breakeven_after_rr: float = 1.5, + min_bars_before_be: int = 2, + atr_min_rel_to_med: float = 1.0, + cooldown_bars: int = 3, + + # Divergence strength thresholds + price_ll_min_atr: float = 0.05, + cvd_min_gap: float = 0.0, # if 0 → adaptive + cvd_gap_pct_of_range: float = 0.10, # 10% of rolling cumCVD range + + # Entry confirmation + confirm_break_signal_high: bool = True, + + # Fees + fee_rate: float = 0.002, # taker 0.20% per side + fee_rate_maker: float = 0.0008, # maker 0.08% per side + maker_entry: bool = False, + maker_exit: bool = False, + + # Debug + debug: bool = False, + debug_level: int = 1, # 0=quiet, 1=key, 2=detail + debug_every_n_bars: int = 200, + ): + # Params + self.lookback = lookback + self.min_volume_factor = min_volume_factor + + self.atr_period = atr_period + self.atr_mult_init = atr_mult_init + self.atr_mult_trail = atr_mult_trail + self.breakeven_after_rr = breakeven_after_rr + self.min_bars_before_be = min_bars_before_be + self.atr_min_rel_to_med = atr_min_rel_to_med + self.cooldown_bars = cooldown_bars + + self.price_ll_min_atr = price_ll_min_atr + self.cvd_min_gap = cvd_min_gap + self.cvd_gap_pct_of_range = cvd_gap_pct_of_range + + self.confirm_break_signal_high = confirm_break_signal_high + + self.fee_rate = fee_rate + self.fee_rate_maker = fee_rate_maker + self.maker_entry = maker_entry + self.maker_exit = maker_exit + + self.debug = debug + self.debug_level = debug_level + self.debug_every_n_bars = debug_every_n_bars + + # Runtime state + self._last_bar_i: int = 0 + self._cum_cvd: List[float] = [] + self._atr_vals: List[float] = [] + + self._in_position: bool = False + self._entry_price: float = 0.0 + self._entry_i: int = -1 + self._atr_at_entry: float = 0.0 + self._stop: float = 0.0 + + self._pending_entry_i: Optional[int] = None + self._pending_from_signal_i: Optional[int] = None + self._signal_high: Optional[float] = None + self._cooldown_until_i: int = -1 + + self.trades: List[Dict] = [] + + # Debug counters + self._dbg = { + "bars_seen": 0, + "checked": 0, + "fail_div_price": 0, + "fail_div_cvd": 0, + "fail_vol": 0, + "fail_atr_regime": 0, + "fail_price_strength": 0, + "fail_cvd_strength": 0, + "pass_all": 0, + "signals_created": 0, + "signals_canceled_no_break": 0, + "skipped_cooldown": 0, + "entries": 0, + "trail_raises": 0, + "to_be": 0, + "exits_sl": 0, + "exits_be": 0, + "exits_eor": 0, + } + + # ============ helpers ============ + @staticmethod + def _fmt_ts(ms: int) -> str: + try: + return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S") + except 
Exception: + return str(ms) + + def _ensure_cum_cvd(self, cvd_rows: List[List[float]], upto_len: int) -> None: + while len(self._cum_cvd) < upto_len: + i = len(self._cum_cvd) + bucket_net = float(cvd_rows[i][6]) if i < len(cvd_rows) else 0.0 + prev = self._cum_cvd[-1] if self._cum_cvd else 0.0 + self._cum_cvd.append(prev + bucket_net) + + def _ensure_atr(self, bars: List[Dict], upto_len: int, metrics_atr: Optional[List[float]]) -> None: + if metrics_atr and len(metrics_atr) >= upto_len: + self._atr_vals = [float(x) for x in metrics_atr[:upto_len]] + return + + while len(self._atr_vals) < upto_len: + i = len(self._atr_vals) + if i == 0: + self._atr_vals.append(0.0) + continue + h = float(bars[i]["high"]) + l = float(bars[i]["low"]) + pc = float(bars[i-1]["close"]) + tr = max(h - l, abs(h - pc), abs(l - pc)) + + if i < self.atr_period: + prev_sum = (self._atr_vals[-1] * (i - 1)) if i > 1 else 0.0 + atr = (prev_sum + tr) / float(i) + else: + prev_atr = self._atr_vals[-1] + atr = (prev_atr * (self.atr_period - 1) + tr) / float(self.atr_period) + self._atr_vals.append(atr) + + # ============ filters ============ + def _volume_ok(self, bars, i): + if self.min_volume_factor <= 0: + return True + start = max(0, i - self.lookback) + past = [b["volume"] for b in bars[start:i]] or [0.0] + med_v = median(past) + return (med_v == 0) or (bars[i]["volume"] >= self.min_volume_factor * med_v) + + def _atr_ok(self, i): + if i <= 0 or i >= len(self._atr_vals): + return False + start = max(0, i - self.lookback) + window = self._atr_vals[start:i] or [0.0] + med_atr = median(window) + return (med_atr == 0.0) or (self._atr_vals[i] >= self.atr_min_rel_to_med * med_atr) + + def _adaptive_cvd_gap(self, i): + if self.cvd_min_gap > 0.0: + return self.cvd_min_gap + start = max(0, i - self.lookback) + window = self._cum_cvd[start:i] or [0.0] + rng = (max(window) - min(window)) if window else 0.0 + return rng * self.cvd_gap_pct_of_range + + def _is_bullish_divergence(self, bars, i): + if i < self.lookback: + return False + self._dbg["checked"] += 1 + start = i - self.lookback + wl = [b["low"] for b in bars[start:i]] + win_low = min(wl) if wl else bars[i]["low"] + pr_ll = bars[i]["low"] < win_low + if not pr_ll: + self._dbg["fail_div_price"] += 1 + return False + wcvd = self._cum_cvd[start:i] or [self._cum_cvd[i]] + win_cvd_min = min(wcvd) if wcvd else self._cum_cvd[i] + cvd_hl = self._cum_cvd[i] > win_cvd_min + if not cvd_hl: + self._dbg["fail_div_cvd"] += 1 + return False + if not self._volume_ok(bars, i): + self._dbg["fail_vol"] += 1 + return False + if not self._atr_ok(i): + self._dbg["fail_atr_regime"] += 1 + return False + atr_i = self._atr_vals[i] + price_gap = (win_low - bars[i]["low"]) + if price_gap < self.price_ll_min_atr * atr_i: + self._dbg["fail_price_strength"] += 1 + return False + required_gap = self._adaptive_cvd_gap(i) + cvd_gap = (self._cum_cvd[i] - win_cvd_min) + if cvd_gap < required_gap: + self._dbg["fail_cvd_strength"] += 1 + return False + self._dbg["pass_all"] += 1 + return True + + # ============ execution ============ + def _net_breakeven_price(self): + f_entry = self.fee_rate_maker if self.maker_entry else self.fee_rate + f_exit = self.fee_rate_maker if self.maker_exit else self.fee_rate + return self._entry_price * ((1.0 + f_entry) / max(1e-12, (1.0 - f_exit))) + + def _do_enter(self, bars, i): + b = bars[i] + atr = float(self._atr_vals[i]) if i < len(self._atr_vals) else 0.0 + self._in_position = True + self._entry_price = float(b["open"]) + self._entry_i = i + self._atr_at_entry = atr + 
self._stop = self._entry_price - self.atr_mult_init * atr + self._pending_entry_i = None + self._signal_high = None + self._pending_from_signal_i = None + self._dbg["entries"] += 1 + logging.info(f"[ENTRY] ts={b['timestamp_start']} ({self._fmt_ts(b['timestamp_start'])}) " + f"price={self._entry_price:.2f} stop={self._stop:.2f} (ATR={atr:.2f})") + + def _exit_with_fees(self, bars, i, exit_price, reason): + entry = self.trades[-1] if self.trades and self.trades[-1].get("exit_i") is None else None + if not entry: + entry = {"entry_i": self._entry_i, "entry_ts": bars[self._entry_i]["timestamp_start"] if self._entry_i >= 0 else None, + "entry_price": self._entry_price} + self.trades.append(entry) + entry_price = float(entry["entry_price"]) + exit_price = float(exit_price) + fr_entry = self.fee_rate_maker if self.maker_entry else self.fee_rate + fr_exit = self.fee_rate_maker if self.maker_exit else self.fee_rate + pnl_gross = (exit_price / entry_price) - 1.0 + net_factor = (exit_price * (1.0 - fr_exit)) / (entry_price * (1.0 + fr_entry)) + pnl_net = net_factor - 1.0 + if reason == "SL": self._dbg["exits_sl"] += 1 + elif reason == "BE": self._dbg["exits_be"] += 1 + elif reason == "EoR": self._dbg["exits_eor"] += 1 + entry.update({ + "exit_i": i, "exit_ts": bars[i]["timestamp_start"], "exit_price": exit_price, + "pnl_gross_pct": pnl_gross * 100.0, "pnl_net_pct": pnl_net * 100.0, + "fees_pct": (fr_entry + fr_exit) * 100.0, "reason": reason, + "fee_rate_entry": fr_entry, "fee_rate_exit": fr_exit, + }) + logging.info(f"[EXIT {reason}] ts={bars[i]['timestamp_start']} ({self._fmt_ts(bars[i]['timestamp_start'])}) " + f"pnl_net={pnl_net*100:.2f}% (gross={pnl_gross*100:.2f}%, fee={(fr_entry+fr_exit)*100:.2f}%)") + + # ============ main ============ + def process(self, processor): + bars = processor.bars + series = processor.get_metrics_series() + cvd_rows = series.get("cvd", []) + metrics_atr = series.get("atr") + n = min(len(bars), len(cvd_rows)) + if n <= self._last_bar_i: + return + + self._ensure_cum_cvd(cvd_rows, n) + self._ensure_atr(bars, n, metrics_atr) + + for i in range(self._last_bar_i, n): + b = bars[i] + self._dbg["bars_seen"] += 1 + + # periodic snapshot + if self.debug and self.debug_level >= 1 and (i % max(1, self.debug_every_n_bars) == 0): + atr_i = self._atr_vals[i] if i < len(self._atr_vals) else 0.0 + logging.debug(f"[BAR] i={i} ts={b['timestamp_start']} " + f"O={b['open']:.2f} H={b['high']:.2f} L={b['low']:.2f} C={b['close']:.2f} " + f"V={b['volume']:.4f} ATR={atr_i:.2f} CUMCVD={self._cum_cvd[i]:.2f}") + + # pending entry + if self._pending_entry_i is not None and i == self._pending_entry_i and not self._in_position: + if self.confirm_break_signal_high and self._signal_high is not None: + if b["high"] > self._signal_high: + logging.debug(f"[CONFIRM] i={i} broke signal_high={self._signal_high:.2f} with H={b['high']:.2f} → ENTER") + self._do_enter(bars, i) + else: + self._dbg["signals_canceled_no_break"] += 1 + logging.debug(f"[CANCEL] i={i} no break of signal_high={self._signal_high:.2f} (H={b['high']:.2f})") + self._pending_entry_i = None + self._signal_high = None + self._pending_from_signal_i = None + else: + self._do_enter(bars, i) + + # manage position + if self._in_position: + if b["low"] <= self._stop: + be_price = self._net_breakeven_price() + reason = "BE" if self._stop >= be_price else "SL" + self._exit_with_fees(bars, i, max(self._stop, b["low"]), reason) + self._in_position = False + self._cooldown_until_i = i + self.cooldown_bars + logging.debug(f"[COOLDN] start i={i} 
until={self._cooldown_until_i}") + continue + + atr_i = self._atr_vals[i] + new_trail = b["close"] - self.atr_mult_trail * atr_i + if new_trail > self._stop: + self._dbg["trail_raises"] += 1 + logging.debug(f"[TRAIL] i={i} stop {self._stop:.2f} → {new_trail:.2f} (ATR={atr_i:.2f})") + self._stop = new_trail + + if (i - self._entry_i) >= self.min_bars_before_be and self._atr_at_entry > 0.0: + if b["close"] >= self._entry_price + self.breakeven_after_rr * self._atr_at_entry: + be_price = self._net_breakeven_price() + self._stop = max(self._stop, be_price, b["close"] - self.atr_mult_trail * atr_i) + self._dbg["to_be"] += 1 + logging.debug(f"[BE] i={i} set stop ≥ netBE={be_price:.2f} now stop={self._stop:.2f}") + + # new signal + if not self._in_position: + if i < self._cooldown_until_i: + self._dbg["skipped_cooldown"] += 1 + else: + if self._is_bullish_divergence(bars, i): + self._signal_high = b["high"] + self._pending_from_signal_i = i + self._pending_entry_i = i + 1 + self._dbg["signals_created"] += 1 + logging.debug(f"[SIGNAL] i={i} ts={b['timestamp_start']} signal_high={self._signal_high:.2f}") + + self._last_bar_i = n + + def on_finish(self, processor): + bars = processor.bars + if self._in_position and bars: + last_i = len(bars) - 1 + last_close = float(bars[last_i]["close"]) + self._exit_with_fees(bars, last_i, last_close, "EoR") + self._in_position = False + + d = self._dbg + logging.info( + "[SUMMARY] " + f"bars={d['bars_seen']} checked={d['checked']} pass={d['pass_all']} " + f"fail_price_div={d['fail_div_price']} fail_cvd_div={d['fail_div_cvd']} " + f"fail_vol={d['fail_vol']} fail_atr_regime={d['fail_atr_regime']} " + f"fail_price_str={d['fail_price_strength']} fail_cvd_str={d['fail_cvd_strength']} " + f"signals={d['signals_created']} canceled_no_break={d['signals_canceled_no_break']} " + f"cooldown_skips={d['skipped_cooldown']} entries={d['entries']} " + f"trail_raises={d['trail_raises']} to_be={d['to_be']} " + f"exits_sl={d['exits_sl']} exits_be={d['exits_be']} exits_eor={d['exits_eor']}" + ) + + def get_stats(self) -> Dict: + done = [t for t in self.trades if "pnl_net_pct" in t] + total = len(done) + wins = [t for t in done if t["pnl_net_pct"] > 0] + avg_net = (sum(t["pnl_net_pct"] for t in done) / total) if total else 0.0 + sum_net = sum(t["pnl_net_pct"] for t in done) + equity = 1.0 + for t in done: + equity *= (1.0 + t["pnl_net_pct"] / 100.0) + compounded_net = (equity - 1.0) * 100.0 + avg_gross = (sum(t.get("pnl_gross_pct", 0.0) for t in done) / total) if total else 0.0 + total_fees = sum((t.get("fees_pct") or 0.0) for t in done) + return { + "trades": total, + "win_rate": (len(wins) / total if total else 0.0), + "avg_pnl_pct": avg_net, + "sum_return_pct": sum_net, + "compounded_return_pct": compounded_net, + "avg_pnl_gross_pct": avg_gross, + "total_fees_pct": total_fees + } diff --git a/tasks/prd-cumulative-volume-delta.md b/tasks/prd-cumulative-volume-delta.md deleted file mode 100644 index 5e4bc09..0000000 --- a/tasks/prd-cumulative-volume-delta.md +++ /dev/null @@ -1,76 +0,0 @@ -## Cumulative Volume Delta (CVD) – Product Requirements Document - -### 1) Introduction / Overview -- Compute and visualize Cumulative Volume Delta (CVD) from trade data processed by `OHLCProcessor.process_trades`, aligned to the existing OHLC bar cadence. -- CVD is defined as the cumulative sum of volume delta, where volume delta = buy_volume - sell_volume per trade. -- Trade classification: `side == "buy"` → positive volume delta, `side == "sell"` → negative volume delta. 
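A minimal sketch of the per-trade accumulation described above, assuming a `Trade` with `side` and `size` fields as used in this PRD; the standalone helper class is illustrative only, since the actual logic is meant to live inside `OHLCProcessor.process_trades`:

```python
import logging
from dataclasses import dataclass


@dataclass
class Trade:
    side: str    # "buy" or "sell"
    size: float


class CvdAccumulator:
    """Running cumulative volume delta per the classification rule above."""

    def __init__(self) -> None:
        self.cvd_cumulative: float = 0.0

    def on_trade(self, trade: Trade) -> float:
        if trade.side == "buy":
            volume_delta = +trade.size
        elif trade.side == "sell":
            volume_delta = -trade.size
        else:
            # neither "buy" nor "sell" → delta 0, warn as required
            logging.warning("Unknown trade side %r; volume delta set to 0", trade.side)
            volume_delta = 0.0
        self.cvd_cumulative += volume_delta
        return self.cvd_cumulative
```

Keeping the accumulator as plain running state keeps the hot path allocation-free; the per-window value is then just a read of `cvd_cumulative` at window rollover.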
-- Persist CVD time series as scalar values per window to `metrics_data.json` and render a CVD line chart beneath the current OBI subplot in the Dash UI. - -### 2) Goals -- Compute volume delta from individual trades using the `side` field in the Trade dataclass. -- Accumulate CVD across all processed trades (no session resets initially). -- Aggregate CVD into window-aligned scalar values per `window_seconds`. -- Extend `metrics_data.json` schema to include CVD values alongside existing OBI data. -- Add a CVD line chart subplot beneath OBI in the main chart, sharing the time axis. -- Throttle intra-window upserts of CVD values using the same approach/frequency as current OHLC throttling; always write on window close. - -### 3) User Stories -- As a researcher, I want CVD computed from actual trade data so I can assess buying/selling pressure over time. -- As an analyst, I want CVD stored per time window so I can correlate it with price movements and OBI patterns. -- As a developer, I want cumulative CVD values so I can analyze long-term directional bias in volume flow. - -### 4) Functional Requirements -1. Inputs and Definitions - - Compute volume delta on every trade in `OHLCProcessor.process_trades`: - - If `trade.side == "buy"` → `volume_delta = +trade.size` - - If `trade.side == "sell"` → `volume_delta = -trade.size` - - If `trade.side` is neither "buy" nor "sell" → `volume_delta = 0` (log warning) - - Accumulate into running CVD: `self.cvd_cumulative += volume_delta` -2. Windowing & Aggregation - - Use the same `window_seconds` boundary as OHLC bars; window anchor is derived from the trade timestamp. - - Store CVD value at window boundaries (end-of-window CVD snapshot). - - On window rollover, capture the current `self.cvd_cumulative` value for that window. -3. Persistence - - Extend `metrics_data.json` schema from `[timestamp, obi_open, obi_high, obi_low, obi_close]` to `[timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value]`. - - Update `viz_io.py` functions to handle the new 6-element schema. - - Keep only the last 1000 rows. - - Upsert intra-window CVD values periodically (throttled, matching OHLC's approach) and always write on window close. -4. Visualization - - Read extended `metrics_data.json` in the Dash app with the same tolerant JSON reading/caching approach. - - Extend the main figure to a fourth row for CVD line chart beneath OBI, sharing the x-axis. - - Style CVD as a line chart with appropriate color (distinct from OHLC/Volume/OBI) and add a zero baseline. -5. Performance & Correctness - - CVD compute happens on every trade; I/O is throttled to maintain UI responsiveness. - - Use existing logging and error handling patterns; must not crash if metrics JSON is temporarily unreadable. - - Handle backward compatibility: if existing `metrics_data.json` has 5-element rows, treat missing CVD as 0. -6. Testing - - Unit tests for volume delta calculation with "buy", "sell", and invalid side values. - - Unit tests for CVD accumulation across multiple trades and window boundaries. - - Integration test: fixture trades produce correct CVD progression in `metrics_data.json`. - -### 5) Non-Goals -- No CVD reset functionality (will be implemented later). -- No additional derived CVD metrics (e.g., CVD rate of change, normalized CVD). -- No database persistence for CVD; JSON IPC only. -- No strategy/signal changes based on CVD. - -### 6) Design Considerations -- Implement CVD calculation in `OHLCProcessor.process_trades` alongside existing OHLC aggregation. 
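The backward-compatibility rule above (treat the missing CVD value in older 5-element rows as 0) could be applied at read time; a sketch, where the helper name `pad_metric_rows` is hypothetical rather than an existing `viz_io.py` function:

```python
from typing import List


def pad_metric_rows(rows: List[list]) -> List[list]:
    """Normalize metric rows to the 6-element schema
    [timestamp, obi_open, obi_high, obi_low, obi_close, cvd_value],
    treating the missing CVD value in legacy 5-element rows as 0.0."""
    padded = []
    for row in rows:
        if len(row) == 5:            # legacy row written before CVD existed
            padded.append(list(row) + [0.0])
        else:
            padded.append(list(row))
    return padded
```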
-- Extend `viz_io.py` metrics functions to support 6-element schema while maintaining backward compatibility. -- Add CVD state tracking: `self.cvd_cumulative`, `self.cvd_window_value` per window. -- Follow the same throttling pattern as OBI metrics for consistency. - -### 7) Technical Considerations -- Add CVD computation in the trade processing loop within `OHLCProcessor.process_trades`. -- Extend `upsert_metric_bar` and `add_metric_bar` functions to accept optional `cvd_value` parameter. -- Handle schema migration gracefully: read existing 5-element rows, append 0.0 for missing CVD. -- Use the same window alignment as trades (based on trade timestamp, not orderbook timestamp). - -### 8) Success Metrics -- `metrics_data.json` present with valid 6-element rows during processing. -- CVD subplot updates smoothly and aligns with OHLC window timestamps. -- CVD increases during buy-heavy periods, decreases during sell-heavy periods. -- No noticeable performance regression in trade processing or UI responsiveness. - -### 9) Open Questions -- None; CVD computation approach confirmed using trade.side field. Schema extension approach confirmed for metrics_data.json. diff --git a/tasks/prd-order-book-imbalance.md b/tasks/prd-order-book-imbalance.md deleted file mode 100644 index 484c149..0000000 --- a/tasks/prd-order-book-imbalance.md +++ /dev/null @@ -1,71 +0,0 @@ -## Order Book Imbalance (OBI) – Product Requirements Document - -### 1) Introduction / Overview -- Compute and visualize Order Book Imbalance (OBI) from the in-memory order book maintained by `OHLCProcessor`, aligned to the existing OHLC bar cadence. -- OBI is defined as raw `B - A`, where `B` is total bid size and `A` is total ask size. -- Persist an OBI time series as OHLC-style bars to `metrics_data.json` and render an OBI candlestick chart beneath the current Volume subplot in the Dash UI. - -### 2) Goals -- Compute OBI from the full in-memory aggregated book (all bid/ask levels) on every order book update. -- Aggregate OBI into OHLC-style bars per `window_seconds`. -- Persist OBI bars to `metrics_data.json` with atomic writes and a rolling retention of 1000 rows. -- Add an OBI candlestick subplot (blue-toned) beneath Volume in the main chart, sharing the time axis. -- Throttle intra-window upserts of OBI bars using the same approach/frequency as current OHLC throttling; always write on window close. - -### 3) User Stories -- As a researcher, I want OBI computed from the entire book so I can assess true depth imbalance. -- As an analyst, I want OBI stored per time window as candlesticks so I can compare it with price/volume behavior. -- As a developer, I want raw OBI values so I can analyze absolute imbalance patterns. - -### 4) Functional Requirements -1. Inputs and Definitions - - Compute on every order book update using the complete in-memory book: - - `B = sum(self._book_bids.values())` - - `A = sum(self._book_asks.values())` - - `OBI = B - A` - - Edge case: if both sides are empty → `OBI = 0`. -2. Windowing & Aggregation - - Use the same `window_seconds` boundary as OHLC bars; window anchor is derived from the order book update timestamp. - - Maintain OBI OHLC per window: `obi_open`, `obi_high`, `obi_low`, `obi_close`. - - On window rollover, finalize and persist the bar. -3. Persistence - - Introduce `metrics_data.json` (co-located with other IPC files) with atomic writes. - - Schema: list of fixed-length rows - - `[timestamp_ms, obi_open, obi_high, obi_low, obi_close]` - - Keep only the last 1000 rows. 
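A sketch of the atomic-write-plus-rolling-trim behaviour this persistence section describes, assuming JSON rows shaped as defined above; the helper name and temp-file strategy are illustrative, not the actual `viz_io.py` code:

```python
import json
import os
import tempfile
from typing import List

MAX_ROWS = 1000  # rolling retention described above


def write_metric_rows_atomic(path: str, rows: List[list]) -> None:
    """Write metric rows to `path` atomically, keeping only the newest MAX_ROWS."""
    trimmed = rows[-MAX_ROWS:]
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(trimmed, f)
        os.replace(tmp_path, path)   # atomic rename into place
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to a temp file in the same directory and renaming it over the target is what keeps readers (the Dash app) from ever seeing a half-written JSON file.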
- - Upsert intra-window bars periodically (throttled, matching OHLC’s approach) and always write on window close. -4. Visualization - - Read `metrics_data.json` in the Dash app with the same tolerant JSON reading/caching approach as other IPC files. - - Extend the main figure to a third row for OBI candlesticks beneath Volume, sharing the x-axis. - - Style OBI candlesticks in blue tones (distinct increasing/decreasing shades) and add a zero baseline. -5. Performance & Correctness - - OBI compute happens on every order book update; I/O is throttled to maintain UI responsiveness. - - Use existing logging and error handling patterns; must not crash if metrics JSON is temporarily unreadable. -6. Testing - - Unit tests for OBI on symmetric, empty, and imbalanced books; intra-window aggregation; window rollover. - - Integration test: fixture DB produces `metrics_data.json` aligned with OHLC bars, valid schema/lengths. - -### 5) Non-Goals -- No additional derived metrics; keep only raw OBI values for maximum flexibility. -- No database persistence for metrics; JSON IPC only. -- No strategy/signal changes. - -### 6) Design Considerations -- Reuse `OHLCProcessor` in-memory book (`_book_bids`, `_book_asks`). -- Introduce new metrics IO helpers in `viz_io.py` mirroring existing OHLC IO (atomic write, rolling trim, upsert). -- Keep `metrics_data.json` separate from `ohlc_data.json` to avoid schema churn. - -### 7) Technical Considerations -- Implement OBI compute and aggregation inside `OHLCProcessor.update_orderbook` after applying partial updates. -- Throttle intra-window upserts with the same cadence concept as OHLC; on window close always persist. -- Add a finalize path to persist the last OBI bar. - -### 8) Success Metrics -- `metrics_data.json` present with valid rows during processing. -- OBI subplot updates smoothly and aligns with OHLC window timestamps. -- OBI ≈ 0 for symmetric books; correct sign for imbalanced cases; no noticeable performance regression. - -### 9) Open Questions -- None; cadence confirmed to match OHLC throttling. Styling: blue tones for OBI candlesticks. - - diff --git a/tasks/prd-pyside6-pyqtgraph-migration.md b/tasks/prd-pyside6-pyqtgraph-migration.md deleted file mode 100644 index f61e788..0000000 --- a/tasks/prd-pyside6-pyqtgraph-migration.md +++ /dev/null @@ -1,190 +0,0 @@ -# Product Requirements Document: Migration from Dash/Plotly to PySide6/PyQtGraph - -## Introduction/Overview - -This PRD outlines the complete migration of the orderflow backtest visualization system from the current Dash/Plotly web-based implementation to a native desktop application using PySide6 and PyQtGraph. The migration addresses critical issues with the current implementation including async problems, debugging difficulties, performance bottlenecks, and data handling inefficiencies. - -The goal is to create a robust, high-performance desktop application that provides better control over the codebase, eliminates current visualization bugs (particularly the CVD graph display issue), and enables future real-time trading strategy monitoring capabilities. - -## Goals - -1. **Eliminate Current Technical Issues** - - Resolve async-related problems causing visualization failures - - Fix CVD graph display issues that persist despite correct-looking code - - Enable proper debugging capabilities with breakpoint support - - Improve overall application performance and responsiveness - -2. 
**Improve Development Experience** - - Gain better control over the codebase through native Python implementation - - Reduce dependency on intermediate file-based data exchange - - Simplify the development and debugging workflow - - Establish a foundation for future real-time capabilities - -3. **Maintain and Enhance Visualization Capabilities** - - Preserve all existing chart types and interactions - - Improve performance for granular dataset handling - - Prepare infrastructure for real-time data streaming - - Enhance user experience through native desktop interface - -## User Stories - -1. **As a trading strategy developer**, I want to visualize OHLC data with volume, OBI, and CVD indicators in a single, synchronized view so that I can analyze market behavior patterns effectively. - -2. **As a data analyst**, I want to zoom, pan, and select specific time ranges on charts so that I can focus on relevant market periods for detailed analysis. - -3. **As a system developer**, I want to debug visualization issues with breakpoints and proper debugging tools so that I can identify and fix problems efficiently. - -4. **As a performance-conscious user**, I want smooth chart rendering and interactions even with large, granular datasets so that my analysis workflow is not interrupted by lag or freezing. - -5. **As a future trading system operator**, I want a foundation that can handle real-time data updates so that I can monitor live trading strategies effectively. - -## Functional Requirements - -### Core Visualization Components - -1. **Main Chart Window** - - The system must display OHLC candlestick charts in a primary plot area - - The system must allow customizable time window selection for OHLC display - - The system must synchronize all chart components to the same time axis - -2. **Integrated Indicator Charts** - - The system must display Volume bars below the OHLC chart - - The system must display Order Book Imbalance (OBI) indicator - - The system must display Cumulative Volume Delta (CVD) indicator - - All indicators must share the same X-axis as the OHLC chart - -3. **Depth Chart Visualization** - - The system must display order book depth at selected time snapshots - - The system must update depth visualization based on time selection - - The system must provide clear bid/ask visualization - -### User Interaction Features - -4. **Chart Navigation** - - The system must support zoom in/out functionality across all charts - - The system must allow panning across time ranges - - The system must provide time range selection capabilities - - The system must support rectangle selection for detailed analysis - -5. **Data Inspection** - - The system must display mouseover information for all chart elements - - The system must show precise values for OHLC, volume, OBI, and CVD data points - - The system must provide crosshair functionality for precise data reading - -### Technical Architecture - -6. **Application Framework** - - The system must be built using PySide6 for the GUI framework - - The system must use PyQtGraph for all chart rendering and interactions - - The system must implement a native desktop application architecture - -7. **Data Integration** - - The system must integrate with existing data processing modules (metrics_calculator, ohlc_processor, orderbook_manager) - - The system must eliminate dependency on intermediate JSON files for data display - - The system must support direct in-memory data transfer between processing and visualization - -8. 
**Performance Requirements** - - The system must handle granular datasets efficiently without UI blocking - - The system must provide smooth chart interactions (zoom, pan, selection) - - The system must render updates in less than 100ms for typical dataset sizes - -### Development and Debugging - -9. **Code Quality** - - The system must be fully debuggable with standard Python debugging tools - - The system must follow the existing project architecture patterns - - The system must maintain clean separation between data processing and visualization - -## Non-Goals (Out of Scope) - -1. **Web Interface Maintenance** - The existing Dash/Plotly implementation will be completely replaced, not maintained in parallel - -2. **Backward Compatibility** - No requirement to maintain compatibility with existing Dash/Plotly components or web-based deployment - -3. **Multi-Platform Distribution** - Initial focus on development environment only, not packaging for distribution - -4. **Real-Time Implementation** - While the architecture should support future real-time capabilities, the initial migration will focus on historical data visualization - -5. **Advanced Chart Types** - Only migrate existing chart types; new visualization features are out of scope for this migration - -## Design Considerations - -### User Interface Layout -- **Main Window Structure**: Primary chart area with integrated indicators below -- **Control Panel**: Side panel or toolbar for time range selection and chart configuration -- **Status Bar**: Display current data range, loading status, and performance metrics -- **Menu System**: File operations, view options, and application settings - -### PyQtGraph Integration -- **Plot Organization**: Use PyQtGraph's PlotWidget for main charts with linked axes -- **Custom Plot Items**: Implement custom plot items for OHLC candlesticks and depth visualization -- **Performance Optimization**: Utilize PyQtGraph's fast plotting capabilities for large datasets - -### Data Flow Architecture -- **Direct Memory Access**: Replace JSON file intermediates with direct Python object passing -- **Lazy Loading**: Implement efficient data loading strategies for large time ranges -- **Caching Strategy**: Cache processed data to improve navigation performance - -## Technical Considerations - -### Dependencies and Integration -- **PySide6**: Main GUI framework, provides native desktop capabilities -- **PyQtGraph**: High-performance plotting library, optimized for real-time data -- **Existing Modules**: Maintain integration with metrics_calculator.py, ohlc_processor.py, orderbook_manager.py -- **Database Integration**: Continue using existing SQLite database through db_interpreter.py - -### Migration Strategy (Iterative Implementation) -- **Phase 1**: Basic PySide6 window with single PyQtGraph plot -- **Phase 2**: OHLC candlestick chart implementation -- **Phase 3**: Volume, OBI, and CVD indicator integration -- **Phase 4**: Depth chart implementation -- **Phase 5**: User interaction features (zoom, pan, selection) -- **Phase 6**: Data integration and performance optimization - -### Performance Considerations -- **Memory Management**: Efficient data structure handling for large datasets -- **Rendering Optimization**: Use PyQtGraph's ViewBox and plotting optimizations -- **Thread Safety**: Proper handling of data processing in background threads -- **Resource Cleanup**: Proper cleanup of chart objects and data structures - -## Success Metrics - -### Technical Success Criteria -1. 
**Bug Resolution**: CVD graph displays correctly and all existing visualization bugs are resolved -2. **Performance Improvement**: Chart interactions respond within 100ms for typical datasets -3. **Debugging Capability**: Developers can set breakpoints and debug visualization code effectively -4. **Data Handling**: Elimination of intermediate JSON files reduces data transfer overhead by 50% - -### User Experience Success Criteria -1. **Feature Parity**: All existing chart types and interactions are preserved and functional -2. **Responsiveness**: Application feels more responsive than the current Dash implementation -3. **Stability**: No crashes or freezing during normal chart operations -4. **Visual Quality**: Charts render clearly with proper scaling and anti-aliasing - -### Development Success Criteria -1. **Code Maintainability**: New codebase follows established project patterns and is easier to maintain -2. **Development Velocity**: Future visualization features can be implemented more quickly -3. **Testing Capability**: Comprehensive testing can be performed with proper debugging tools -4. **Architecture Foundation**: System is ready for future real-time data integration - -## Open Questions - -1. **Data Loading Strategy**: Should we implement progressive loading for very large datasets, or rely on existing data chunking mechanisms? - -2. **Configuration Management**: How should chart configuration and user preferences be stored and managed in the desktop application? - -3. **Error Handling**: What specific error handling and user feedback mechanisms should be implemented for data loading and processing failures? - -4. **Performance Monitoring**: Should we include built-in performance monitoring and profiling tools in the application? - -5. **Future Real-Time Integration**: What specific interface patterns should be established now to facilitate future real-time data streaming integration? - -## Implementation Approach - -This migration will follow the iterative development workflow with explicit approval checkpoints between each phase. Each implementation phase will be: -- Limited to manageable scope (≤250 lines per module) -- Tested immediately after implementation -- Integrated with existing data processing modules -- Validated for performance and functionality before proceeding to the next phase - -The implementation will begin with basic PySide6 application structure and progressively add PyQtGraph visualization capabilities while maintaining integration with the existing data processing pipeline.
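As a rough illustration of the Phase 1/Phase 3 target — a single window with price and indicator rows sharing one time axis — here is a minimal PySide6/PyQtGraph sketch using placeholder data; the row layout, titles, and variable names are assumptions, not the final design:

```python
import numpy as np
import pyqtgraph as pg
from PySide6 import QtWidgets


def main() -> None:
    app = QtWidgets.QApplication([])
    win = pg.GraphicsLayoutWidget(title="Orderflow backtest (sketch)")

    # Placeholder series; the real app would receive in-memory results from
    # ohlc_processor / metrics_calculator instead of reading JSON files.
    x = np.arange(200)
    close = 100 + np.cumsum(np.random.normal(0, 0.5, size=x.size))
    volume = np.abs(np.random.normal(5, 2, size=x.size))
    cvd = np.cumsum(np.random.normal(0, 1, size=x.size))

    price_plot = win.addPlot(row=0, col=0, title="Close (candlestick item comes later)")
    price_plot.plot(x, close, pen="w")

    volume_plot = win.addPlot(row=1, col=0, title="Volume")
    volume_plot.addItem(pg.BarGraphItem(x=x, height=volume, width=0.8))

    cvd_plot = win.addPlot(row=2, col=0, title="CVD")
    cvd_plot.plot(x, cvd, pen="c")
    cvd_plot.addLine(y=0)  # zero baseline, as required for the OBI/CVD subplots

    # Shared time axis: zoom/pan on any row moves all rows together.
    volume_plot.setXLink(price_plot)
    cvd_plot.setXLink(price_plot)

    win.show()
    app.exec()


if __name__ == "__main__":
    main()
```

The OHLC candlesticks and depth view would likely be custom PyQtGraph `GraphicsObject` items added in the later phases; this sketch only demonstrates the linked-axis layout that the rest of the migration builds on.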