TCPDashboard/docs/modules/validation.md
Ajasra 96ee25bd01 Refactor data validation module for improved modularity and functionality
- Removed the existing `validation.py` file and replaced it with a modular structure, introducing separate files for validation results, field validators, and the base validator class.
- Implemented comprehensive validation functions for common data types, enhancing reusability and maintainability.
- Added a new `__init__.py` to expose the validation utilities, ensuring a clean public interface.
- Created detailed documentation for the validation module, including usage examples and architectural details.
- Introduced extensive unit tests to cover the new validation framework, ensuring reliability and preventing regressions.

These changes enhance the overall architecture of the data validation module, making it more scalable and easier to manage.
2025-06-07 12:31:47 +08:00

Data Validation Module

Purpose

The data validation module provides a robust, extensible framework for validating market data across different exchanges. It ensures data consistency, type safety, and business rule compliance through a modular validation system.

Architecture

Package Structure

data/common/validation/
├── __init__.py          # Public interface
├── result.py            # Validation result classes
├── field_validators.py  # Individual field validators
└── base.py              # BaseDataValidator class

Core Components

ValidationResult

Represents the outcome of validating a single field or component:

ValidationResult(
    is_valid: bool,          # Whether validation passed
    errors: List[str] = [],  # Error messages
    warnings: List[str] = [], # Warning messages
    sanitized_data: Any = None # Cleaned/normalized data
)
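
The signature above shows `[]` as the default for `errors` and `warnings`. Written literally as a Python default argument, that list would be shared between calls. The following is a minimal sketch of one way such a result type could be declared, assuming a `dataclass` with `default_factory`; the class below is an illustrative stand-in, and the module's real implementation may differ.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ValidationResult:
    """Illustrative stand-in for the module's ValidationResult (assumed shape)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)    # fresh list per instance
    warnings: List[str] = field(default_factory=list)  # fresh list per instance
    sanitized_data: Any = None


ok = ValidationResult(True, sanitized_data="BTC-USDT")
bad = ValidationResult(False, ["Invalid symbol format: btc_usdt"])

# Mutating one instance's list does not leak into another instance.
bad.warnings.append("Symbol was lower-case")
```

With `default_factory`, `ok.warnings` stays empty after `bad.warnings` is modified, which is the behavior callers usually expect from a result object.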

DataValidationResult

Represents the outcome of validating a complete data structure:

DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
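
To illustrate how the two result types relate, here is a hedged sketch that merges per-field results into one record-level result. Both classes are simplified stand-ins with an assumed shape, and `combine_results` is a hypothetical helper, not a function the module is known to export.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidationResult:  # assumed shape, mirrors the signature documented above
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


@dataclass
class DataValidationResult:  # assumed shape
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Dict[str, Any]] = None


def combine_results(field_results: Dict[str, ValidationResult]) -> DataValidationResult:
    """Hypothetical helper: merge per-field results into one record-level result."""
    errors: List[str] = []
    warnings: List[str] = []
    sanitized: Dict[str, Any] = {}
    for name, res in field_results.items():
        # Prefix each message with the field name so the caller can trace it.
        errors.extend(f"{name}: {msg}" for msg in res.errors)
        warnings.extend(f"{name}: {msg}" for msg in res.warnings)
        if res.is_valid:
            sanitized[name] = res.sanitized_data
    is_valid = not errors
    # Only expose sanitized data when every field passed.
    return DataValidationResult(is_valid, errors, warnings, sanitized if is_valid else None)


combined = combine_results({
    "price": ValidationResult(True, sanitized_data="42000.5"),
    "side": ValidationResult(False, ["must be 'buy' or 'sell'"]),
})
```

Returning `None` for `sanitized_data` on failure is a design choice in this sketch: it keeps callers from accidentally using a partially cleaned record.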

BaseDataValidator

Abstract base class providing common validation patterns for exchange-specific implementations:

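from abc import ABC, abstractmethod
from logging import Logger
from typing import Any, Dict, Optional

class BaseDataValidator(ABC):
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    # Subclasses must implement both abstract methods before they can be instantiated.
    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...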

Field Validators

Common validation functions for market data fields:

  • validate_price(): Price value validation
  • validate_size(): Size/quantity validation
  • validate_volume(): Volume validation
  • validate_trade_side(): Trade side validation
  • validate_timestamp(): Timestamp validation
  • validate_trade_id(): Trade ID validation
  • validate_symbol_match(): Symbol matching validation
  • validate_required_fields(): Required field presence validation
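
As an illustration of what a field validator of this kind might do, the sketch below checks a price against decimal bounds. The function name `validate_price_sketch`, its tuple return type, and the exact checks are assumptions made for this example; the module's `validate_price()` may have a different signature and rules.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, List, Optional, Tuple

MIN_PRICE = Decimal("0.00000001")
MAX_PRICE = Decimal("1000000000")


def validate_price_sketch(value: Any) -> Tuple[bool, List[str], Optional[Decimal]]:
    """Return (is_valid, errors, sanitized_price) for a raw price value."""
    if isinstance(value, bool):  # bool is an int subclass, so reject it explicitly
        return False, ["Price must be numeric, got bool"], None
    try:
        # Going through str() avoids carrying binary float noise into the Decimal.
        price = Decimal(str(value))
    except InvalidOperation:
        return False, [f"Price is not a valid number: {value!r}"], None
    if not price.is_finite():  # rejects NaN and Infinity
        return False, ["Price must be finite"], None
    errors: List[str] = []
    if price < MIN_PRICE:
        errors.append(f"Price {price} is below minimum {MIN_PRICE}")
    elif price > MAX_PRICE:
        errors.append(f"Price {price} exceeds maximum {MAX_PRICE}")
    return (not errors), errors, (price if not errors else None)


is_valid, errors, price = validate_price_sketch("42000.5")
```

The sanitized value is a `Decimal` rather than a `float`, so downstream arithmetic on prices keeps exact decimal precision.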

Usage Examples

Creating an Exchange-Specific Validator

import re

from data.common.validation import BaseDataValidator, ValidationResult

class OKXDataValidator(BaseDataValidator):
    def __init__(self, component_name: str = "okx_data_validator", logger = None):
        super().__init__("okx", component_name, logger)
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')
    
    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors = []
        warnings = []
        
        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol).__name__}")
            return ValidationResult(False, errors, warnings)
        
        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")
        
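        return ValidationResult(len(errors) == 0, errors, warnings)

    # Note: validate_websocket_message() is also abstract on BaseDataValidator and
    # must be implemented before this class can be instantiated.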

Validating Trade Data

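import logging
from typing import Any, Dict, Optional

from data.common.validation import BaseDataValidator

logger = logging.getLogger(__name__)

class ValidationError(Exception):
    """Raised when validation fails (defined here so the example is self-contained)."""

def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)

    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")

    if result.warnings:  # warnings are non-fatal, so log them and continue
        logger.warning(f"Trade validation warnings: {result.warnings}")

    return result.sanitized_data  # may be None, since the field is Optional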

Configuration

Validation Constants

The module defines several constants for validation rules:

MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000  # 2000-01-01 00:00:00 UTC, in milliseconds
MAX_TIMESTAMP = 32503680000000  # 3000-01-01 00:00:00 UTC, in milliseconds
VALID_TRADE_SIDES = {'buy', 'sell'}
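
The timestamp bounds are expressed in milliseconds since the Unix epoch. The sketch below converts them back to dates and shows what happens when a timestamp in seconds is checked against them. The helper names `ms_to_utc` and `in_timestamp_bounds` are hypothetical and only serve this example.

```python
from datetime import datetime, timedelta, timezone

MIN_TIMESTAMP = 946684800000     # 2000-01-01 UTC, milliseconds
MAX_TIMESTAMP = 32503680000000   # 3000-01-01 UTC, milliseconds

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ts_ms: int) -> datetime:
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ts_ms)


def in_timestamp_bounds(ts_ms: int) -> bool:
    """Check a millisecond timestamp against the configured bounds."""
    return MIN_TIMESTAMP <= ts_ms <= MAX_TIMESTAMP


lower = ms_to_utc(MIN_TIMESTAMP)
upper = ms_to_utc(MAX_TIMESTAMP)

seconds_ts = 1749270707          # a 2025 timestamp expressed in seconds
seconds_ok = in_timestamp_bounds(seconds_ts)         # far below MIN_TIMESTAMP
millis_ok = in_timestamp_bounds(seconds_ts * 1000)   # same instant in milliseconds
```

A timestamp supplied in seconds falls well below `MIN_TIMESTAMP`, so under these bounds it is rejected rather than silently misread as a date in 1970.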

Regular Expression Patterns

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')
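
To make the behavior of the two patterns concrete, the short sketch below runs them against a handful of sample strings. The sample values are chosen for illustration only.

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')

# Strings that look numeric to a human do not all match the pattern.
numeric_samples = ["42", "-0.5", ".5", "1.", "1e-8", ""]
numeric_matches = {s: bool(NUMERIC_PATTERN.match(s)) for s in numeric_samples}

# \w covers letters, digits, and underscore; the class also allows hyphens.
trade_id_samples = ["12345", "abc-DEF_9", "id with space", "a/b"]
trade_id_matches = {s: bool(TRADE_ID_PATTERN.match(s)) for s in trade_id_samples}
```

Two points follow from the pattern definitions. `NUMERIC_PATTERN` accepts a leading minus sign, so sign checks have to happen elsewhere, and it rejects scientific notation such as `1e-8` even though `Decimal` can parse it. `TRADE_ID_PATTERN` uses `\w`, which in Python 3 string patterns also matches non-ASCII letters and digits.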

Testing

Running Tests

pytest tests/test_data_validation.py -v

Test Coverage

The validation module has comprehensive test coverage including:

  • Basic validation result functionality
  • Field validator functions
  • Base validator class
  • Exchange-specific validator implementations
  • Error handling and edge cases
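
A test for a field validator might look like the following pytest-style sketch. `validate_trade_side_sketch` is a stand-in defined here so the example runs on its own; its signature, its tuple return type, and its case-insensitive matching are assumptions and may not match the module's `validate_trade_side()`.

```python
from typing import Any, List, Tuple

VALID_TRADE_SIDES = {"buy", "sell"}


def validate_trade_side_sketch(side: Any) -> Tuple[bool, List[str]]:
    """Stand-in for a field validator; the real function's signature may differ."""
    if not isinstance(side, str):
        return False, [f"Trade side must be string, got {type(side).__name__}"]
    # Assumption for this sketch: surrounding whitespace and case are ignored.
    if side.strip().lower() not in VALID_TRADE_SIDES:
        return False, [f"Invalid trade side: {side!r}"]
    return True, []


def test_accepts_known_sides():
    for side in ("buy", "sell", " BUY "):
        assert validate_trade_side_sketch(side) == (True, [])


def test_rejects_unknown_or_non_string_sides():
    for side in ("hold", "", None, 1):
        is_valid, errors = validate_trade_side_sketch(side)
        assert not is_valid
        assert len(errors) == 1
```

Looping over several inputs per test keeps the edge cases (empty string, `None`, wrong type) visible in one place.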

Dependencies

  • Internal:
    • data.common.data_types
    • data.base_collector
  • Standard library:
    • typing
    • decimal
    • logging
    • abc
    • re

Error Handling

Common Validation Errors

  • Invalid data type
  • Value out of bounds
  • Missing required fields
  • Invalid format
  • Symbol mismatch

Error Response Format

{
    'is_valid': False,
    'errors': ['Price must be positive', 'Size exceeds maximum'],
    'warnings': ['Price below recommended minimum'],
    'sanitized_data': None
}
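
If the result object is a dataclass, a dictionary in this format can be produced with `dataclasses.asdict`. The sketch below assumes a dataclass-shaped `DataValidationResult`; whether the module actually serializes results this way is not stated in this document.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataValidationResult:  # assumed shape, for illustration only
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Optional[Dict[str, Any]] = None


result = DataValidationResult(
    is_valid=False,
    errors=["Price must be positive", "Size exceeds maximum"],
    warnings=["Price below recommended minimum"],
)

# asdict() recursively converts the dataclass into plain dicts and lists.
payload = asdict(result)
```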

Best Practices

Implementing New Validators

  1. Extend BaseDataValidator
  2. Implement required abstract methods
  3. Add exchange-specific validation rules
  4. Reuse common field validators
  5. Add comprehensive tests
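
The steps above can be sketched end to end as follows. To keep the example runnable on its own, `BaseValidatorSketch` and `Result` are simplified stand-ins for `BaseDataValidator` and the result classes, `missing_fields` is a hypothetical helper in the spirit of `validate_required_fields()`, and the exchange with its `BASE_QUOTE` symbol format is invented for illustration.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Result:  # simplified stand-in for ValidationResult / DataValidationResult
    is_valid: bool
    errors: List[str] = field(default_factory=list)


class BaseValidatorSketch(ABC):  # stand-in for BaseDataValidator
    def __init__(self, exchange_name: str):
        self.exchange_name = exchange_name

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> Result: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> Result: ...


def missing_fields(data: Dict[str, Any], required: List[str]) -> List[str]:
    """Shared helper: return the required keys that are absent from data."""
    return [name for name in required if name not in data]


class ExampleExchangeValidator(BaseValidatorSketch):  # hypothetical exchange
    _symbol_pattern = re.compile(r"^[A-Z0-9]+_[A-Z0-9]+$")  # assumed BASE_QUOTE format

    def __init__(self):
        super().__init__("example")

    def validate_symbol_format(self, symbol: str) -> Result:
        # Exchange-specific rule (step 3).
        if isinstance(symbol, str) and self._symbol_pattern.match(symbol):
            return Result(True)
        return Result(False, [f"Invalid symbol format: {symbol!r}"])

    def validate_websocket_message(self, message: Dict[str, Any]) -> Result:
        # Reuse the shared helper (step 4) before applying exchange rules.
        missing = missing_fields(message, ["symbol", "price", "size"])
        if missing:
            return Result(False, [f"Missing required fields: {missing}"])
        return self.validate_symbol_format(message["symbol"])


validator = ExampleExchangeValidator()
complete = validator.validate_websocket_message(
    {"symbol": "BTC_USDT", "price": "42000.5", "size": "0.1"}
)
incomplete = validator.validate_websocket_message({"symbol": "BTC_USDT"})
```

Because both abstract methods are implemented, the subclass can be instantiated; omitting either one would make Python raise `TypeError` at construction time.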

Validation Guidelines

  • Always sanitize input data
  • Include helpful error messages
  • Use warnings for non-critical issues
  • Maintain type safety
  • Log validation failures appropriately

Known Issues and Limitations

  • Timestamp validation assumes millisecond precision
  • Trade ID format is loosely validated
  • Some exchanges may require custom numeric precision

Future Improvements

  • Add support for custom validation rules
  • Implement async validation methods
  • Add validation rule configuration system
  • Enhance performance for high-frequency validation
  • Add more exchange-specific validators