# Data Validation Module

## Purpose

The data validation module provides an extensible framework for validating market data across different exchanges. It enforces data consistency, type safety, and business rule compliance through a modular validation system.

## Architecture

### Package Structure

```
data/common/validation/
├── __init__.py           # Public interface
├── result.py             # Validation result classes
├── field_validators.py   # Individual field validators
└── base.py               # BaseDataValidator class
```

### Core Components

#### ValidationResult

Represents the outcome of validating a single field or component:

```python
ValidationResult(
    is_valid: bool,                # Whether validation passed
    errors: List[str] = [],        # Error messages
    warnings: List[str] = [],      # Warning messages
    sanitized_data: Any = None     # Cleaned/normalized data
)
```

#### DataValidationResult

Represents the outcome of validating a complete data structure:

```python
DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
```

#### BaseDataValidator

Abstract base class providing common validation patterns for exchange-specific implementations:

```python
class BaseDataValidator(ABC):
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...
```

### Field Validators

Common validation functions for market data fields:

- `validate_price()`: Price value validation
- `validate_size()`: Size/quantity validation
- `validate_volume()`: Volume validation
- `validate_trade_side()`: Trade side validation
- `validate_timestamp()`: Timestamp validation
- `validate_trade_id()`: Trade ID validation
- `validate_symbol_match()`: Symbol matching validation
- `validate_required_fields()`: Required field presence validation

## Usage Examples

### Creating an Exchange-Specific Validator

```python
import re

from data.common.validation import BaseDataValidator, ValidationResult

class OKXDataValidator(BaseDataValidator):
    def __init__(self, component_name: str = "okx_data_validator", logger = None):
        super().__init__("okx", component_name, logger)
        # OKX symbols look like "BTC-USDT": two uppercase alphanumeric parts joined by a hyphen
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors = []
        warnings = []

        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol)}")
            return ValidationResult(False, errors, warnings)

        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")

        return ValidationResult(len(errors) == 0, errors, warnings)

    # validate_websocket_message() is also abstract and must be implemented
    # before this class can be instantiated (omitted here for brevity).
```

### Validating Trade Data

```python
import logging
from typing import Any, Dict, Optional

from data.common.validation import BaseDataValidator

logger = logging.getLogger(__name__)

# ValidationError is assumed to be an application-defined exception.
def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)

    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")

    if result.warnings:
        logger.warning(f"Trade validation warnings: {result.warnings}")

    return result.sanitized_data
```

## Configuration

### Validation Constants

The module defines several constants for validation rules:

```python
MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000      # 2000-01-01 (milliseconds)
MAX_TIMESTAMP = 32503680000000    # 3000-01-01 (milliseconds)
VALID_TRADE_SIDES = {'buy', 'sell'}
```

### Regular Expression Patterns

```python
NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')
```

## Testing

### Running Tests

```bash
pytest tests/test_data_validation.py -v
```

### Test Coverage

The test suite for the validation module covers:

- Basic validation result functionality
- Field validator functions
- Base validator class
- Exchange-specific validator implementations
- Error handling and edge cases

## Dependencies

- Internal:
  - `data.common.data_types`
  - `data.base_collector`
- External:
  - `typing`
  - `decimal`
  - `logging`
  - `abc`

## Error Handling

### Common Validation Errors

- Invalid data type
- Value out of bounds
- Missing required fields
- Invalid format
- Symbol mismatch

### Error Response Format

```python
{
    'is_valid': False,
    'errors': ['Price must be positive', 'Size exceeds maximum'],
    'warnings': ['Price below recommended minimum'],
    'sanitized_data': None
}
```

## Best Practices

### Implementing New Validators

1. Extend `BaseDataValidator`
2. Implement the required abstract methods
3. Add exchange-specific validation rules
4. Reuse the common field validators
5. Add comprehensive tests

### Validation Guidelines

- Always sanitize input data
- Include helpful error messages
- Use warnings for non-critical issues
- Maintain type safety
- Log validation failures appropriately

## Known Issues and Limitations

- Timestamp validation assumes millisecond precision
- Trade ID format is loosely validated
- Some exchanges may require custom numeric precision

## Future Improvements

- Add support for custom validation rules
- Implement async validation methods
- Add a validation rule configuration system
- Improve performance for high-frequency validation
- Add more exchange-specific validators
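
To make the pieces above concrete, here is a minimal sketch of how field validators such as `validate_price()` and `validate_timestamp()` might combine the documented constants with `ValidationResult`. It is not the module's actual implementation: the `ValidationResult` dataclass is a local stand-in with the documented fields, and the function bodies, error messages, and the choice to return a `Decimal` as `sanitized_data` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import Any, List

# Constants copied from the Validation Constants section.
MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000      # 2000-01-01 in milliseconds
MAX_TIMESTAMP = 32503680000000    # 3000-01-01 in milliseconds


@dataclass
class ValidationResult:
    """Local stand-in for the module's ValidationResult (assumed shape)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


def validate_price(price: Any) -> ValidationResult:
    """Hypothetical price check: parse to Decimal, then bounds-check."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(price, bool):
        return ValidationResult(False, ["Price must be numeric, got bool"])
    try:
        # Going through str() avoids binary float artifacts for inputs like 0.1.
        value = Decimal(str(price))
    except InvalidOperation:
        return ValidationResult(False, [f"Invalid price value: {price!r}"])
    # NaN and Infinity parse successfully but cannot be bounds-checked.
    if not value.is_finite():
        return ValidationResult(False, [f"Price must be finite, got {value}"])
    if value < MIN_PRICE:
        return ValidationResult(False, [f"Price {value} is below minimum {MIN_PRICE}"])
    if value > MAX_PRICE:
        return ValidationResult(False, [f"Price {value} exceeds maximum {MAX_PRICE}"])
    return ValidationResult(True, sanitized_data=value)


def validate_timestamp(timestamp: Any) -> ValidationResult:
    """Hypothetical timestamp check: integer milliseconds within the bounds."""
    if isinstance(timestamp, bool) or not isinstance(timestamp, int):
        return ValidationResult(
            False, [f"Timestamp must be int, got {type(timestamp).__name__}"]
        )
    if not MIN_TIMESTAMP <= timestamp <= MAX_TIMESTAMP:
        return ValidationResult(False, [f"Timestamp out of range: {timestamp}"])
    return ValidationResult(True, sanitized_data=timestamp)
```

Under these assumptions, a seconds-precision value such as `validate_timestamp(1700000000)` is rejected because it falls below `MIN_TIMESTAMP`, which is how the millisecond-precision limitation listed under Known Issues would surface in practice.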