- Removed the existing `validation.py` file and replaced it with a modular structure, introducing separate files for validation results, field validators, and the base validator class.
- Implemented comprehensive validation functions for common data types, enhancing reusability and maintainability.
- Added a new `__init__.py` to expose the validation utilities, ensuring a clean public interface.
- Created detailed documentation for the validation module, including usage examples and architectural details.
- Introduced extensive unit tests to cover the new validation framework, ensuring reliability and preventing regressions.

These changes enhance the overall architecture of the data validation module, making it more scalable and easier to manage.
5.5 KiB
Data Validation Module
Purpose
The data validation module provides an extensible framework for validating market data across different exchanges. It enforces data consistency, type safety, and business rule compliance through shared field validators and an abstract base class that exchange-specific validators extend.
Architecture
Package Structure
data/common/validation/
├── __init__.py # Public interface
├── result.py # Validation result classes
├── field_validators.py # Individual field validators
└── base.py # BaseDataValidator class
Core Components
ValidationResult
Represents the outcome of validating a single field or component:
ValidationResult(
    is_valid: bool,              # Whether validation passed
    errors: List[str] = [],      # Error messages
    warnings: List[str] = [],    # Warning messages
    sanitized_data: Any = None   # Cleaned/normalized data
)
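The signature above writes the list defaults as `[]`. As a minimal sketch, assuming the class is a dataclass (the actual definition in `result.py` may differ), the same fields could be declared with `field(default_factory=list)` so that instances never share one list object:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ValidationResult:
    """Sketch of a single-field validation outcome (assumed dataclass layout)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


ok = ValidationResult(True, sanitized_data="BTC-USDT")
bad = ValidationResult(False, errors=["Symbol must be string"])

# Appending to one instance's list leaves every other instance untouched.
bad.errors.append("Invalid symbol format")
```

The positional order (`is_valid`, `errors`, `warnings`) matches how the usage examples in this document construct results.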
DataValidationResult
Represents the outcome of validating a complete data structure:
DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
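One way to relate the two result types is to fold several per-field results into a single structure-level result. The sketch below is illustrative only: `combine_results` is a hypothetical helper, and both classes are redeclared locally as assumed dataclasses so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidationResult:
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


@dataclass
class DataValidationResult:
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Dict[str, Any]] = None


def combine_results(field_results: Dict[str, ValidationResult]) -> DataValidationResult:
    """Hypothetical helper: merge per-field results into one overall result."""
    errors: List[str] = []
    warnings: List[str] = []
    sanitized: Dict[str, Any] = {}
    for name, result in field_results.items():
        # Prefix each message with the field name so the source stays traceable.
        errors.extend(f"{name}: {msg}" for msg in result.errors)
        warnings.extend(f"{name}: {msg}" for msg in result.warnings)
        sanitized[name] = result.sanitized_data
    is_valid = all(result.is_valid for result in field_results.values())
    # Only expose sanitized data when every field passed.
    return DataValidationResult(is_valid, errors, warnings, sanitized if is_valid else None)


combined = combine_results({
    "price": ValidationResult(True, sanitized_data="42000.5"),
    "side": ValidationResult(False, errors=["Invalid trade side: hold"]),
})
```

Withholding `sanitized_data` on failure is a design choice of this sketch; it mirrors the error response format shown later, where a failed result carries `None`.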
BaseDataValidator
Abstract base class providing common validation patterns for exchange-specific implementations:
class BaseDataValidator(ABC):
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...
Field Validators
Common validation functions for market data fields:
- validate_price(): Price value validation
- validate_size(): Size/quantity validation
- validate_volume(): Volume validation
- validate_trade_side(): Trade side validation
- validate_timestamp(): Timestamp validation
- validate_trade_id(): Trade ID validation
- validate_symbol_match(): Symbol matching validation
- validate_required_fields(): Required field presence validation
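The field validators are listed here by name only, so the following is a rough sketch of what a price check could look like, not the module's actual code. The signature, the error messages, and the local `ValidationResult` stand-in are assumptions; the bounds are the `MIN_PRICE` and `MAX_PRICE` constants from the Configuration section.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import Any, List

MIN_PRICE = Decimal("0.00000001")
MAX_PRICE = Decimal("1000000000")


@dataclass
class ValidationResult:
    """Local stand-in for the module's result class."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


def validate_price(price: Any) -> ValidationResult:
    """Sketch: convert the input to Decimal, then check the documented bounds."""
    try:
        # Going through str() avoids binary float artifacts such as 0.1 + 0.2.
        value = Decimal(str(price))
    except InvalidOperation:
        return ValidationResult(False, [f"Price is not numeric: {price!r}"])
    if not value.is_finite():
        # Decimal accepts "NaN" and "Infinity", which are not usable prices.
        return ValidationResult(False, [f"Price is not finite: {price!r}"])
    if value < MIN_PRICE:
        return ValidationResult(False, [f"Price {value} is below minimum {MIN_PRICE}"])
    if value > MAX_PRICE:
        return ValidationResult(False, [f"Price {value} exceeds maximum {MAX_PRICE}"])
    return ValidationResult(True, sanitized_data=value)


accepted = validate_price("42000.5")
rejected = validate_price("abc")
```

A size or volume validator could follow the same shape with `MIN_SIZE` and `MAX_SIZE` in place of the price bounds.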
Usage Examples
Creating an Exchange-Specific Validator
import re

from data.common.validation import BaseDataValidator, ValidationResult


class OKXDataValidator(BaseDataValidator):
    def __init__(self, component_name: str = "okx_data_validator", logger=None):
        super().__init__("okx", component_name, logger)
        # Symbols are expected in BASE-QUOTE form, e.g. BTC-USDT
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors = []
        warnings = []
        # Reject non-string input before applying the regex
        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol)}")
            return ValidationResult(False, errors, warnings)
        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")
        return ValidationResult(len(errors) == 0, errors, warnings)

    # validate_websocket_message() is also abstract and must be implemented
    # before this class can be instantiated; it is omitted here.
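To see what the symbol pattern in this example accepts, it can be exercised on its own. The snippet below copies the regex and runs it against a few sample strings chosen for illustration. It also shows one property of Python's `re` module that matters here: `$` matches just before a trailing newline, so `match()` accepts `"BTC-USDT\n"` while `fullmatch()` does not.

```python
import re

# Same pattern as in OKXDataValidator above.
SYMBOL_PATTERN = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')


def accepts(symbol: str) -> bool:
    """Return True when the pattern matches, mirroring the validator's check."""
    return SYMBOL_PATTERN.match(symbol) is not None


samples = ["BTC-USDT", "btc-usdt", "BTCUSDT", "BTC-USDT-SWAP"]
outcomes = {symbol: accepts(symbol) for symbol in samples}

# `$` also matches before a final newline, so match() lets this through...
newline_with_match = SYMBOL_PATTERN.match("BTC-USDT\n") is not None
# ...whereas fullmatch() requires the whole string to be consumed.
newline_with_fullmatch = SYMBOL_PATTERN.fullmatch("BTC-USDT\n") is not None
```

Only the two-part uppercase form passes; lowercase input, a missing hyphen, and three-part identifiers are all rejected. If trailing whitespace should never be accepted, `fullmatch()` or stripping the input first would be a stricter choice.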
Validating Trade Data
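import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)

# ValidationError is assumed to be an application-defined exception; it is not shown in this document.
def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)
    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")
    if result.warnings:
        logger.warning(f"Trade validation warnings: {result.warnings}")
    # Return the cleaned payload produced by the validator
    return result.sanitized_data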
Configuration
Validation Constants
The module defines several constants for validation rules:
MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000 # 2000-01-01
MAX_TIMESTAMP = 32503680000000 # 3000-01-01
VALID_TRADE_SIDES = {'buy', 'sell'}
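The two timestamp bounds are epoch milliseconds for midnight UTC on the dates given in the comments. The sketch below recomputes them with the standard library and adds a range check; `to_epoch_ms` and `is_timestamp_in_range` are illustrative names, not functions of the module.

```python
from datetime import datetime, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole-second epoch milliseconds."""
    return int(moment.timestamp()) * 1000


MIN_TIMESTAMP = to_epoch_ms(datetime(2000, 1, 1, tzinfo=timezone.utc))
MAX_TIMESTAMP = to_epoch_ms(datetime(3000, 1, 1, tzinfo=timezone.utc))


def is_timestamp_in_range(timestamp_ms: int) -> bool:
    """Return True when the value lies within the documented bounds (inclusive)."""
    return MIN_TIMESTAMP <= timestamp_ms <= MAX_TIMESTAMP


millisecond_value = 1_700_000_000_000  # a date in 2023, in milliseconds
second_value = 1_700_000_000           # the same instant, in seconds
```

A timestamp supplied in seconds rather than milliseconds falls far below `MIN_TIMESTAMP`, so a bounds check of this kind rejects it instead of silently treating it as a date in 1970.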
Regular Expression Patterns
NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')
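The behavior of these two patterns is easiest to judge from examples. The snippet below applies them to a handful of sample strings chosen for illustration. The `re.ASCII` variant at the end is not part of the module; it is included only to show how the trade ID pattern could be narrowed.

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')

# At least one digit must follow the optional decimal point, so "1." fails.
numeric_samples = ["1.5", ".5", "-3", "1.", "1e5", ""]
numeric_matches = {s: bool(NUMERIC_PATTERN.match(s)) for s in numeric_samples}

# Letters, digits, underscores and hyphens are allowed; spaces are not.
trade_id_samples = ["abc-123_X", "bad id", ""]
trade_id_matches = {s: bool(TRADE_ID_PATTERN.match(s)) for s in trade_id_samples}

# For str patterns, \w is Unicode-aware, so accented letters pass by default.
# Hypothetical stricter variant that limits \w to [a-zA-Z0-9_]:
ASCII_TRADE_ID_PATTERN = re.compile(r'^[\w-]+$', re.ASCII)
unicode_accepts = bool(TRADE_ID_PATTERN.match("é1"))
ascii_accepts = bool(ASCII_TRADE_ID_PATTERN.match("é1"))
```

`NUMERIC_PATTERN` does not accept scientific notation such as `1e5`, so values in that form would need to be normalized before the check.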
Testing
Running Tests
pytest tests/test_data_validation.py -v
Test Coverage
The test suite covers the following areas:
- Basic validation result functionality
- Field validator functions
- Base validator class
- Exchange-specific validator implementations
- Error handling and edge cases
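As an illustration of the style such tests might take, here is a small pytest-compatible sketch. The `validate_trade_side` function is a simplified stand-in written for this example; the real validator's signature and messages may differ. Only `VALID_TRADE_SIDES` is taken from the Configuration section.

```python
from typing import Any, List, Tuple

VALID_TRADE_SIDES = {"buy", "sell"}


def validate_trade_side(side: Any) -> Tuple[bool, List[str]]:
    """Stand-in validator returning (is_valid, errors)."""
    if not isinstance(side, str):
        return False, [f"Trade side must be string, got {type(side).__name__}"]
    if side not in VALID_TRADE_SIDES:
        return False, [f"Invalid trade side: {side}"]
    return True, []


def test_accepts_known_sides() -> None:
    for side in ("buy", "sell"):
        assert validate_trade_side(side) == (True, [])


def test_rejects_unknown_side() -> None:
    is_valid, errors = validate_trade_side("hold")
    assert not is_valid
    assert errors == ["Invalid trade side: hold"]


def test_rejects_non_string() -> None:
    is_valid, errors = validate_trade_side(1)
    assert not is_valid
    assert errors == ["Trade side must be string, got int"]
```

pytest collects functions whose names start with `test_`, so a file containing these definitions would run with the command shown under Running Tests.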
Dependencies
- Internal: data.common.data_types, data.base_collector
- External: typing, decimal, logging, abc
Error Handling
Common Validation Errors
- Invalid data type
- Value out of bounds
- Missing required fields
- Invalid format
- Symbol mismatch
Error Response Format
{
    'is_valid': False,
    'errors': ['Price must be positive', 'Size exceeds maximum'],
    'warnings': ['Price below recommended minimum'],
    'sanitized_data': None
}
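If the result classes are dataclasses (an assumption; this document does not say how they are implemented), a dictionary in this format can be produced with `dataclasses.asdict`. The class below is a local stand-in used only for the demonstration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DataValidationResult:
    """Local stand-in mirroring the fields documented above."""
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Dict[str, Any]] = None


result = DataValidationResult(
    is_valid=False,
    errors=["Price must be positive", "Size exceeds maximum"],
    warnings=["Price below recommended minimum"],
)

# asdict() copies the fields into a plain dict, keeping declaration order.
payload = asdict(result)
```

The resulting `payload` has the same keys and values as the example response, which makes it straightforward to serialize or log.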
Best Practices
Implementing New Validators
- Extend BaseDataValidator
- Implement required abstract methods
- Add exchange-specific validation rules
- Reuse common field validators
- Add comprehensive tests
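The steps above can be sketched end to end with a toy validator. Everything here is a self-contained approximation: the base class and result types are reduced stand-ins, `DemoValidator` is hypothetical, and the `symbol` message key is an assumption about the payload shape.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidationResult:
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


@dataclass
class DataValidationResult:
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Dict[str, Any]] = None


class BaseDataValidator(ABC):
    """Stand-in for the real base class, reduced to its two abstract methods."""

    def __init__(self, exchange_name: str, component_name: str) -> None:
        self.exchange_name = exchange_name
        self.component_name = component_name

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...


class DemoValidator(BaseDataValidator):
    """Hypothetical exchange validator implementing both abstract methods."""

    _symbol_pattern = re.compile(r"[A-Z0-9]+-[A-Z0-9]+")

    def __init__(self) -> None:
        super().__init__("demo", "demo_data_validator")

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        if not isinstance(symbol, str) or not self._symbol_pattern.fullmatch(symbol):
            return ValidationResult(False, [f"Invalid symbol format: {symbol!r}"])
        return ValidationResult(True, sanitized_data=symbol)

    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult:
        # Check required fields first, then delegate to the field-level check.
        if "symbol" not in message:
            return DataValidationResult(False, ["Missing required field: symbol"], [])
        symbol_result = self.validate_symbol_format(message["symbol"])
        return DataValidationResult(
            symbol_result.is_valid,
            symbol_result.errors,
            symbol_result.warnings,
            dict(message) if symbol_result.is_valid else None,
        )


validator = DemoValidator()
good = validator.validate_websocket_message({"symbol": "BTC-USDT", "price": "42000.5"})
bad = validator.validate_websocket_message({"price": "42000.5"})
```

Because both abstract methods are implemented, `DemoValidator()` can be instantiated. A subclass that omits either one raises `TypeError` at construction time.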
Validation Guidelines
- Always sanitize input data
- Include helpful error messages
- Use warnings for non-critical issues
- Maintain type safety
- Log validation failures appropriately
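As one possible reading of the last two guidelines, warnings and failures can be routed to different log levels. The helper name `log_validation_outcome` and the logger name are assumptions made for this sketch.

```python
import logging
from typing import List

logger = logging.getLogger("okx_data_validator")


def log_validation_outcome(is_valid: bool, errors: List[str], warnings: List[str]) -> None:
    """Log warnings individually and failures as a single error record."""
    for message in warnings:
        # Non-critical issues do not block processing, so WARNING is enough.
        logger.warning("Validation warning: %s", message)
    if not is_valid:
        # One record per failed payload keeps high-frequency logs readable.
        logger.error("Validation failed: %s", "; ".join(errors))


log_validation_outcome(
    is_valid=False,
    errors=["Price must be positive", "Size exceeds maximum"],
    warnings=["Price below recommended minimum"],
)
```

Passing the message as a logging argument rather than pre-formatting it defers string interpolation until a handler actually emits the record.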
Known Issues and Limitations
- Timestamp validation assumes millisecond precision
- Trade ID format is loosely validated
- Some exchanges may require custom numeric precision
Future Improvements
- Add support for custom validation rules
- Implement async validation methods
- Add validation rule configuration system
- Enhance performance for high-frequency validation
- Add more exchange-specific validators