- Removed the existing `validation.py` file and replaced it with a modular structure, introducing separate files for validation results, field validators, and the base validator class.
- Implemented comprehensive validation functions for common data types, enhancing reusability and maintainability.
- Added a new `__init__.py` to expose the validation utilities, ensuring a clean public interface.
- Created detailed documentation for the validation module, including usage examples and architectural details.
- Introduced extensive unit tests to cover the new validation framework, ensuring reliability and preventing regressions.

These changes enhance the overall architecture of the data validation module, making it more scalable and easier to manage.

# Data Validation Module

## Purpose

The data validation module provides a robust, extensible framework for validating market data across different exchanges. It ensures data consistency, type safety, and business rule compliance through a modular validation system.

## Architecture

### Package Structure

```
data/common/validation/
├── __init__.py           # Public interface
├── result.py             # Validation result classes
├── field_validators.py   # Individual field validators
└── base.py               # BaseDataValidator class
```

### Core Components

#### ValidationResult

Represents the outcome of validating a single field or component:

```python
ValidationResult(
    is_valid: bool,               # Whether validation passed
    errors: List[str] = [],       # Error messages
    warnings: List[str] = [],     # Warning messages
    sanitized_data: Any = None    # Cleaned/normalized data
)
```

#### DataValidationResult

Represents the outcome of validating a complete data structure:

```python
DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
```

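
The two signatures above do not say how the result classes are declared. As a minimal sketch, assuming they are plain dataclasses, they could look like the following. The `default_factory` fields avoid sharing one list between instances, which a literal `= []` default would do. `merge_field_results` is a hypothetical helper, not part of the module, that shows how per-field results might be folded into one record-level result.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidationResult:
    """Outcome of validating a single field (sketch, not the module's code)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


@dataclass
class DataValidationResult:
    """Outcome of validating a complete record (sketch)."""
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Dict[str, Any]] = None


def merge_field_results(fields: Dict[str, ValidationResult]) -> DataValidationResult:
    """Hypothetical helper: fold per-field results into one record-level result."""
    errors: List[str] = []
    warnings: List[str] = []
    sanitized: Dict[str, Any] = {}
    for name, res in fields.items():
        # Prefix each message with the field name so the caller can locate it.
        errors.extend(f"{name}: {msg}" for msg in res.errors)
        warnings.extend(f"{name}: {msg}" for msg in res.warnings)
        if res.is_valid:
            sanitized[name] = res.sanitized_data
    is_valid = all(res.is_valid for res in fields.values())
    # Sanitized data is only returned when every field passed.
    return DataValidationResult(is_valid, errors, warnings, sanitized if is_valid else None)
```

In this sketch a single failing field invalidates the whole record and suppresses `sanitized_data`, which mirrors the `'sanitized_data': None` shown in the error response format later in this document.
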
#### BaseDataValidator

Abstract base class providing common validation patterns for exchange-specific implementations. The outline below lists the constructor and the abstract methods that subclasses must implement; method bodies are elided:

```python
class BaseDataValidator(ABC):
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...
```

### Field Validators

Common validation functions for market data fields:

- `validate_price()`: Price value validation
- `validate_size()`: Size/quantity validation
- `validate_volume()`: Volume validation
- `validate_trade_side()`: Trade side validation
- `validate_timestamp()`: Timestamp validation
- `validate_trade_id()`: Trade ID validation
- `validate_symbol_match()`: Symbol matching validation
- `validate_required_fields()`: Required field presence validation

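
The list above names the field validators but not their signatures. As an illustration only, a price validator in this style might parse the input to `Decimal` and check it against the price bounds listed under Validation Constants. The signature, the error messages, and the stand-in `ValidationResult` class below are assumptions, not the module's actual code.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import Any, List

# Bounds copied from the Validation Constants section of this document.
MIN_PRICE = Decimal("0.00000001")
MAX_PRICE = Decimal("1000000000")


@dataclass
class ValidationResult:
    """Stand-in for the module's result class so the sketch runs on its own."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


def validate_price(price: Any) -> ValidationResult:
    """Hypothetical signature: parse to Decimal, then range-check."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(price, bool) or not isinstance(price, (str, int, float, Decimal)):
        return ValidationResult(False, [f"Price must be numeric, got {type(price).__name__}"])
    try:
        # Going through str() avoids binary float artifacts such as 0.1000000000000000055.
        value = Decimal(str(price))
    except InvalidOperation:
        return ValidationResult(False, [f"Invalid price value: {price!r}"])
    if not value.is_finite():
        return ValidationResult(False, [f"Price must be finite, got {price!r}"])
    if value < MIN_PRICE:
        return ValidationResult(False, [f"Price {value} is below minimum {MIN_PRICE}"])
    if value > MAX_PRICE:
        return ValidationResult(False, [f"Price {value} exceeds maximum {MAX_PRICE}"])
    return ValidationResult(True, sanitized_data=value)
```

Returning the parsed `Decimal` in `sanitized_data` lets callers use the normalized value directly instead of re-parsing the raw input.
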
## Usage Examples

### Creating an Exchange-Specific Validator

```python
import re

from data.common.validation import BaseDataValidator, ValidationResult


class OKXDataValidator(BaseDataValidator):
    def __init__(self, component_name: str = "okx_data_validator", logger=None):
        super().__init__("okx", component_name, logger)
        # Symbols take the form BASE-QUOTE, e.g. BTC-USDT
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors = []
        warnings = []

        # Reject non-string input before it reaches the regex
        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol)}")
            return ValidationResult(False, errors, warnings)

        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")

        return ValidationResult(len(errors) == 0, errors, warnings)
```

A concrete validator must also implement `validate_websocket_message()`, the second abstract method of `BaseDataValidator`; it is omitted here for brevity.

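
To make the behavior of the symbol regex in the example above concrete, here is a small standalone check. `is_valid_symbol` and `is_valid_symbol_strict` are hypothetical helpers written for this illustration; only the pattern itself comes from the example.

```python
import re

# Same pattern as in the OKXDataValidator example.
SYMBOL_PATTERN = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')


def is_valid_symbol(symbol: str) -> bool:
    """Mirror the example: accept whatever pattern.match() accepts."""
    return SYMBOL_PATTERN.match(symbol) is not None


def is_valid_symbol_strict(symbol: str) -> bool:
    """Stricter variant: fullmatch() also rejects a trailing newline."""
    return SYMBOL_PATTERN.fullmatch(symbol) is not None


if __name__ == "__main__":
    for candidate in ["BTC-USDT", "btc-usdt", "BTCUSDT", "BTC-USDT-SWAP", "BTC-USDT\n"]:
        print(f"{candidate!r:18} match={is_valid_symbol(candidate)} "
              f"fullmatch={is_valid_symbol_strict(candidate)}")
```

Two consequences are worth noting. The pattern allows exactly one hyphen, so a three-part identifier such as `BTC-USDT-SWAP` is rejected. And because `$` also matches just before a trailing newline, `match()` accepts `"BTC-USDT\n"`, while `fullmatch()` does not.
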
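### Validating Trade Data

```python
import logging
from typing import Any, Dict, Optional

from data.common.validation import BaseDataValidator

logger = logging.getLogger(__name__)

class ValidationError(Exception):
    """Raised when trade validation fails (defined locally for this example)."""

def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)

    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")

    if result.warnings:
        logger.warning(f"Trade validation warnings: {result.warnings}")

    return result.sanitized_data
```
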
## Configuration

### Validation Constants

The module defines several constants for validation rules:

```python
from decimal import Decimal

MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000     # 2000-01-01 00:00:00 UTC, in milliseconds
MAX_TIMESTAMP = 32503680000000   # 3000-01-01 00:00:00 UTC, in milliseconds
VALID_TRADE_SIDES = {'buy', 'sell'}
```

### Regular Expression Patterns

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')
```

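
What these two patterns accept is easiest to see by trying a few inputs. The sketch below reuses the patterns verbatim; `matches` and `matches_strict` are hypothetical helpers for this illustration, and how the module itself applies the patterns (`match`, `fullmatch`, or otherwise) is not stated in this document.

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')


def matches(pattern: re.Pattern, text: str) -> bool:
    """True if `text` satisfies `pattern` using match()."""
    return pattern.match(text) is not None


def matches_strict(pattern: re.Pattern, text: str) -> bool:
    """Stricter variant: fullmatch() also rejects a trailing newline."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["123", "-0.5", ".5", "1.", "1e5", "+1", ""]:
        print(f"numeric  {text!r:8} -> {matches(NUMERIC_PATTERN, text)}")
    for text in ["abc-123_X", "abc 123", "id#1", "abc\n"]:
        print(f"trade id {text!r:12} -> {matches(TRADE_ID_PATTERN, text)}")
```

Under `NUMERIC_PATTERN`, `123`, `-0.5`, and `.5` count as numeric, while `1.`, `1e5`, `+1`, and the empty string do not. `TRADE_ID_PATTERN` accepts letters, digits, underscores, and hyphens, so `abc-123_X` passes and `abc 123` or `id#1` fail. With `match()`, both patterns also accept a single trailing newline because of how `$` behaves; `fullmatch()` closes that gap.
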
## Testing

### Running Tests

```bash
pytest tests/test_data_validation.py -v
```

### Test Coverage

The validation module has comprehensive test coverage including:

- Basic validation result functionality
- Field validator functions
- Base validator class
- Exchange-specific validator implementations
- Error handling and edge cases

## Dependencies

- Internal:
  - `data.common.data_types`
  - `data.base_collector`
- External (all from the Python standard library):
  - `typing`
  - `decimal`
  - `logging`
  - `abc`
  - `re`

## Error Handling

### Common Validation Errors

- Invalid data type
- Value out of bounds
- Missing required fields
- Invalid format
- Symbol mismatch

### Error Response Format

```python
{
    'is_valid': False,
    'errors': ['Price must be positive', 'Size exceeds maximum'],
    'warnings': ['Price below recommended minimum'],
    'sanitized_data': None
}
```

## Best Practices

### Implementing New Validators

1. Extend `BaseDataValidator`
2. Implement required abstract methods
3. Add exchange-specific validation rules
4. Reuse common field validators
5. Add comprehensive tests

### Validation Guidelines

- Always sanitize input data
- Include helpful error messages
- Use warnings for non-critical issues
- Maintain type safety
- Log validation failures appropriately

## Known Issues and Limitations

- Timestamp validation assumes millisecond precision
- Trade ID format is loosely validated
- Some exchanges may require custom numeric precision

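
The millisecond assumption in the first item can be illustrated with the timestamp bounds from the Validation Constants section. This is a sketch of a plain range check, not the module's `validate_timestamp()`; `in_millisecond_range` and `to_utc` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Bounds copied from the Validation Constants section (milliseconds since the Unix epoch).
MIN_TIMESTAMP = 946684800000      # 2000-01-01 00:00:00 UTC
MAX_TIMESTAMP = 32503680000000    # 3000-01-01 00:00:00 UTC

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def in_millisecond_range(ts: int) -> bool:
    """True if `ts` lies within the accepted millisecond window."""
    return MIN_TIMESTAMP <= ts <= MAX_TIMESTAMP


def to_utc(ts_ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime using pure arithmetic."""
    return EPOCH + timedelta(milliseconds=ts_ms)


if __name__ == "__main__":
    samples = {
        "milliseconds": 1_700_000_000_000,
        "seconds": 1_700_000_000,
        "microseconds": 1_700_000_000_000_000,
    }
    for unit, ts in samples.items():
        print(f"{unit:13} {ts:>20} accepted={in_millisecond_range(ts)}")
```

With these bounds, the same instant expressed in seconds falls below `MIN_TIMESTAMP` and in microseconds above `MAX_TIMESTAMP`, so both are rejected rather than silently misread. Feeds that use other precisions would need to be converted to milliseconds before validation.
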
## Future Improvements

- Add support for custom validation rules
- Implement async validation methods
- Add validation rule configuration system
- Enhance performance for high-frequency validation
- Add more exchange-specific validators