# Data Validation Module

## Purpose

The data validation module provides an extensible framework for validating market data from different exchanges. It checks data consistency, field types, and business rules through a set of modular validators.

## Architecture

### Package Structure

```
data/common/validation/
├── __init__.py          # Public interface
├── result.py            # Validation result classes
├── field_validators.py  # Individual field validators
└── base.py              # BaseDataValidator class
```

### Core Components

#### ValidationResult

Represents the outcome of validating a single field or component:

```python
ValidationResult(
    is_valid: bool,              # Whether validation passed
    errors: List[str] = [],      # Error messages
    warnings: List[str] = [],    # Warning messages
    sanitized_data: Any = None   # Cleaned/normalized data
)
```

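The signature above is shorthand rather than runnable Python. As a minimal sketch, assuming the class is implemented as a dataclass (the source of `result.py` is not shown here, so this layout is a guess), the list defaults would need `field(default_factory=list)`, because `dataclasses` rejects a literal `[]` default and a plain function would share one list across calls:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ValidationResult:
    """Illustrative stand-in for the class in result.py (assumed layout)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


ok = ValidationResult(True, sanitized_data="BTC-USDT")
bad = ValidationResult(False, ["Invalid symbol format: btcusdt"])

# Each instance gets its own lists, so appending to one never leaks into another.
bad.warnings.append("Symbol was lowercase")
assert ok.warnings == []
```

The positional order matches the calls used elsewhere in this document, such as `ValidationResult(False, errors, warnings)`.
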
#### DataValidationResult

Represents the outcome of validating a complete data structure:

```python
DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
```

#### BaseDataValidator

Abstract base class providing common validation patterns for exchange-specific implementations:

```python
from abc import ABC, abstractmethod
from logging import Logger
from typing import Any, Dict, Optional


class BaseDataValidator(ABC):
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...
```

### Field Validators

Common validation functions for market data fields:

- `validate_price()`: Price value validation
- `validate_size()`: Size/quantity validation
- `validate_volume()`: Volume validation
- `validate_trade_side()`: Trade side validation
- `validate_timestamp()`: Timestamp validation
- `validate_trade_id()`: Trade ID validation
- `validate_symbol_match()`: Symbol matching validation
- `validate_required_fields()`: Required field presence validation

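The signatures of these functions are not listed here, so the following is only a sketch of what a price validator of this kind might look like. The signature, the error wording, and the `NamedTuple` stand-in for `ValidationResult` are all assumptions; only the two bounds are taken from the module's documented constants.

```python
from decimal import Decimal, InvalidOperation
from typing import Any, List, NamedTuple, Optional

# Bounds copied from the module's documented constants.
MIN_PRICE = Decimal("0.00000001")
MAX_PRICE = Decimal("1000000000")


class ValidationResult(NamedTuple):
    """Stand-in for the real result class, for illustration only."""
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Decimal] = None


def validate_price(price: Any) -> ValidationResult:
    """Hypothetical price validator: parse to Decimal, then range-check."""
    try:
        # Going through str() avoids binary float artifacts such as Decimal(0.1).
        value = Decimal(str(price))
    except (InvalidOperation, ValueError):
        return ValidationResult(False, [f"Price is not numeric: {price!r}"], [])

    # NaN and Infinity parse successfully, so reject them before comparing.
    if not value.is_finite():
        return ValidationResult(False, [f"Price is not finite: {price!r}"], [])
    if value < MIN_PRICE:
        return ValidationResult(False, [f"Price {value} is below {MIN_PRICE}"], [])
    if value > MAX_PRICE:
        return ValidationResult(False, [f"Price {value} exceeds {MAX_PRICE}"], [])
    return ValidationResult(True, [], [], value)
```

Returning the parsed `Decimal` as `sanitized_data` lets callers use the normalized value directly instead of re-parsing the raw input.
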
## Usage Examples

### Creating an Exchange-Specific Validator

```python
import re

from data.common.validation import BaseDataValidator, ValidationResult


class OKXDataValidator(BaseDataValidator):
    def __init__(self, component_name: str = "okx_data_validator", logger=None):
        super().__init__("okx", component_name, logger)
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors = []
        warnings = []

        # Reject non-string input before applying the regex.
        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol).__name__}")
            return ValidationResult(False, errors, warnings)

        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")

        return ValidationResult(len(errors) == 0, errors, warnings)

    # validate_websocket_message() is also abstract and must be implemented
    # before this class can be instantiated; it is omitted here for brevity.
```

### Validating Trade Data

```python
import logging
from typing import Any, Dict, Optional

from data.common.validation import BaseDataValidator

logger = logging.getLogger(__name__)


class ValidationError(Exception):
    """Placeholder; use your project's own validation exception if it has one."""


def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)

    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")

    if result.warnings:
        logger.warning("Trade validation warnings: %s", result.warnings)

    return result.sanitized_data
```

## Configuration

### Validation Constants

The module defines the following constants for validation rules:

```python
from decimal import Decimal

MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000     # 2000-01-01 (milliseconds since epoch)
MAX_TIMESTAMP = 32503680000000   # 3000-01-01 (milliseconds since epoch)
VALID_TRADE_SIDES = {'buy', 'sell'}
```

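The two timestamp bounds are easier to check when decoded. The sketch below converts them with plain `datetime` arithmetic; the helper name `ms_to_utc` is illustrative and not part of the module.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the validation constants.
MIN_TIMESTAMP = 946684800000
MAX_TIMESTAMP = 32503680000000

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


print(ms_to_utc(MIN_TIMESTAMP).isoformat())  # 2000-01-01T00:00:00+00:00
print(ms_to_utc(MAX_TIMESTAMP).isoformat())  # 3000-01-01T00:00:00+00:00
```

Adding a `timedelta` to a fixed epoch avoids platform limits that `datetime.fromtimestamp()` can hit for far-future values.
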
### Regular Expression Patterns

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')
```

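To make the behavior of these two patterns concrete, the following sketch runs them against a handful of inputs. The sample strings are illustrative; the patterns are copied verbatim from above.

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')

# Plain and signed decimals match; a trailing dot, an explicit plus sign,
# thousands separators, and scientific notation do not.
numeric_cases = {
    "123": True, "-0.5": True, ".5": True,
    "1.": False, "1e-8": False, "+1": False, "1,000": False,
}
for text, expected in numeric_cases.items():
    assert bool(NUMERIC_PATTERN.match(text)) is expected, text

# Trade IDs may contain word characters and hyphens, but nothing else.
trade_id_cases = {
    "12345": True, "abc_DEF-9": True,
    "id 1": False, "a/b": False, "": False,
}
for text, expected in trade_id_cases.items():
    assert bool(TRADE_ID_PATTERN.match(text)) is expected, text
```

One consequence worth noting: `str(Decimal('0.00000001'))` yields `'1E-8'`, which `NUMERIC_PATTERN` rejects, so very small values may need to be formatted in fixed-point notation before they are matched.
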
## Testing

### Running Tests

```bash
pytest tests/test_data_validation.py -v
```

### Test Coverage

The test suite for the validation module covers:

- Basic validation result functionality
- Field validator functions
- Base validator class
- Exchange-specific validator implementations
- Error handling and edge cases

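As an illustration of the kind of edge cases listed above, here is a small pytest-style sketch for the symbol pattern used in the `OKXDataValidator` example. The helper `is_valid_symbol` and the test names are hypothetical and are not taken from `tests/test_data_validation.py`.

```python
import re

# Pattern copied from the OKXDataValidator example.
SYMBOL_PATTERN = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')


def is_valid_symbol(symbol: object) -> bool:
    """Hypothetical helper mirroring the checks in validate_symbol_format()."""
    return isinstance(symbol, str) and SYMBOL_PATTERN.fullmatch(symbol) is not None


def test_accepts_base_quote_pairs():
    for symbol in ("BTC-USDT", "ETH-USDC", "1INCH-USDT"):
        assert is_valid_symbol(symbol)


def test_rejects_malformed_symbols():
    # Lowercase, missing separator, empty sides, extra segments, non-strings.
    for symbol in ("btc-usdt", "BTCUSDT", "BTC-", "-USDT", "BTC-USDT-SWAP", "", None, 42):
        assert not is_valid_symbol(symbol)
```

The functions use bare `assert` statements, so pytest discovers and runs them without extra imports.
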
## Dependencies

- Internal:
  - `data.common.data_types`
  - `data.base_collector`
- External (all from the Python standard library):
  - `typing`
  - `decimal`
  - `logging`
  - `abc`

## Error Handling

### Common Validation Errors

- Invalid data type
- Value out of bounds
- Missing required fields
- Invalid format
- Symbol mismatch

### Error Response Format

```python
{
    'is_valid': False,
    'errors': ['Price must be positive', 'Size exceeds maximum'],
    'warnings': ['Price below recommended minimum'],
    'sanitized_data': None
}
```

## Best Practices

### Implementing New Validators

1. Extend `BaseDataValidator`
2. Implement required abstract methods
3. Add exchange-specific validation rules
4. Reuse common field validators
5. Add comprehensive tests

### Validation Guidelines

- Always sanitize input data
- Include helpful error messages
- Use warnings for non-critical issues
- Maintain type safety
- Log validation failures appropriately

## Known Issues and Limitations

- Timestamp validation assumes millisecond precision
- Trade ID format is loosely validated
- Some exchanges may require custom numeric precision

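The millisecond assumption means a timestamp in another unit falls outside the configured bounds even when it denotes a reasonable date. The sketch below demonstrates this with the documented bounds; `in_bounds` and `normalize_to_ms` are hypothetical helpers, not functions provided by the module.

```python
MIN_TIMESTAMP = 946684800000      # 2000-01-01 in milliseconds
MAX_TIMESTAMP = 32503680000000    # 3000-01-01 in milliseconds


def in_bounds(ts_ms: int) -> bool:
    """Range check equivalent to the documented timestamp bounds."""
    return MIN_TIMESTAMP <= ts_ms <= MAX_TIMESTAMP


def normalize_to_ms(ts: int, unit: str) -> int:
    """Hypothetical helper: convert an exchange timestamp to milliseconds."""
    if unit == "s":
        return ts * 1000
    if unit == "ms":
        return ts
    if unit == "us":
        return ts // 1000
    if unit == "ns":
        return ts // 1_000_000
    raise ValueError(f"Unknown timestamp unit: {unit!r}")


seconds = 1_700_000_000               # a November 2023 instant, in seconds
micros = 1_700_000_000_000_000        # the same instant, in microseconds

assert not in_bounds(seconds)         # too small when read as milliseconds
assert not in_bounds(micros)          # too large when read as milliseconds
assert in_bounds(normalize_to_ms(seconds, "s"))
assert in_bounds(normalize_to_ms(micros, "us"))
```

Converting explicitly by unit is safer than guessing the unit from the magnitude, since the exchange's API documentation states which precision it uses.
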
## Future Improvements

- Add support for custom validation rules
- Implement async validation methods
- Add validation rule configuration system
- Enhance performance for high-frequency validation
- Add more exchange-specific validators