# Data Validation Module
## Purpose
The data validation module provides an extensible framework for validating market data across exchanges. It checks data consistency, field types, and business rules by combining reusable field validators with exchange-specific validator classes.
## Architecture
### Package Structure
```
data/common/validation/
├── __init__.py # Public interface
├── result.py # Validation result classes
├── field_validators.py # Individual field validators
└── base.py # BaseDataValidator class
```
### Core Components
#### ValidationResult
Represents the outcome of validating a single field or component:
```python
ValidationResult(
    is_valid: bool,                # Whether validation passed
    errors: List[str] = [],        # Error messages
    warnings: List[str] = [],      # Warning messages
    sanitized_data: Any = None     # Cleaned/normalized data
)
```
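The signature above lists `[]` as the default for both list fields. A literal list default is shared across calls in a plain `__init__`, and `dataclasses` rejects it outright. The following is a minimal sketch, assuming the class is written as a dataclass (the real `result.py` may be structured differently), of how each result can get its own lists:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ValidationResult:
    """Illustrative stand-in; the real class in result.py may differ."""
    is_valid: bool
    # default_factory builds a fresh list for every instance
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


first = ValidationResult(is_valid=True, sanitized_data="BTC-USDT")
second = ValidationResult(is_valid=False)
second.errors.append("Invalid symbol format: btcusdt")

# Appending to one result leaves the other untouched
print(first.errors, second.errors)
```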
#### DataValidationResult
Represents the outcome of validating a complete data structure:
```python
DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
```
#### BaseDataValidator
Abstract base class providing common validation patterns for exchange-specific implementations:
```python
class BaseDataValidator(ABC):
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...
```
### Field Validators
Common validation functions for market data fields:
- `validate_price()`: Price value validation
- `validate_size()`: Size/quantity validation
- `validate_volume()`: Volume validation
- `validate_trade_side()`: Trade side validation
- `validate_timestamp()`: Timestamp validation
- `validate_trade_id()`: Trade ID validation
- `validate_symbol_match()`: Symbol matching validation
- `validate_required_fields()`: Required field presence validation
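To make the shape of a field validator concrete, here is a minimal sketch of what `validate_price()` could look like. Only the function name comes from the module: the body, the error messages, and the `ValidationResult` stand-in are assumptions for illustration, and the bounds are copied from the Validation Constants section.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from typing import Any, List

# Bounds copied from the Validation Constants section
MIN_PRICE = Decimal("0.00000001")
MAX_PRICE = Decimal("1000000000")


@dataclass
class ValidationResult:
    """Stand-in for the module's result class (assumed layout)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


def validate_price(price: Any) -> ValidationResult:
    """Hypothetical body; only the function name comes from the module."""
    try:
        value = Decimal(str(price))
    except InvalidOperation:
        return ValidationResult(False, [f"Price is not numeric: {price!r}"])
    if not value.is_finite():  # rejects NaN and Infinity
        return ValidationResult(False, [f"Price is not finite: {price!r}"])
    if value <= 0:
        return ValidationResult(False, ["Price must be positive"])
    if value > MAX_PRICE:
        return ValidationResult(False, [f"Price exceeds maximum {MAX_PRICE}"])
    warnings = []
    if value < MIN_PRICE:
        # Non-critical: keep the value but flag it
        warnings.append("Price below recommended minimum")
    return ValidationResult(True, [], warnings, value)
```

Parsing through `Decimal(str(price))` avoids binary floating-point rounding, and the finiteness check has to come before the range comparisons because ordering comparisons against a `Decimal` NaN raise an exception.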
## Usage Examples
### Creating an Exchange-Specific Validator
```python
import re

from data.common.validation import BaseDataValidator, ValidationResult


class OKXDataValidator(BaseDataValidator):
    # validate_websocket_message() must also be implemented before
    # this class can be instantiated; it is omitted here for brevity.

    def __init__(self, component_name: str = "okx_data_validator", logger=None):
        super().__init__("okx", component_name, logger)
        # Matches BASE-QUOTE symbols such as BTC-USDT
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors = []
        warnings = []
        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol)}")
            return ValidationResult(False, errors, warnings)
        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")
        return ValidationResult(len(errors) == 0, errors, warnings)
```
### Validating Trade Data
```python
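import logging
from typing import Any, Dict, Optional

from data.common.validation import BaseDataValidator

logger = logging.getLogger(__name__)


class ValidationError(Exception):
    """Defined here for the example; use the project's own exception if it has one."""


def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)
    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")
    if result.warnings:
        logger.warning("Trade validation warnings: %s", result.warnings)
    # Return the cleaned data, not the raw input
    return result.sanitized_data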
```
## Configuration
### Validation Constants
The module defines several constants for validation rules:
```python
from decimal import Decimal

MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000     # 2000-01-01 UTC, in milliseconds
MAX_TIMESTAMP = 32503680000000   # 3000-01-01 UTC, in milliseconds
VALID_TRADE_SIDES = {'buy', 'sell'}
```
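The timestamp bounds are epoch milliseconds, so they are easy to misread. The sketch below converts them back to dates and shows why a seconds-precision timestamp falls outside the range. The helper names `ms_to_utc` and `in_timestamp_range` are hypothetical, and the inclusive bounds check is an assumption about how the module applies these constants.

```python
from datetime import datetime, timedelta, timezone

MIN_TIMESTAMP = 946684800000     # copied from the constants above
MAX_TIMESTAMP = 32503680000000

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)


def in_timestamp_range(timestamp: int) -> bool:
    """Hypothetical helper: inclusive check against the module bounds."""
    return MIN_TIMESTAMP <= timestamp <= MAX_TIMESTAMP


print(ms_to_utc(MIN_TIMESTAMP).isoformat())   # 2000-01-01T00:00:00+00:00
print(ms_to_utc(MAX_TIMESTAMP).isoformat())   # 3000-01-01T00:00:00+00:00
print(in_timestamp_range(1_700_000_000))      # False: seconds, not milliseconds
print(in_timestamp_range(1_700_000_000_000))  # True: the same instant in milliseconds
```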
### Regular Expression Patterns
```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')   # optional sign, optional integer part, digits
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')       # word characters and hyphens
```
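A few sample inputs show what these patterns accept and reject. The helper names below are hypothetical. They use `fullmatch` rather than `match` because `$` also matches just before a trailing newline, which may or may not be what the module intends.

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')


def is_numeric_string(value: str) -> bool:
    """Hypothetical helper: strict match that also rejects a trailing newline."""
    return NUMERIC_PATTERN.fullmatch(value) is not None


def is_trade_id(value: str) -> bool:
    """Hypothetical helper: non-empty run of word characters and hyphens."""
    return TRADE_ID_PATTERN.fullmatch(value) is not None


for sample in ["123.45", "-.5", "1.", "1e-8", "12\n"]:
    print(repr(sample), is_numeric_string(sample))

for sample in ["abc-123_X", "bad id", ""]:
    print(repr(sample), is_trade_id(sample))
```

Scientific notation such as `1e-8` and a bare trailing dot such as `1.` are rejected, so values that arrive in those forms would need normalizing before they reach the pattern.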
## Testing
### Running Tests
```bash
pytest tests/test_data_validation.py -v
```
### Test Coverage
The test suite covers:
- Basic validation result functionality
- Field validator functions
- Base validator class
- Exchange-specific validator implementations
- Error handling and edge cases
## Dependencies
- Internal:
  - `data.common.data_types`
  - `data.base_collector`
- External (all from the Python standard library):
  - `typing`
  - `decimal`
  - `logging`
  - `abc`
## Error Handling
### Common Validation Errors
- Invalid data type
- Value out of bounds
- Missing required fields
- Invalid format
- Symbol mismatch
### Error Response Format
```python
{
'is_valid': False,
'errors': ['Price must be positive', 'Size exceeds maximum'],
'warnings': ['Price below recommended minimum'],
'sanitized_data': None
}
```
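One way to produce a dictionary in this format from a result object is sketched below. `result_to_dict` is a hypothetical helper and the `DataValidationResult` dataclass is a stand-in with an assumed layout; the module may already expose its own conversion.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataValidationResult:
    """Stand-in for the module's result class (assumed layout)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Optional[Dict[str, Any]] = None


def result_to_dict(result: DataValidationResult) -> Dict[str, Any]:
    """Hypothetical helper: copy the result fields into a plain dict."""
    return {
        "is_valid": result.is_valid,
        "errors": list(result.errors),
        "warnings": list(result.warnings),
        "sanitized_data": result.sanitized_data,
    }


failed = DataValidationResult(
    is_valid=False,
    errors=["Price must be positive", "Size exceeds maximum"],
    warnings=["Price below recommended minimum"],
)
payload = result_to_dict(failed)

# default=str keeps Decimal values in sanitized_data serializable
print(json.dumps(payload, default=str))
```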
## Best Practices
### Implementing New Validators
1. Extend `BaseDataValidator`
2. Implement required abstract methods
3. Add exchange-specific validation rules
4. Reuse common field validators
5. Add comprehensive tests
### Validation Guidelines
- Always sanitize input data
- Include helpful error messages
- Use warnings for non-critical issues
- Maintain type safety
- Log validation failures appropriately
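The guidelines above can be illustrated with a small validator that sanitizes its input, reports a specific error, and uses a warning for a recoverable issue. This is a sketch only: the body of `validate_trade_side()` and the `ValidationResult` stand-in are assumptions, and accepting mixed-case input with a warning is a design choice the module may not share.

```python
from dataclasses import dataclass, field
from typing import Any, List

VALID_TRADE_SIDES = {"buy", "sell"}


@dataclass
class ValidationResult:
    """Stand-in for the module's result class (assumed layout)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


def validate_trade_side(side: Any) -> ValidationResult:
    """Hypothetical body; only the function name comes from the module."""
    if not isinstance(side, str):
        return ValidationResult(
            False, [f"Trade side must be a string, got {type(side).__name__}"]
        )
    cleaned = side.strip().lower()  # sanitize before checking
    if cleaned not in VALID_TRADE_SIDES:
        expected = sorted(VALID_TRADE_SIDES)
        return ValidationResult(
            False, [f"Invalid trade side {side!r}; expected one of {expected}"]
        )
    warnings = []
    if cleaned != side:
        # Recoverable issue: accept the value but report the change
        warnings.append(f"Trade side normalized from {side!r} to {cleaned!r}")
    return ValidationResult(True, [], warnings, cleaned)


result = validate_trade_side(" BUY ")
print(result.is_valid, result.sanitized_data, result.warnings)
```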
## Known Issues and Limitations
- Timestamp validation assumes millisecond precision
- Trade ID format is loosely validated
- Some exchanges may require custom numeric precision
## Future Improvements
- Add support for custom validation rules
- Implement async validation methods
- Add validation rule configuration system
- Enhance performance for high-frequency validation
- Add more exchange-specific validators