TCPDashboard/docs/modules/validation.md
Ajasra 96ee25bd01 Refactor data validation module for improved modularity and functionality
- Removed the existing `validation.py` file and replaced it with a modular structure, introducing separate files for validation results, field validators, and the base validator class.
- Implemented comprehensive validation functions for common data types, enhancing reusability and maintainability.
- Added a new `__init__.py` to expose the validation utilities, ensuring a clean public interface.
- Created detailed documentation for the validation module, including usage examples and architectural details.
- Introduced extensive unit tests to cover the new validation framework, ensuring reliability and preventing regressions.

These changes enhance the overall architecture of the data validation module, making it more scalable and easier to manage.
2025-06-07 12:31:47 +08:00


# Data Validation Module
## Purpose
The data validation module provides a robust, extensible framework for validating market data across different exchanges. It ensures data consistency, type safety, and business rule compliance through a modular validation system.
## Architecture
### Package Structure
```
data/common/validation/
├── __init__.py # Public interface
├── result.py # Validation result classes
├── field_validators.py # Individual field validators
└── base.py # BaseDataValidator class
```
### Core Components
#### ValidationResult
Represents the outcome of validating a single field or component:
```python
ValidationResult(
    is_valid: bool,              # Whether validation passed
    errors: List[str] = [],      # Error messages
    warnings: List[str] = [],    # Warning messages
    sanitized_data: Any = None   # Cleaned/normalized data
)
```
```
#### DataValidationResult
Represents the outcome of validating a complete data structure:
```python
DataValidationResult(
    is_valid: bool,
    errors: List[str],
    warnings: List[str],
    sanitized_data: Optional[Dict[str, Any]] = None
)
```
```
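The two result signatures above can be read as plain dataclasses. The sketch below is an assumption about their shape, not the module's actual definitions. It uses `field(default_factory=list)` so every instance gets its own lists, which is the usual way to express the `= []` defaults shown above without sharing one mutable list across instances.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValidationResult:
    """Outcome of validating a single field (sketch)."""
    is_valid: bool
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    sanitized_data: Any = None


@dataclass
class DataValidationResult:
    """Outcome of validating a complete message (sketch)."""
    is_valid: bool
    errors: List[str]
    warnings: List[str]
    sanitized_data: Optional[Dict[str, Any]] = None


# Each instance owns its lists, so appending to one does not affect another.
first = ValidationResult(is_valid=True)
second = ValidationResult(is_valid=True)
first.errors.append("Price must be positive")
```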
#### BaseDataValidator
Abstract base class providing common validation patterns for exchange-specific implementations:
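```python
from abc import ABC, abstractmethod
from logging import Logger
from typing import Any, Dict, Optional
from .result import DataValidationResult, ValidationResult  # defined in result.py

class BaseDataValidator(ABC):  # signatures only; bodies omitted
    def __init__(self, exchange_name: str, component_name: str, logger: Optional[Logger]): ...

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> ValidationResult: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> DataValidationResult: ...
```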
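Because `BaseDataValidator` derives from `ABC`, a subclass that leaves either abstract method unimplemented cannot be instantiated. The self-contained sketch below reproduces that behaviour with a simplified, hypothetical stand-in (`MiniValidator`) instead of importing the real class; return types are simplified to `bool` and `dict`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class MiniValidator(ABC):
    """Simplified, hypothetical stand-in for BaseDataValidator."""

    def __init__(self, exchange_name: str, component_name: str) -> None:
        self.exchange_name = exchange_name
        self.component_name = component_name

    @abstractmethod
    def validate_symbol_format(self, symbol: str) -> bool: ...

    @abstractmethod
    def validate_websocket_message(self, message: Dict[str, Any]) -> Dict[str, Any]: ...


class SymbolOnlyValidator(MiniValidator):
    # Implements only one of the two abstract methods.
    def validate_symbol_format(self, symbol: str) -> bool:
        return "-" in symbol


try:
    SymbolOnlyValidator("okx", "incomplete")
    instantiated = True
except TypeError:
    # ABC refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False


class CompleteValidator(SymbolOnlyValidator):
    # Supplying the remaining abstract method makes the class concrete.
    def validate_websocket_message(self, message: Dict[str, Any]) -> Dict[str, Any]:
        return {"is_valid": True}


complete = CompleteValidator("okx", "complete")
```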
### Field Validators
Common validation functions for market data fields:
- `validate_price()`: Price value validation
- `validate_size()`: Size/quantity validation
- `validate_volume()`: Volume validation
- `validate_trade_side()`: Trade side validation
- `validate_timestamp()`: Timestamp validation
- `validate_trade_id()`: Trade ID validation
- `validate_symbol_match()`: Symbol matching validation
- `validate_required_fields()`: Required field presence validation
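The signatures of these functions are not documented here. As a rough illustration of how such checks compose, the sketch below uses two hypothetical helpers, `check_required_fields` and `check_trade_side`, that each return a list of error strings. They stand in for `validate_required_fields()` and `validate_trade_side()` and are not the module's actual API; the required field names are also assumptions.

```python
from typing import Any, Dict, Iterable, List

VALID_TRADE_SIDES = {"buy", "sell"}  # mirrors the constant listed under Configuration


def check_required_fields(data: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Hypothetical stand-in: report every required key that is missing or None."""
    return [f"Missing required field: {name}" for name in required if data.get(name) is None]


def check_trade_side(side: Any) -> List[str]:
    """Hypothetical stand-in: accept only an exact 'buy' or 'sell'."""
    if side not in VALID_TRADE_SIDES:
        return [f"Invalid trade side: {side!r}"]
    return []


def check_trade(trade: Dict[str, Any]) -> List[str]:
    """Run the field checks and collect all errors instead of stopping at the first."""
    errors = check_required_fields(trade, ["symbol", "price", "size", "side", "timestamp"])
    if trade.get("side") is not None:
        errors += check_trade_side(trade["side"])
    return errors
```

Collecting every error rather than failing fast lets a single log line report all problems with a message.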
## Usage Examples
### Creating an Exchange-Specific Validator
```python
import re
from logging import Logger
from typing import List, Optional

from data.common.validation import BaseDataValidator, ValidationResult


class OKXDataValidator(BaseDataValidator):
    def __init__(self, component_name: str = "okx_data_validator", logger: Optional[Logger] = None):
        super().__init__("okx", component_name, logger)
        # Matches BASE-QUOTE symbols such as BTC-USDT
        self._symbol_pattern = re.compile(r'^[A-Z0-9]+-[A-Z0-9]+$')

    def validate_symbol_format(self, symbol: str) -> ValidationResult:
        errors: List[str] = []
        warnings: List[str] = []
        if not isinstance(symbol, str):
            errors.append(f"Symbol must be string, got {type(symbol).__name__}")
            return ValidationResult(False, errors, warnings)
        if not self._symbol_pattern.match(symbol):
            errors.append(f"Invalid symbol format: {symbol}")
        return ValidationResult(len(errors) == 0, errors, warnings)

    # validate_websocket_message() is also abstract in BaseDataValidator and
    # must be implemented before OKXDataValidator can be instantiated.
```
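To see what the OKX symbol pattern accepts without importing the package, the check can be exercised on its own. This sketch copies the regular expression from the example above and returns the same error messages; the function name `symbol_errors` is hypothetical.

```python
import re
from typing import Any, List

# Same pattern as OKXDataValidator: uppercase/digit base and quote joined by one hyphen.
SYMBOL_PATTERN = re.compile(r"^[A-Z0-9]+-[A-Z0-9]+$")


def symbol_errors(symbol: Any) -> List[str]:
    """Return the errors OKXDataValidator.validate_symbol_format would report (sketch)."""
    if not isinstance(symbol, str):
        return [f"Symbol must be string, got {type(symbol).__name__}"]
    if not SYMBOL_PATTERN.match(symbol):
        return [f"Invalid symbol format: {symbol}"]
    return []
```

Because the pattern requires exactly one hyphen and uppercase characters, `btc-usdt`, `BTCUSDT`, and three-segment IDs such as `BTC-USDT-SWAP` are all rejected; if the collector is meant to handle such instruments, the pattern would need widening.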
### Validating Trade Data
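```python
import logging
from typing import Any, Dict, Optional
from data.common.validation import BaseDataValidator

logger = logging.getLogger(__name__)

class ValidationError(Exception):
    """Example-local error type; substitute your project's own exception."""

def validate_trade(validator: BaseDataValidator, trade_data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    result = validator.validate_trade_data(trade_data)
    if not result.is_valid:
        raise ValidationError(f"Trade validation failed: {result.errors}")
    if result.warnings:
        logger.warning("Trade validation warnings: %s", result.warnings)
    return result.sanitized_data
```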
## Configuration
### Validation Constants
The module defines several constants for validation rules:
```python
from decimal import Decimal

MIN_PRICE = Decimal('0.00000001')
MAX_PRICE = Decimal('1000000000')
MIN_SIZE = Decimal('0.00000001')
MAX_SIZE = Decimal('1000000000')
MIN_TIMESTAMP = 946684800000    # 2000-01-01T00:00:00Z, in epoch milliseconds
MAX_TIMESTAMP = 32503680000000  # 3000-01-01T00:00:00Z, in epoch milliseconds
VALID_TRADE_SIDES = {'buy', 'sell'}
```
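A minimal sketch of how these bounds might be applied, assuming prices and sizes are parsed with `Decimal` and timestamps are Unix epoch milliseconds (consistent with the values of `MIN_TIMESTAMP` and `MAX_TIMESTAMP`). The helper names `check_decimal_range` and `check_timestamp` are hypothetical. A timestamp expressed in seconds falls below `MIN_TIMESTAMP` and is rejected, which is the millisecond assumption listed under Known Issues.

```python
from datetime import datetime, timezone
from decimal import Decimal, InvalidOperation
from typing import Any, List

MIN_PRICE = Decimal("0.00000001")
MAX_PRICE = Decimal("1000000000")
MIN_TIMESTAMP = 946684800000     # 2000-01-01T00:00:00Z in milliseconds
MAX_TIMESTAMP = 32503680000000   # 3000-01-01T00:00:00Z in milliseconds


def check_decimal_range(name: str, value: Any, low: Decimal, high: Decimal) -> List[str]:
    """Parse value via str() to avoid float artefacts, then check low <= value <= high."""
    try:
        number = Decimal(str(value))
    except InvalidOperation:
        return [f"{name} is not numeric: {value!r}"]
    # NaN and Infinity parse successfully, and ordering NaN raises, so reject them first.
    if not number.is_finite():
        return [f"{name} must be finite: {value!r}"]
    if not low <= number <= high:
        return [f"{name} out of range [{low}, {high}]: {number}"]
    return []


def check_timestamp(ts_ms: int) -> List[str]:
    """Accept only epoch-millisecond values inside [MIN_TIMESTAMP, MAX_TIMESTAMP]."""
    if not MIN_TIMESTAMP <= ts_ms <= MAX_TIMESTAMP:
        return [f"Timestamp out of range (expected epoch milliseconds): {ts_ms}"]
    return []


# Reading the lower bound as milliseconds gives the start of the year 2000 (UTC).
start = datetime.fromtimestamp(MIN_TIMESTAMP / 1000, tz=timezone.utc)
```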
### Regular Expression Patterns
```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')  # optional minus, integer or decimal
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')      # word characters and hyphens
```
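These two patterns are easy to misjudge, so the sketch below records how they treat a few sample inputs (the samples are illustrative, not taken from the test suite). `NUMERIC_PATTERN` accepts an optional leading minus, integers, and decimals with or without a leading digit (`.5`), but rejects a trailing dot (`1.`), exponents (`1e5`), and a leading `+`. For `str` patterns Python's `\w` is Unicode-aware, so `TRADE_ID_PATTERN` also accepts non-ASCII letters, which is worth keeping in mind given that trade IDs are only loosely validated.

```python
import re

NUMERIC_PATTERN = re.compile(r'^-?\d*\.?\d+$')
TRADE_ID_PATTERN = re.compile(r'^[\w-]+$')

numeric_samples = ["123", "-0.001", ".5", "1.", "1e5", "+3", ""]
trade_id_samples = ["123456789", "abc-123_X", "café-1", "id with space", ""]

# Map each sample to whether the anchored pattern accepts it.
numeric_results = {s: bool(NUMERIC_PATTERN.match(s)) for s in numeric_samples}
trade_id_results = {s: bool(TRADE_ID_PATTERN.match(s)) for s in trade_id_samples}

# '$' also matches just before a trailing newline; fullmatch() does not.
trailing_newline = (bool(NUMERIC_PATTERN.match("123\n")), bool(NUMERIC_PATTERN.fullmatch("123\n")))
```

Using `re.fullmatch()`, or ending the pattern with `\Z` instead of `$`, closes the trailing-newline gap.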
## Testing
### Running Tests
```bash
pytest tests/test_data_validation.py -v
```
### Test Coverage
The test suite covers:
- Basic validation result functionality
- Field validator functions
- Base validator class
- Exchange-specific validator implementations
- Error handling and edge cases
## Dependencies
- Internal:
  - `data.common.data_types`
  - `data.base_collector`
- Standard library:
  - `typing`
  - `decimal`
  - `logging`
  - `abc`
  - `re`
## Error Handling
### Common Validation Errors
- Invalid data type
- Value out of bounds
- Missing required fields
- Invalid format
- Symbol mismatch
### Error Response Format
```python
{
'is_valid': False,
'errors': ['Price must be positive', 'Size exceeds maximum'],
'warnings': ['Price below recommended minimum'],
'sanitized_data': None
}
```
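One plausible way field-level results could roll up into this response shape: the message is valid only if every field is valid, errors and warnings are concatenated in order, and `sanitized_data` is dropped to `None` when anything failed. This aggregation policy is an assumption, sketched with a minimal hypothetical `FieldResult` type and `to_response` helper rather than the module's own classes.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class FieldResult(NamedTuple):
    """Minimal hypothetical stand-in for a field-level ValidationResult."""
    is_valid: bool
    errors: List[str]
    warnings: List[str]


def to_response(results: List[FieldResult], sanitized: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine field results into the error response format shown above."""
    is_valid = all(r.is_valid for r in results)
    return {
        "is_valid": is_valid,
        "errors": [e for r in results for e in r.errors],
        "warnings": [w for r in results for w in r.warnings],
        # Only hand back cleaned data when every field passed.
        "sanitized_data": sanitized if is_valid else None,
    }


response = to_response(
    [
        FieldResult(False, ["Price must be positive"], ["Price below recommended minimum"]),
        FieldResult(False, ["Size exceeds maximum"], []),
    ],
    sanitized={"price": "-1", "size": "1e12"},
)
```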
## Best Practices
### Implementing New Validators
1. Extend `BaseDataValidator`
2. Implement required abstract methods
3. Add exchange-specific validation rules
4. Reuse common field validators
5. Add comprehensive tests
### Validation Guidelines
- Always sanitize input data
- Include helpful error messages
- Use warnings for non-critical issues
- Maintain type safety
- Log validation failures appropriately
## Known Issues and Limitations
- Timestamp validation assumes millisecond precision
- Trade ID format is loosely validated
- Some exchanges may require custom numeric precision
## Future Improvements
- Add support for custom validation rules
- Implement async validation methods
- Add validation rule configuration system
- Enhance performance for high-frequency validation
- Add more exchange-specific validators