orderflow_backtest/docs/modules/level_parser.md

102 lines
3.1 KiB
Markdown

# Module: level_parser
## Purpose
The `level_parser` module provides utilities for parsing and normalizing orderbook level data from various string formats. It handles JSON and Python literal representations, converting them into standardized numeric tuples for processing.
## Public Interface
### Functions
- `normalize_levels(levels: Any) -> List[List[float]]`: Parse levels into [[price, size], ...] format, filtering out zero/negative sizes
- `parse_levels_including_zeros(levels: Any) -> List[Tuple[float, float]]`: Parse levels preserving zero sizes for deletion operations
### Private Functions
- `_parse_string_to_list(levels: Any) -> List[Any]`: Core parsing logic trying JSON first, then literal_eval
- `_extract_price_size(item: Any) -> Tuple[Any, Any]`: Extract price/size from dict or list/tuple formats
## Usage Examples
```python
from level_parser import normalize_levels, parse_levels_including_zeros
# Parse standard levels (filters zeros)
levels = normalize_levels('[[50000.0, 1.5], [49999.0, 2.0]]')
# Returns: [[50000.0, 1.5], [49999.0, 2.0]]
# Parse with zero sizes preserved (for deletions)
updates = parse_levels_including_zeros('[[50000.0, 0.0], [49999.0, 1.5]]')
# Returns: [(50000.0, 0.0), (49999.0, 1.5)]
# Supports dict format
dict_levels = normalize_levels('[{"price": 50000.0, "size": 1.5}]')
# Returns: [[50000.0, 1.5]]
# Short key format
short_levels = normalize_levels('[{"p": 50000.0, "s": 1.5}]')
# Returns: [[50000.0, 1.5]]
```
## Dependencies
### External
- `json`: Primary parsing method for level data
- `ast.literal_eval`: Fallback parsing for Python literal formats
- `logging`: Debug logging for parsing issues
- `typing`: Type annotations
## Input Formats Supported
### JSON Array Format
```json
[[50000.0, 1.5], [49999.0, 2.0]]
```
### Dict Format (Full Keys)
```json
[{"price": 50000.0, "size": 1.5}, {"price": 49999.0, "size": 2.0}]
```
### Dict Format (Short Keys)
```json
[{"p": 50000.0, "s": 1.5}, {"p": 49999.0, "s": 2.0}]
```
### Python Literal Format
```python
"[(50000.0, 1.5), (49999.0, 2.0)]"
```
## Error Handling
- **Graceful Degradation**: Returns empty list on parse failures
- **Data Validation**: Filters out invalid price/size pairs
- **Type Safety**: Converts all values to float before processing
- **Debug Logging**: Logs warnings for malformed input without crashing
## Performance Characteristics
- **Fast Path**: JSON parsing prioritized for performance
- **Fallback Support**: ast.literal_eval as backup for edge cases
- **Memory Efficient**: Processes items iteratively, not loading entire dataset
- **Validation**: Minimal overhead with early filtering of invalid data
## Testing
```bash
uv run pytest test_level_parser.py -v
```
Test coverage includes:
- JSON format parsing accuracy
- Dict format (both key styles) parsing
- Python literal fallback parsing
- Zero size preservation vs filtering
- Error handling for malformed input
- Type conversion edge cases
## Known Limitations
- Assumes well-formed numeric data (price/size as numbers)
- Does not validate economic constraints (e.g., positive prices)
- Limited to list/dict input formats
- No support for streaming/incremental parsing