ADR-002: JSON File-Based Inter-Process Communication

Status

Accepted

Context

The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include:

  • Real-time data updates from processor to visualization
  • Tolerance for timing mismatches between writer and reader
  • Simple implementation without external dependencies
  • Support for different update frequencies (OHLC bars vs. orderbook depth)
  • Graceful handling of process crashes or restarts

Decision

We will use JSON files with atomic write operations for inter-process communication between the data processor and Dash visualization frontend.

Consequences

Positive

  • Simplicity: No message queues, sockets, or complex protocols
  • Fault tolerance: File-based communication survives process restarts
  • Debugging friendly: Data files can be inspected manually
  • No dependencies: Built-in JSON support, no external libraries
  • Atomic operations: Temp file + rename prevents partial reads
  • Language agnostic: Any process can read/write JSON files
  • Bounded memory: Rolling data windows prevent unlimited growth

Negative

  • File I/O overhead: Disk writes may be slower than in-memory communication
  • Polling required: Reader must poll for updates (500ms interval)
  • Limited throughput: Not suitable for high-frequency (microsecond) updates
  • No acknowledgments: Writer cannot confirm reader has processed data
  • File system dependency: Performance varies by storage type

Implementation Details

File Structure

ohlc_data.json     # Rolling array of OHLC bars (max 1000)
depth_data.json    # Current orderbook depth snapshot
metrics_data.json  # Rolling array of OBI/CVD metrics (max 1000)
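
For illustration, the writer and reader could agree on these locations through a small shared constants module; the ipc_data/ directory name below is an assumption, not part of the original design:

from pathlib import Path

IPC_DIR = Path("ipc_data")                    # Hypothetical shared data directory
OHLC_FILE = IPC_DIR / "ohlc_data.json"
DEPTH_FILE = IPC_DIR / "depth_data.json"
METRICS_FILE = IPC_DIR / "metrics_data.json"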

Atomic Write Pattern

import json
import os
from pathlib import Path
from typing import Any

def atomic_write(file_path: Path, data: Any) -> None:
    """Write data atomically to prevent partial reads."""
    temp_path = file_path.with_suffix('.tmp')
    with open(temp_path, 'w') as f:
        json.dump(data, f)
        f.flush()              # Flush Python's buffers to the OS
        os.fsync(f.fileno())   # Force the OS to write the data to disk
    temp_path.replace(file_path)  # Atomic rename on POSIX systems
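
A usage sketch for the writer side, assuming a deque-based rolling window and a publish_bar helper that are illustrative only (OHLC_FILE refers to the hypothetical constants module above):

from collections import deque

ohlc_window = deque(maxlen=1000)  # Bounded rolling window; old bars drop off automatically

def publish_bar(bar: list) -> None:
    """Append one OHLC bar and atomically republish the full window."""
    ohlc_window.append(bar)
    atomic_write(OHLC_FILE, list(ohlc_window))

publish_bar([1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5])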

Data Formats

# OHLC format: [timestamp_ms, open, high, low, close, volume]
ohlc_data = [
    [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
    [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]

# Depth format: top-N levels per side
depth_data = {
    "bids": [[49990.0, 1.5], [49985.0, 2.1]],
    "asks": [[50010.0, 1.2], [50015.0, 1.8]]
}

# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close]
metrics_data = [
    [1640995200000, 0.15, 0.22, 0.08, 0.18],
    [1640995260000, 0.18, 0.25, 0.12, 0.20]
]

Error Handling

import json
import logging
from pathlib import Path

_LAST_DATA = None                        # Cache of the last successful read
data_file = Path("ohlc_data.json")       # Any of the IPC files is read the same way

# Reader pattern with graceful fallback
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # Cache successful read
except (FileNotFoundError, json.JSONDecodeError) as e:
    logging.warning(f"Using cached data: {e}")
    new_data = _LAST_DATA  # Fall back to the cached copy
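
On the visualization side, the 500ms polling maps naturally onto a dcc.Interval component. The following is a minimal sketch assuming a recent Dash version; the component ids and layout are illustrative, not the project's actual frontend code:

import json
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
app.layout = html.Div([
    dcc.Interval(id="poll", interval=500),  # Poll the JSON files every 500ms
    html.Pre(id="latest-bar"),
])

_CACHE = []  # Last successfully parsed OHLC array

@app.callback(Output("latest-bar", "children"), Input("poll", "n_intervals"))
def refresh(_n):
    global _CACHE
    try:
        with open("ohlc_data.json") as f:
            _CACHE = json.load(f)  # Same cached-read pattern as above
    except (FileNotFoundError, json.JSONDecodeError):
        pass                       # Keep showing the previous snapshot
    return str(_CACHE[-1]) if _CACHE else "waiting for data..."

if __name__ == "__main__":
    app.run(debug=True)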

Performance Characteristics

Write Performance

  • Small files: Payloads are typically < 1MB; writes complete in < 10ms
  • Atomic operations: Temp file creation and rename add ~2-5ms of overhead
  • Throttling: Writer updates are rate-limited to prevent excessive I/O (see the sketch below)
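
A minimal sketch of such a throttle, assuming a simple time-based guard and a one-second minimum interval (both illustrative, not measured requirements):

import time

_MIN_WRITE_INTERVAL = 1.0   # Assumed minimum seconds between writes per file
_last_write: dict = {}      # Maps file path -> monotonic timestamp of last write

def throttled_write(file_path, data) -> bool:
    """Call atomic_write only if enough time has passed; return True if written."""
    now = time.monotonic()
    if now - _last_write.get(file_path, 0.0) < _MIN_WRITE_INTERVAL:
        return False
    atomic_write(file_path, data)
    _last_write[file_path] = now
    return True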

Read Performance

  • Parse time: < 5ms for typical JSON file sizes
  • Polling overhead: 500ms interval balances responsiveness and CPU usage
  • Error recovery: Cached data prevents visual glitches when a read fails

Memory Usage

  • Bounded datasets: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file
  • JSON overhead: ~2x memory during parsing
  • Total footprint: < 500KB for all IPC data

Alternatives Considered

Redis Pub/Sub

  • Rejected: Additional service dependency, overkill for this use case
  • Pros: True real-time updates, built-in data structures
  • Cons: External dependency, memory overhead, configuration complexity

ZeroMQ

  • Rejected: Additional library dependency, more complex than needed
  • Pros: High performance, flexible patterns
  • Cons: Learning curve, binary dependency, networking complexity

Named Pipes/Unix Sockets

  • Rejected: Platform-specific, more complex error handling
  • Pros: Better performance, no file I/O
  • Cons: Platform limitations, harder debugging, process lifetime coupling

SQLite as Message Queue

  • Rejected: Overkill for simple data exchange
  • Pros: ACID transactions, complex queries possible
  • Cons: Schema management, locking considerations, overhead

HTTP API

  • Rejected: Too much overhead for local communication
  • Pros: Standard protocol, language agnostic
  • Cons: Network stack overhead, port management, authentication

Future Considerations

Scalability Limits

Current approach suitable for:

  • Update frequencies: 1-10 Hz
  • Data volumes: < 10MB total
  • Process counts: 1 writer, few readers

Migration Path

If performance becomes insufficient:

  1. Phase 1: Add gzip compression to reduce I/O (sketched after this list)
  2. Phase 2: Implement shared memory for high-frequency data
  3. Phase 3: Consider message queue for complex routing
  4. Phase 4: Migrate to streaming protocol for real-time requirements
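
A hedged sketch of the Phase 1 idea, keeping the same temp-file-plus-rename pattern but compressing the payload first (the function name and gzip handling below are assumptions, not an existing implementation):

import gzip
import json
import os
from pathlib import Path

def atomic_write_gz(file_path: Path, data) -> None:
    """Compressed variant of atomic_write: compress, write temp file, fsync, rename."""
    temp_path = file_path.with_suffix('.tmp')
    payload = gzip.compress(json.dumps(data).encode('utf-8'))
    with open(temp_path, 'wb') as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    temp_path.replace(file_path)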

Monitoring

Track these metrics to validate the approach:

  • File write latency and frequency (see the sketch after this list)
  • JSON parse times in visualization
  • Error rates for partial reads
  • Memory usage growth over time
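
For the write-latency metric, a thin wrapper around atomic_write would be enough; the log format below is illustrative only:

import logging
import time

def timed_atomic_write(file_path, data) -> None:
    """Call atomic_write and log how long the publish took."""
    start = time.perf_counter()
    atomic_write(file_path, data)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    logging.info("ipc_write file=%s latency_ms=%.2f", file_path, elapsed_ms)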

Review Triggers

Reconsider this decision if:

  • Update frequency requirements exceed 10 Hz
  • File I/O becomes a performance bottleneck
  • Multiple visualization clients need the same data
  • Complex message routing becomes necessary
  • Platform portability becomes a concern