# ADR-002: JSON File-Based Inter-Process Communication

## Status

Accepted

## Context

The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include:

- Real-time data updates from processor to visualization
- Tolerance for timing mismatches between writer and reader
- Simple implementation without external dependencies
- Support for different update frequencies (OHLC bars vs. orderbook depth)
- Graceful handling of process crashes or restarts

## Decision

We will use JSON files with atomic write operations for inter-process communication between the data processor and the Dash visualization frontend.

## Consequences

### Positive

- **Simplicity**: No message queues, sockets, or complex protocols
- **Fault tolerance**: File-based communication survives process restarts
- **Debugging friendly**: Data files can be inspected manually
- **No dependencies**: Built-in JSON support, no external libraries
- **Atomic operations**: Temp file + rename prevents partial reads
- **Language agnostic**: Any process can read/write JSON files
- **Bounded memory**: Rolling data windows prevent unlimited growth

### Negative

- **File I/O overhead**: Disk writes are slower than in-memory communication
- **Polling required**: Reader must poll for updates (500ms interval)
- **Limited throughput**: Not suitable for high-frequency (microsecond) updates
- **No acknowledgments**: Writer cannot confirm the reader has processed data
- **File system dependency**: Performance varies by storage type

## Implementation Details

### File Structure

```
ohlc_data.json     # Rolling array of OHLC bars (max 1000)
depth_data.json    # Current orderbook depth snapshot
metrics_data.json  # Rolling array of OBI/CVD metrics (max 1000)
```

### Atomic Write Pattern

```python
import json
import os
from pathlib import Path
from typing import Any


def atomic_write(file_path: Path, data: Any) -> None:
    """Write data atomically to prevent partial reads."""
    temp_path = file_path.with_suffix('.tmp')
    with open(temp_path, 'w') as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    temp_path.replace(file_path)  # Atomic on POSIX systems
```

### Data Formats

```python
# OHLC format: [timestamp_ms, open, high, low, close, volume]
ohlc_data = [
    [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
    [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]

# Depth format: top-N levels per side
depth_data = {
    "bids": [[49990.0, 1.5], [49985.0, 2.1]],
    "asks": [[50010.0, 1.2], [50015.0, 1.8]]
}

# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close]
metrics_data = [
    [1640995200000, 0.15, 0.22, 0.08, 0.18],
    [1640995260000, 0.18, 0.25, 0.12, 0.20]
]
```

### Error Handling

```python
import json
import logging

_LAST_DATA = None  # Module-level cache of the last successful read

# Reader pattern with graceful fallback
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # Cache successful read
except (FileNotFoundError, json.JSONDecodeError) as e:
    logging.warning(f"Using cached data: {e}")
    new_data = _LAST_DATA  # Use cached data
```

## Performance Characteristics

### Write Performance

- **Small files**: < 1MB typical, writes complete in < 10ms
- **Atomic operations**: Add ~2-5ms overhead for temp file creation
- **Throttling**: Updates are throttled to prevent excessive I/O

### Read Performance

- **Parse time**: < 5ms for typical JSON file sizes
- **Polling overhead**: 500ms interval balances responsiveness and CPU usage
- **Error recovery**: Cached data eliminates visual glitches

### Memory Usage

- **Bounded datasets**: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file
- **JSON overhead**: ~2x memory during parsing
- **Total footprint**: < 500KB for all IPC data
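The bounded-memory figures above depend on the writer trimming each rolling array before it is published. The sketch below is a minimal illustration of that pattern, assuming the writer keeps the current window in memory as a Python list; the `MAX_BARS` constant and `append_bar` helper are hypothetical names, not part of the existing codebase.

```python
from pathlib import Path

MAX_BARS = 1000  # Hypothetical constant; matches the 1000-bar bound in the estimate above


def append_bar(file_path: Path, bars: list, new_bar: list) -> list:
    """Append a bar, trim the rolling window, and publish it atomically."""
    bars = bars + [new_bar]
    if len(bars) > MAX_BARS:
        bars = bars[-MAX_BARS:]  # Drop the oldest bars beyond the window
    atomic_write(file_path, bars)  # atomic_write as defined under Implementation Details
    return bars


# Example usage (hypothetical field names):
# ohlc_bars = append_bar(Path("ohlc_data.json"), ohlc_bars, [ts_ms, o, h, l, c, vol])
```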
## Alternatives Considered

### Redis Pub/Sub

- **Rejected**: Additional service dependency, overkill for simple use case
- **Pros**: True real-time updates, built-in data structures
- **Cons**: External dependency, memory overhead, configuration complexity

### ZeroMQ

- **Rejected**: Additional library dependency, more complex than needed
- **Pros**: High performance, flexible patterns
- **Cons**: Learning curve, binary dependency, networking complexity

### Named Pipes/Unix Sockets

- **Rejected**: Platform-specific, more complex error handling
- **Pros**: Better performance, no file I/O
- **Cons**: Platform limitations, harder debugging, process lifetime coupling

### SQLite as Message Queue

- **Rejected**: Overkill for simple data exchange
- **Pros**: ACID transactions, complex queries possible
- **Cons**: Schema management, locking considerations, overhead

### HTTP API

- **Rejected**: Too much overhead for local communication
- **Pros**: Standard protocol, language agnostic
- **Cons**: Network stack overhead, port management, authentication

## Future Considerations

### Scalability Limits

The current approach is suitable for:

- Update frequencies: 1-10 Hz
- Data volumes: < 10MB total
- Process counts: 1 writer, few readers

### Migration Path

If performance becomes insufficient:

1. **Phase 1**: Add compression (gzip) to reduce I/O
2. **Phase 2**: Implement shared memory for high-frequency data
3. **Phase 3**: Consider a message queue for complex routing
4. **Phase 4**: Migrate to a streaming protocol for real-time requirements

## Monitoring

Track these metrics to validate the approach:

- File write latency and frequency (see the sketch at the end of this document)
- JSON parse times in visualization
- Error rates for partial reads
- Memory usage growth over time

## Review Triggers

Reconsider this decision if:

- Update frequency requirements exceed 10 Hz
- File I/O becomes a performance bottleneck
- Multiple visualization clients need the same data
- Complex message routing becomes necessary
- Platform portability becomes a concern
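For the write-latency metric listed under Monitoring, a thin timing wrapper around `atomic_write` is one low-effort starting point. The sketch below is only illustrative; the `timed_write` name and the log format are hypothetical and not part of the existing codebase.

```python
import logging
import time
from pathlib import Path
from typing import Any


def timed_write(file_path: Path, data: Any) -> None:
    """Wrap atomic_write (defined above) and log the write latency in milliseconds."""
    start = time.perf_counter()
    atomic_write(file_path, data)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    logging.info("ipc write %s took %.2f ms", file_path.name, elapsed_ms)
```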