ADR-002: JSON File-Based Inter-Process Communication
Status
Accepted
Context
The orderflow backtest system requires communication between the data processing pipeline and the web-based visualization frontend. Key requirements include:
- Real-time data updates from processor to visualization
- Tolerance for timing mismatches between writer and reader
- Simple implementation without external dependencies
- Support for different update frequencies (OHLC bars vs. orderbook depth)
- Graceful handling of process crashes or restarts
Decision
We will use JSON files with atomic write operations for inter-process communication between the data processor and Dash visualization frontend.
Consequences
Positive
- Simplicity: No message queues, sockets, or complex protocols
- Fault tolerance: File-based communication survives process restarts
- Debugging friendly: Data files can be inspected manually
- No dependencies: Built-in JSON support, no external libraries
- Atomic operations: Temp file + rename prevents partial reads
- Language agnostic: Any process can read/write JSON files
- Bounded memory: Rolling data windows prevent unlimited growth
Negative
- File I/O overhead: Disk writes may be slower than in-memory communication
- Polling required: Reader must poll for updates (500ms interval)
- Limited throughput: Not suitable for high-frequency (microsecond) updates
- No acknowledgments: Writer cannot confirm reader has processed data
- File system dependency: Performance varies by storage type
Implementation Details
File Structure
ohlc_data.json # Rolling array of OHLC bars (max 1000)
depth_data.json # Current orderbook depth snapshot
metrics_data.json # Rolling array of OBI/CVD metrics (max 1000)
Atomic Write Pattern
import json
import os
from pathlib import Path
from typing import Any

def atomic_write(file_path: Path, data: Any) -> None:
    """Write data atomically to prevent partial reads."""
    temp_path = file_path.with_suffix('.tmp')
    with open(temp_path, 'w') as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # ensure bytes hit disk before the rename
    temp_path.replace(file_path)  # atomic on POSIX systems
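The bounded-memory property comes from trimming each dataset before it is written. A minimal usage sketch, assuming a deque-based rolling window on the writer side (publish_bar, _ohlc_window, and MAX_BARS are illustrative names, not taken from the codebase):

from collections import deque

MAX_BARS = 1000  # matches the rolling limit in the file structure above
_ohlc_window: deque = deque(maxlen=MAX_BARS)  # deque drops the oldest bar automatically

def publish_bar(bar: list) -> None:
    """Append a new OHLC bar and publish the bounded window atomically."""
    _ohlc_window.append(bar)
    atomic_write(Path('ohlc_data.json'), list(_ohlc_window))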
Data Formats
# OHLC format: [timestamp_ms, open, high, low, close, volume]
ohlc_data = [
    [1640995200000, 50000.0, 50100.0, 49900.0, 50050.0, 125.5],
    [1640995260000, 50050.0, 50200.0, 50000.0, 50150.0, 98.3]
]

# Depth format: top-N levels per side
depth_data = {
    "bids": [[49990.0, 1.5], [49985.0, 2.1]],
    "asks": [[50010.0, 1.2], [50015.0, 1.8]]
}

# Metrics format: [timestamp_ms, obi_open, obi_high, obi_low, obi_close]
metrics_data = [
    [1640995200000, 0.15, 0.22, 0.08, 0.18],
    [1640995260000, 0.18, 0.25, 0.12, 0.20]
]
Error Handling
import json
import logging
from pathlib import Path

_LAST_DATA: list = []  # module-level cache of the last successful read
data_file = Path('ohlc_data.json')  # any of the IPC files above

# Reader pattern with graceful fallback
try:
    with open(data_file) as f:
        new_data = json.load(f)
    _LAST_DATA = new_data  # cache successful read
except (FileNotFoundError, json.JSONDecodeError) as e:
    logging.warning(f"Using cached data: {e}")
    new_data = _LAST_DATA  # fall back to the cached copy
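On the frontend, the 500ms poll maps naturally onto a dcc.Interval trigger. A minimal sketch of how the Dash side might consume the files; the callback, component ids, and the read_with_fallback/build_ohlc_figure helpers are illustrative, not the project's actual layout:

from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id='ohlc-chart'),
    dcc.Interval(id='poller', interval=500),  # poll every 500ms
])

@app.callback(Output('ohlc-chart', 'figure'), Input('poller', 'n_intervals'))
def refresh(_):
    bars = read_with_fallback(Path('ohlc_data.json'))  # hypothetical wrapper around the reader pattern above
    return build_ohlc_figure(bars)  # hypothetical figure builder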
Performance Characteristics
Write Performance
- Small files: < 1MB typical, writes complete in < 10ms
- Atomic operations: Add ~2-5ms overhead for temp file creation
- Throttling: Updates are rate-limited to prevent excessive I/O (see the sketch below)
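A minimal throttle sketch, assuming a monotonic-clock guard around the atomic writer (names and the interval value are illustrative):

import time

MIN_WRITE_INTERVAL = 0.1  # seconds between writes; illustrative value
_last_write_ts = 0.0

def throttled_write(file_path: Path, data: Any) -> None:
    """Skip the write if the previous one happened too recently."""
    global _last_write_ts
    now = time.monotonic()
    if now - _last_write_ts >= MIN_WRITE_INTERVAL:
        atomic_write(file_path, data)
        _last_write_ts = now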
Read Performance
- Parse time: < 5ms for typical JSON file sizes
- Polling overhead: 500ms interval balances responsiveness and CPU usage
- Error recovery: Cached data eliminates visual glitches
Memory Usage
- Bounded datasets: Max 1000 bars × 6 fields × 8 bytes = ~48KB per file
- JSON overhead: ~2x memory during parsing
- Total footprint: < 500KB for all IPC data
Alternatives Considered
Redis Pub/Sub
- Rejected: Additional service dependency, overkill for simple use case
- Pros: True real-time updates, built-in data structures
- Cons: External dependency, memory overhead, configuration complexity
ZeroMQ
- Rejected: Additional library dependency, more complex than needed
- Pros: High performance, flexible patterns
- Cons: Learning curve, binary dependency, networking complexity
Named Pipes/Unix Sockets
- Rejected: Platform-specific, more complex error handling
- Pros: Better performance, no file I/O
- Cons: Platform limitations, harder debugging, process lifetime coupling
SQLite as Message Queue
- Rejected: Overkill for simple data exchange
- Pros: ACID transactions, complex queries possible
- Cons: Schema management, locking considerations, overhead
HTTP API
- Rejected: Too much overhead for local communication
- Pros: Standard protocol, language agnostic
- Cons: Network stack overhead, port management, authentication
Future Considerations
Scalability Limits
Current approach suitable for:
- Update frequencies: 1-10 Hz
- Data volumes: < 10MB total
- Process counts: 1 writer, few readers
Migration Path
If performance becomes insufficient:
- Phase 1: Add compression (gzip) to reduce I/O (see the sketch after this list)
- Phase 2: Implement shared memory for high-frequency data
- Phase 3: Consider message queue for complex routing
- Phase 4: Migrate to streaming protocol for real-time requirements
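As one data point for Phase 1, a gzip variant of the atomic writer is a small change. A sketch only; the fsync durability step is omitted here for brevity:

import gzip

def atomic_write_gz(file_path: Path, data: Any) -> None:
    """Gzip-compressed variant of atomic_write (Phase 1 sketch)."""
    temp_path = file_path.with_suffix('.tmp')
    with gzip.open(temp_path, 'wt', encoding='utf-8') as f:
        json.dump(data, f)
    temp_path.replace(file_path)  # same atomic rename as the plain writer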
Monitoring
Track these metrics to validate the approach (a latency-logging sketch follows the list):
- File write latency and frequency
- JSON parse times in visualization
- Error rates for partial reads
- Memory usage growth over time
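A logging wrapper around the writer is enough to cover the first metric; timed_atomic_write is an illustrative name, building on atomic_write defined above:

import time

def timed_atomic_write(file_path: Path, data: Any) -> None:
    """Log write latency in milliseconds around atomic_write."""
    start = time.perf_counter()
    atomic_write(file_path, data)
    latency_ms = (time.perf_counter() - start) * 1000
    logging.info("ipc_write_latency_ms=%.2f file=%s", latency_ms, file_path.name)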
Review Triggers
Reconsider this decision if:
- Update frequency requirements exceed 10 Hz
- File I/O becomes a performance bottleneck
- Multiple visualization clients need the same data
- Complex message routing becomes necessary
- Platform portability becomes a concern