# Performance Analysis: DisruptorNode vs Plain Node

This document analyzes the performance characteristics of DisruptorNode (Disruptor-based, single-threaded event processing) versus plain Node (multi-threaded processing) implementations in the Conduit framework.
## Executive Summary
| Metric | DisruptorNode | Plain Node |
|---|---|---|
| Throughput | Very high (10-25M ops/sec) | Moderate (3-8M ops/sec) |
| Latency | Consistently low (sub-microsecond) | Variable |
| Scalability | Excellent for many producers | Limited by thread contention |
| Synchronization | None needed (single consumer) | Explicit synchronization required |
| Complexity | Simple handler logic | More complex (thread safety) |
| Best Use Case | High-performance, low-latency pipelines | Independent parallel processing |
## Threading Model Comparison

### Plain Node (Node2)

When a plain node has multiple input sources, each event is handled on the thread that dispatched it:
- If Thread A calls `intDispatcher.dispatch(42)`, Thread A executes `onEvent1()`
- If Thread B calls `stringDispatcher.dispatch("Hello")`, Thread B executes `onEvent2()`
- Both handlers can run concurrently on different threads
```mermaid
graph LR
    A[Thread A] -->|dispatch int| D1[Int Dispatcher]
    B[Thread B] -->|dispatch string| D2[String Dispatcher]
    D1 -->|Thread A| E1[onEvent1]
    D2 -->|Thread B| E2[onEvent2]
    E1 -.concurrent.-> E2
    style E1 fill:#ffebee
    style E2 fill:#e3f2fd
```
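A minimal sketch of this dispatch model in plain Java. `DirectDispatcher` is an illustrative stand-in, not Conduit's actual API; the point is that the handler runs on whichever thread calls `dispatch`:

```java
import java.util.function.Consumer;

// Illustrative only: a dispatcher that invokes its handler directly on the
// calling thread, mimicking plain-Node semantics. Not Conduit's real API.
class DirectDispatcher<T> {
    private final Consumer<T> handler;

    DirectDispatcher(Consumer<T> handler) { this.handler = handler; }

    void dispatch(T event) { handler.accept(event); } // runs on the caller's thread
}

public class PlainNodeThreadingDemo {
    public static void main(String[] args) {
        DirectDispatcher<Integer> intDispatcher = new DirectDispatcher<>(i ->
                System.out.println(Thread.currentThread().getName() + " onEvent1(" + i + ")"));
        DirectDispatcher<String> stringDispatcher = new DirectDispatcher<>(s ->
                System.out.println(Thread.currentThread().getName() + " onEvent2(" + s + ")"));

        // Two producer threads: the two handlers may run concurrently,
        // so any state they shared would need explicit synchronization.
        new Thread(() -> intDispatcher.dispatch(42), "Thread-A").start();
        new Thread(() -> stringDispatcher.dispatch("Hello"), "Thread-B").start();
    }
}
```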
### DisruptorNode (DisruptorNode2)

Threading Behavior:

- Events from both dispatchers go into the same ring buffer
- A single consumer thread processes all events sequentially
- No concurrent execution of event handlers
```mermaid
graph LR
    A[Thread A] -->|dispatch int| RB[Ring Buffer]
    B[Thread B] -->|dispatch string| RB
    RB -->|Single Thread| CT[Consumer Thread]
    CT --> E1[onEvent1]
    CT --> E2[onEvent2]
    style RB fill:#f3e5f5
    style CT fill:#e8f5e8
```
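The same funnel can be sketched directly against the LMAX Disruptor API that DisruptorNode builds on. The `Slot` type and handler below are illustrative, not DisruptorNode2's internals:

```java
import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

public class DisruptorFunnelDemo {
    // One slot type carries either event kind; the single consumer dispatches on it.
    static class Slot { Integer intValue; String stringValue; }

    public static void main(String[] args) throws InterruptedException {
        Disruptor<Slot> disruptor = new Disruptor<>(
                Slot::new, 1024, Executors.defaultThreadFactory(),
                ProducerType.MULTI, new BlockingWaitStrategy());

        // Single consumer thread: the two handler branches never run concurrently.
        disruptor.handleEventsWith((slot, sequence, endOfBatch) -> {
            if (slot.intValue != null) System.out.println("onEvent1: " + slot.intValue);
            else System.out.println("onEvent2: " + slot.stringValue);
            slot.intValue = null; slot.stringValue = null; // reset the slot for reuse
        });
        RingBuffer<Slot> ring = disruptor.start();

        // Two producer threads publish into the same ring buffer.
        Thread a = new Thread(() -> {
            long seq = ring.next();
            try { ring.get(seq).intValue = 42; } finally { ring.publish(seq); }
        }, "Thread-A");
        Thread b = new Thread(() -> {
            long seq = ring.next();
            try { ring.get(seq).stringValue = "Hello"; } finally { ring.publish(seq); }
        }, "Thread-B");
        a.start(); b.start();
        a.join(); b.join();

        disruptor.shutdown(); // drain remaining events, then stop the consumer
    }
}
```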
## When to Use Each Approach

### Use Plain Node When:

- Event handlers are truly independent
  - No shared state between handlers
  - Each handler can run in complete isolation
- CPU-bound processing with available cores
  - Heavy computations in each handler
  - Multiple CPU cores available for parallel execution
  - Each handler can utilize a full core
- Maximum throughput is critical
  - You want to process events from different sources in parallel
  - You can accept the complexity of thread safety
#### Example: Image Processing Pipeline

Suppose two independent events each require a 100 ms transformation. With two threads you can process both events in parallel, for roughly 100 ms total; with DisruptorNode the events are processed sequentially, for roughly 200 ms total.
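A toy illustration of that arithmetic, with `Thread.sleep` standing in for the 100 ms image workload:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelVsSequentialDemo {
    // Stand-in for a heavy, independent handler taking ~100 ms per event.
    static void process(String label) {
        try { Thread.sleep(100); } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        System.out.println("processed " + label);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Plain Node style: two threads, both events in flight at once.
        long t0 = System.nanoTime();
        Future<?> a = pool.submit(() -> process("imageA"));
        Future<?> b = pool.submit(() -> process("imageB"));
        a.get(); b.get();
        System.out.printf("parallel:   ~%d ms%n", (System.nanoTime() - t0) / 1_000_000);

        // DisruptorNode style: one consumer thread, one event after the other.
        t0 = System.nanoTime();
        process("imageA");
        process("imageB");
        System.out.printf("sequential: ~%d ms%n", (System.nanoTime() - t0) / 1_000_000);

        pool.shutdown();
    }
}
```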
### Use DisruptorNode When:

- Handlers share state
  - Multiple handlers access or modify common data
  - You need to coordinate between different event types
- Order matters
  - You need to guarantee event-processing order
  - Coordination between different input sources is required
- Low latency is critical
  - Sub-microsecond latency requirements
  - Consistent, predictable latency
  - No lock contention or context-switching overhead
- Event processing is lightweight
  - Fast event handlers (microseconds)
  - High event rate
  - Minimal CPU per event
#### Example: Trading System
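A hedged sketch of why this fits: with a single consumer, one thread owns the book, so handlers can mutate it without locks while still seeing orders from every source in publication order. The types below are illustrative, not Conduit's API:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only. Because a single consumer thread applies all events,
// this plain mutable map needs no synchronization: orders and cancels from
// many producer threads are funneled through one ring buffer and applied
// one at a time, in publication order.
class OrderBookHandler {
    private final Map<String, Long> positions = new HashMap<>(); // consumer-thread-only state

    void onOrder(String symbol, long qty) {
        positions.merge(symbol, qty, Long::sum);   // safe: never called concurrently
    }

    void onCancel(String symbol, long qty) {
        positions.merge(symbol, -qty, Long::sum);  // same thread, same ordering guarantee
    }
}
```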
## Performance Characteristics

### 1. Throughput

#### DisruptorNode
- Lock-free multi-producer, single-consumer design
- Can achieve 10-25 million operations/second on modern hardware
- Multiple threads can publish to ring buffer concurrently with minimal contention
- Single consumer processes events as fast as possible
#### Plain Node
- Limited by lock contention and synchronization
- Typically 3-8 million operations/second
- Performance degrades with increased thread count
- Context switching overhead reduces throughput
### 2. Latency

#### DisruptorNode
Why Low Latency?

- No lock contention
- No context switches
- Cache-friendly ring buffer
- Minimal memory allocation
- Busy-spin wait strategy
```mermaid
graph LR
    A[Producer] -->|CAS| B[Ring Buffer]
    B -->|Cache Hit| C[Consumer]
    C -->|L1/L2 Cache| D[Process]
    style B fill:#f3e5f5
    style C fill:#c8e6c9
```
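The busy-spin wait strategy in the last bullet is a configuration choice in the underlying LMAX Disruptor; whether Conduit exposes this knob is an assumption, but against the raw API it looks like this:

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

public class WaitStrategyConfig {
    static class MyEvent { long value; }

    // BusySpinWaitStrategy keeps the consumer spinning instead of parking:
    // it burns a core, but avoids wake-up cost and gives the lowest, most
    // predictable latency. Assumes a core can be dedicated to the consumer.
    static Disruptor<MyEvent> lowLatencyDisruptor() {
        return new Disruptor<>(
                MyEvent::new,                      // pre-allocated events
                1 << 14,                           // ring size (power of two)
                Executors.defaultThreadFactory(),
                ProducerType.MULTI,                // multiple publishing threads
                new BusySpinWaitStrategy());       // spin, never block
    }
}
```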
#### Plain Node
Why Higher Latency?

- Lock acquisition overhead
- Context switching
- Cache invalidation
- Unpredictable blocking
### 3. CPU-Bound vs I/O-Bound Processing

#### CPU-Bound Tasks

Scenario: Each event requires heavy computation (100 ms).
| Approach | Processing Time | CPU Usage |
|---|---|---|
| Plain Node (2 threads) | 100ms (parallel) | 200% |
| DisruptorNode (1 thread) | 200ms (sequential) | 100% |
Winner: Plain Node (if CPU cores available)
#### I/O-Bound Tasks

Scenario: Each event waits on blocking I/O (100 ms).

| Approach | Behavior | Issue |
|---|---|---|
| Plain Node | High latency | Producer threads block on I/O |
| DisruptorNode | Pipeline stalls | A single blocked consumer thread stops the entire pipeline |

Winner: Neither; use async I/O or a thread pool instead.
Best Practice for I/O:
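A minimal sketch of the hand-off pattern, assuming handlers are allowed to complete work asynchronously (the handler shape is illustrative, not a specific Conduit API): the event handler returns immediately and the slow call runs on a dedicated I/O pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncIoHandler {
    // Dedicated pool for blocking I/O, so neither producer threads nor a
    // single Disruptor consumer thread ever blocks on it.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(16);

    // Called from the event-processing thread: must return immediately.
    public void onEvent(String request) {
        CompletableFuture
                .supplyAsync(() -> blockingFetch(request), ioPool) // slow call off-thread
                .thenAccept(response -> System.out.println("completed: " + response));
    }

    // Stand-in for a ~100 ms blocking call (database, HTTP, disk, ...).
    private String blockingFetch(String request) {
        try { Thread.sleep(100); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response for " + request;
    }

    public static void main(String[] args) throws InterruptedException {
        AsyncIoHandler handler = new AsyncIoHandler();
        handler.onEvent("order-1"); // returns immediately
        handler.onEvent("order-2");
        Thread.sleep(300);          // demo only: let the I/O pool finish
        handler.ioPool.shutdown();
    }
}
```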
### 4. Scalability

#### DisruptorNode Scalability

Excellent for scaling producers:
```mermaid
graph TB
    P1[Producer 1] --> RB[Ring Buffer]
    P2[Producer 2] --> RB
    P3[Producer 3] --> RB
    P4[Producer N] --> RB
    RB --> C[Single Consumer]
    style RB fill:#f3e5f5
    style C fill:#c8e6c9
```
- Multiple producers can publish concurrently
- CAS operations minimize contention
- Single consumer avoids coordination overhead
Limitation: the single consumer can become a bottleneck if per-event processing is slow.
#### Plain Node Scalability

Limited by thread contention:
```mermaid
graph TB
    P1[Thread 1] -->|Lock| H1[Handler 1]
    P2[Thread 2] -->|Lock| H2[Handler 2]
    P3[Thread 3] -->|Lock| H3[Handler 3]
    H1 -.contention.-> SS[Shared State]
    H2 -.contention.-> SS
    H3 -.contention.-> SS
    style SS fill:#ffebee
```
- Each additional thread increases contention
- Lock acquisition becomes bottleneck
- Context switching overhead increases
### 5. Memory Efficiency

#### DisruptorNode

Pre-allocated ring buffer:
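A sketch of what pre-allocation looks like against the underlying LMAX Disruptor API; the event type and ring size are arbitrary:

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;

public class PreallocatedRing {
    static class LongEvent { long value; } // mutated in place, never reallocated

    public static void main(String[] args) {
        // All 65,536 event objects are created once, up front, by the factory.
        // Steady-state publishing only mutates existing slots, so event
        // processing itself produces no garbage.
        RingBuffer<LongEvent> ring = RingBuffer.createMultiProducer(
                LongEvent::new, 1 << 16, new BusySpinWaitStrategy());

        long seq = ring.next();        // claim a slot
        try {
            ring.get(seq).value = 42;  // reuse the pre-allocated object
        } finally {
            ring.publish(seq);         // hand it to the consumer side
        }
    }
}
```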
Benefits:

- No garbage collection during event processing
- Predictable memory usage
- Cache-friendly memory layout
#### Plain Node

Dynamic allocation:

- May allocate synchronization objects
- Potential for garbage-collection pressure
- Less predictable memory usage
## Decision Matrix
Use this matrix to choose the right implementation:
| Requirement | Recommended Approach |
|---|---|
| Sub-microsecond latency | DisruptorNode |
| Shared state between handlers | DisruptorNode |
| Event ordering critical | DisruptorNode |
| Independent handlers, CPU-bound | Plain Node |
| Maximum parallel throughput | Plain Node |
| Simple, maintainable code | DisruptorNode |
| Financial trading system | DisruptorNode |
| Real-time analytics | DisruptorNode |
| Batch processing pipeline | Plain Node (if CPU-bound) |
| Event sourcing | DisruptorNode |
## Benchmark Code

### DisruptorNode Benchmark
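A minimal, self-contained throughput sketch against the raw LMAX Disruptor API. Ring size, event count, and wait strategy are arbitrary choices, and this exercises the underlying library rather than Conduit's DisruptorNode wrapper, so treat the numbers as indicative only:

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;

public class DisruptorThroughputBenchmark {
    static class LongEvent { long value; }

    public static void main(String[] args) throws InterruptedException {
        final long EVENTS = 10_000_000L;
        final CountDownLatch done = new CountDownLatch(1);

        Disruptor<LongEvent> disruptor = new Disruptor<>(
                LongEvent::new,                    // pre-allocated slots
                1 << 16,                           // ring size (power of two)
                Executors.defaultThreadFactory(),
                ProducerType.MULTI,
                new BusySpinWaitStrategy());

        // Near-empty handler: this measures framework overhead, not business logic.
        disruptor.handleEventsWith((event, sequence, endOfBatch) -> {
            if (event.value == EVENTS - 1) done.countDown();
        });
        RingBuffer<LongEvent> ring = disruptor.start();

        long start = System.nanoTime();
        for (long i = 0; i < EVENTS; i++) {
            long seq = ring.next();                // claim a slot
            try {
                ring.get(seq).value = i;           // write in place, no allocation
            } finally {
                ring.publish(seq);                 // make it visible to the consumer
            }
        }
        done.await();                              // wait for the last event
        long elapsedNs = System.nanoTime() - start;

        System.out.printf("%,d events in %.1f ms (%,.0f events/sec)%n",
                EVENTS, elapsedNs / 1e6, EVENTS * 1e9 / elapsedNs);
        disruptor.shutdown();
    }
}
```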
## Conclusion

Choose DisruptorNode for:

- ✅ Low-latency requirements
- ✅ High-throughput event streams
- ✅ Shared-state scenarios
- ✅ Ordered event processing
- ✅ Simpler, safer code

Choose Plain Node for:

- ✅ Truly independent handlers
- ✅ CPU-bound, parallelizable work
- ✅ Maximum multi-core utilization
- ✅ Workloads where thread safety is not a concern
For most applications, DisruptorNode is the better choice due to its superior latency characteristics, simpler programming model, and excellent throughput for typical event processing workloads.