The ring buffer is a fundamental data structure in Aeron that enables high-performance, lock-free communication between publishers and subscribers. Understanding its design, performance characteristics, and proper usage is critical for building low-latency messaging systems.
What is a Ring Buffer?
A ring buffer (also called a circular buffer) is a fixed-size buffer that wraps around when it reaches the end. In Aeron, ring buffers are used to pass messages between threads with minimal contention and overhead.
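```java
import java.nio.ByteBuffer;
import org.agrona.concurrent.UnsafeBuffer;
import org.agrona.concurrent.ringbuffer.ManyToOneRingBuffer;
import org.agrona.concurrent.ringbuffer.RingBuffer;
import org.agrona.concurrent.ringbuffer.RingBufferDescriptor;

// Ring buffer created once at initialization.
// The message capacity must be a power of 2 (e.g., 1024, 2048, 4096); Agrona also
// expects RingBufferDescriptor.TRAILER_LENGTH extra bytes for its position counters.
int bufferSize = 1024 * 1024; // 1MB
RingBuffer buffer = new ManyToOneRingBuffer(
    new UnsafeBuffer(ByteBuffer.allocateDirect(bufferSize + RingBufferDescriptor.TRAILER_LENGTH)));
```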
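The wrap-around behaviour and the power-of-two size requirement are related: when the capacity is a power of two, an ever-increasing position can be mapped to a slot with a single bit mask instead of a modulo. The following is a minimal sketch of that idea in plain Java. `TinyRing` and its methods are hypothetical names for illustration and are not part of Aeron or Agrona.

```java
/** Illustrative only: a tiny fixed-size ring of longs, not Aeron's implementation. */
final class TinyRing {
    private final long[] slots;
    private final int mask;      // capacity - 1, valid because capacity is a power of two
    private long writePosition;  // grows forever; it is never reset to zero

    TinyRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two: " + capacity);
        }
        this.slots = new long[capacity];
        this.mask = capacity - 1;
    }

    /** Maps an ever-increasing position onto a slot index in [0, capacity). */
    int indexOf(long position) {
        return (int) (position & mask);
    }

    /** Stores a value in the next slot, overwriting the oldest one after a wrap. */
    void put(long value) {
        slots[indexOf(writePosition)] = value;
        writePosition++;
    }

    /** Reads whatever currently occupies the slot for the given position. */
    long get(long position) {
        return slots[indexOf(position)];
    }
}
```

With a capacity of 4, positions 4 and 5 land on slots 0 and 1 again, which is the "wrap" in the name. A real ring buffer additionally tracks a read position so that unread data is not overwritten.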
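```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of a lock-free claim (not Aeron's actual implementation)
final class SequenceClaimer {
    private final AtomicLong headSequence = new AtomicLong();

    long claimSequence(int messageLength) {
        while (true) {
            long current = headSequence.get();
            long next = current + messageLength;
            // Atomic CAS - only one thread succeeds for a given value of current
            if (headSequence.compareAndSet(current, next)) {
                return current; // Success! The range [current, next) belongs to this thread
            }
            // CAS failed (contention): loop and retry with the latest value
        }
    }
}
```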
Comparison:
| Approach | Contention Handling | Performance |
|---|---|---|
| Lock-based | Block waiting threads | Slow (context switches) |
| Lock-free (CAS) | Retry immediately | Fast (no blocking) |
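To make the two rows of the comparison concrete, here is a small side-by-side sketch of the same "claim a byte range" operation written both ways. The `Claimer` interface and the class names are hypothetical, chosen only for this illustration, and the demo harness simply checks that no two threads ever receive the same offset.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical interface for this sketch: reserve length bytes, return the start offset. */
interface Claimer {
    long claim(int length);
}

/** Lock-based: a contending thread blocks until the current holder leaves the monitor. */
final class LockedClaimer implements Claimer {
    private long position;

    @Override
    public synchronized long claim(int length) {
        long start = position;
        position = start + length;
        return start;
    }
}

/** Lock-free: a contending thread never blocks; it re-reads the position and retries the CAS. */
final class CasClaimer implements Claimer {
    private final AtomicLong position = new AtomicLong();

    @Override
    public long claim(int length) {
        while (true) {
            long start = position.get();
            if (position.compareAndSet(start, start + length)) {
                return start;
            }
        }
    }
}

final class ClaimDemo {
    /** Runs the given number of threads, each claiming 8 bytes perThread times; returns the count of distinct offsets. */
    static int distinctClaims(Claimer claimer, int threads, int perThread) {
        Set<Long> offsets = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    offsets.add(claimer.claim(8));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return offsets.size();
    }
}
```

Both variants are correct; the difference is what a losing thread does. Under the lock it may be descheduled, while under CAS it stays on the CPU and tries again, which is usually cheaper when the critical section is this short.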
3. Cache-Line Padding
Aeron pads data structures to prevent false sharing:
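```java
// Without padding: Thread A and B share cache line
class Counter {
    long valueA; // CPU 1 writes here
    long valueB; // CPU 2 writes here → cache invalidation!
}

// With padding: Each thread gets own cache line.
// Illustrative layout only: the JVM may reorder fields within a class, so
// production code usually enforces padding through a class hierarchy.
class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7;      // Padding (56 bytes)
    long valueA;                          // CPU 1's cache line
    long p8, p9, p10, p11, p12, p13, p14; // Padding (56 bytes, so valueB starts 64 bytes after valueA)
    long valueB;                          // CPU 2's cache line
}
```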
Why it matters:
- Most modern x86 CPUs use 64-byte cache lines
- If two threads on different cores write to the same cache line, each write invalidates the other core's copy of that line, which must then be re-fetched
- Padding ensures each thread works on separate cache lines
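The arithmetic behind the padding can be checked with a few lines of Java. This is a sketch under the assumption of a 64-byte line; `CacheLines` and its methods are hypothetical helpers, and offsets are treated as plain byte offsets rather than real object addresses.

```java
/** Hypothetical helper for reasoning about cache-line placement; assumes 64-byte lines. */
final class CacheLines {
    static final int CACHE_LINE_BYTES = 64;

    /** True if the two byte offsets fall within the same cache line. */
    static boolean sameLine(long offsetA, long offsetB) {
        return offsetA / CACHE_LINE_BYTES == offsetB / CACHE_LINE_BYTES;
    }

    /**
     * True if two long fields separated by paddingLongs 8-byte pads can end up on the
     * same line for at least one 8-byte-aligned starting offset of the first field.
     */
    static boolean canShareLine(int paddingLongs) {
        for (long first = 0; first < CACHE_LINE_BYTES; first += Long.BYTES) {
            long second = first + Long.BYTES + (long) paddingLongs * Long.BYTES;
            if (sameLine(first, second)) {
                return true;
            }
        }
        return false;
    }
}
```

Six pads leave the two fields 56 bytes apart, so they still share a line when the first one happens to start on a line boundary. Seven pads put them exactly 64 bytes apart, which separates them for every alignment.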
4. Memory Barriers
Aeron uses precise memory barriers instead of heavy locks:
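As a sketch of what "precise memory barriers" can look like in plain Java, the example below publishes a value with a release store and observes it with an acquire load, using the JDK's `VarHandle` API. The `Mailbox` class is hypothetical and is not taken from Aeron; Agrona's buffers expose comparable ordered and volatile accessors (for example `putLongOrdered` and `getLongVolatile`), but the code here does not depend on them.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

/** Hypothetical single-slot mailbox showing release/acquire ordering without locks. */
final class Mailbox {
    private static final VarHandle READY;

    static {
        try {
            READY = MethodHandles.lookup().findVarHandle(Mailbox.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private long payload; // plain field, made visible by the release store below
    private int ready;    // 0 = empty, 1 = payload is valid

    /** Producer: plain write of the payload, then a release store of the flag. */
    void publish(long value) {
        payload = value;
        READY.setRelease(this, 1);
    }

    /** Consumer: acquire load of the flag; if it is set, the payload write is visible too. */
    long poll() {
        if ((int) READY.getAcquire(this) == 1) {
            return payload;
        }
        return -1; // sentinel used by this sketch for "nothing published yet"
    }
}
```

The release store guarantees that the payload write cannot be reordered after the flag write, and the acquire load guarantees the payload read cannot be reordered before the flag read. No thread ever blocks, which is the property a lock cannot offer.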
```java
// Multiple market data threads publishing to single ring buffer
ManyToOneRingBuffer buffer = new ManyToOneRingBuffer(...);

// Producer 1: Price updates
new Thread(() -> {
    while (running) {
        Price price = getPriceUpdate();
        buffer.write(MSG_TYPE_PRICE, priceBuffer, 0, priceLength);
    }
}).start();

// Producer 2: Order updates
new Thread(() -> {
    while (running) {
        Order order = getOrderUpdate();
        buffer.write(MSG_TYPE_ORDER, orderBuffer, 0, orderLength);
    }
}).start();

// Producer 3: Trade updates
new Thread(() -> {
    while (running) {
        Trade trade = getTradeUpdate();
        buffer.write(MSG_TYPE_TRADE, tradeBuffer, 0, tradeLength);
    }
}).start();

// Single consumer processes all messages in order
buffer.read((msgType, buf, index, length) -> {
    switch (msgType) {
        case MSG_TYPE_PRICE -> processPrice(buf, index, length);
        case MSG_TYPE_ORDER -> processOrder(buf, index, length);
        case MSG_TYPE_TRADE -> processTrade(buf, index, length);
    }
});
```
One-to-One Ring Buffer
For dedicated producer-consumer pairs, use the simpler OneToOneRingBuffer:
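To illustrate why a single-producer, single-consumer buffer can be simpler, here is a minimal sketch in plain Java. With exactly one writer per counter there is nothing to race on, so no CAS loop is needed; an ordered store is enough to publish each step. This is a teaching sketch and not Agrona's `OneToOneRingBuffer`: the class `SpscLongRing` and its methods are hypothetical, and it stores single `long` values rather than framed messages.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative single-producer/single-consumer ring of longs. */
final class SpscLongRing {
    private final long[] slots;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next position to read; written by the consumer only
    private final AtomicLong tail = new AtomicLong(); // next position to write; written by the producer only

    SpscLongRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two: " + capacity);
        }
        this.slots = new long[capacity];
        this.mask = capacity - 1;
    }

    /** Producer thread only. Returns false when the ring is full (backpressure). */
    boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == slots.length) {
            return false;
        }
        slots[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // ordered store publishes the slot; no CAS with a single producer
        return true;
    }

    /** Consumer thread only. Returns emptyValue when there is nothing to read. */
    long poll(long emptyValue) {
        long h = head.get();
        if (h == tail.get()) {
            return emptyValue;
        }
        long value = slots[(int) (h & mask)];
        head.lazySet(h + 1); // ordered store hands the slot back to the producer
        return value;
    }
}
```

The many-to-one variant cannot use this shortcut for `tail`, because several producers would otherwise claim the same slot; that is where the CAS retry comes in.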
```java
// ❌ Wrong: Slow consumer with heavy processing
buffer.read((msgType, buf, index, length) -> {
    // Blocking I/O in consumer!
    database.save(parseMessage(buf, index, length)); // ❌ Too slow!
});

// ✅ Correct: Fast consumer with async processing
ExecutorService workers = Executors.newFixedThreadPool(4);
buffer.read((msgType, buf, index, length) -> {
    // Copy data quickly
    byte[] copy = new byte[length];
    buf.getBytes(index, copy);

    // Process asynchronously
    workers.submit(() -> {
        database.save(parseMessage(copy));
    });
});
```
Best practices:
- Keep consumer handler fast (<100ns ideal)
- Copy data and process async if needed
- Batch database writes
- Fan work out to multiple worker threads (if order doesn't matter); the ring buffer itself is still read by a single consumer thread
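The "copy, then batch" advice can be sketched as a small helper that the consumer handler calls for each message. Everything here is hypothetical: `BatchingSink` is an illustrative name, the source is a plain `byte[]` to keep the example free of library types, and the `sink` callback stands in for something like a single bulk database insert.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: collects copied messages and hands them to a sink in batches. */
final class BatchingSink {
    private final int batchSize;
    private final Consumer<List<byte[]>> sink; // e.g. one bulk write per batch
    private List<byte[]> pending = new ArrayList<>();

    BatchingSink(int batchSize, Consumer<List<byte[]>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Called from the consumer handler: copy the bytes out and return quickly. */
    void onMessage(byte[] source, int index, int length) {
        byte[] copy = new byte[length];
        System.arraycopy(source, index, copy, 0, length);
        pending.add(copy);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Hands the current batch to the sink; also worth calling when the consumer goes idle. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<byte[]> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }
}
```

In this sketch the sink runs on the calling thread, so a slow sink would still stall the consumer. In practice the batch would more likely be handed to a worker thread, and the per-message allocation replaced with a reusable buffer if garbage matters.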
Problem 3: Wrong Buffer Size
Symptom: Either wasted memory or frequent backpressure
```java
import org.agrona.BitUtil;

// ❌ Wrong: Buffer too small for burst traffic
int tooSmall = 1024; // Only 1KB!

// ❌ Wrong: Buffer unnecessarily large
int tooLarge = 1024 * 1024 * 1024; // 1GB wasted!

// ✅ Correct: Size based on burst capacity needed
int messagesPerBurst = 10000;
int avgMessageSize = 256; // bytes
int safetyMargin = 2; // 2x for safety
int bufferSize = BitUtil.findNextPositivePowerOfTwo(
    messagesPerBurst * avgMessageSize * safetyMargin); // 5,120,000 bytes rounds up to 8MB
```
Sizing guidelines:
| Use Case | Recommended Size |
|---|---|
| Low-latency IPC | 1-4 MB |
| Market data feed | 8-32 MB |
| High-throughput logging | 32-128 MB |
| Video streaming | 128-512 MB |
Problem 4: False Sharing
Symptom: Unexpected slowdown with multiple threads
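```java
// ❌ Wrong: Shared mutable state in handler
class SharedCounterHandler {
    long counter = 0; // Shared between threads!

    void onMessage(int msgTypeId, MutableDirectBuffer buf, int index, int length) {
        counter++; // Every thread writes the same cache line (and the increment is not atomic)
    }
}

// ✅ Correct: Thread-local counters
class ThreadLocalCounterHandler {
    // long[1] holder instead of Long, so each increment avoids boxing
    ThreadLocal<long[]> counter = ThreadLocal.withInitial(() -> new long[1]);

    void onMessage(int msgTypeId, MutableDirectBuffer buf, int index, int length) {
        counter.get()[0]++; // Thread-local, no sharing
    }
}
```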
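When the per-thread counts eventually need to be combined into one total, the JDK's `LongAdder` is one possible alternative to thread-local counters. It spreads contended updates across internal cells (padded against false sharing in OpenJDK) and sums them on read. The sketch below uses a hypothetical `MessageCounter` class with a small harness that runs several threads; it is an illustration and not something Aeron prescribes.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical handler-side counter that several threads can update cheaply. */
final class MessageCounter {
    private final LongAdder processed = new LongAdder();

    /** Cheap under contention: threads tend to update separate internal cells. */
    void onMessage() {
        processed.increment();
    }

    /** Sums the cells; intended for occasional reads such as metrics reporting. */
    long total() {
        return processed.sum();
    }

    /** Demo harness: each thread records perThread messages; returns the combined total. */
    static long countWithThreads(int threads, int perThread) {
        MessageCounter counter = new MessageCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.onMessage();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.total();
    }
}
```

Unlike a plain `long` field incremented from several threads, no updates are lost here, and unlike a single `AtomicLong`, the threads do not all hammer the same cache line.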
```java
import org.agrona.BitUtil;

// Calculate based on expected burst size
int expectedMsgRate = 1_000_000; // 1M msgs/sec
int burstDurationMs = 100; // 100ms burst
int avgMsgSize = 256; // 256 bytes
int requiredCapacity = (expectedMsgRate / 1000) * burstDurationMs * avgMsgSize; // 25,600,000 bytes

// Round up to next power of 2
int bufferSize = BitUtil.findNextPositivePowerOfTwo(requiredCapacity * 2); // 2x safety, rounds up to 64MB
```
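The sizing calculation can be worked through end to end with a stdlib-only helper. `BufferSizing` and its method names are hypothetical; the rounding function is a plain-Java stand-in for whatever power-of-two helper the surrounding code uses. The sketch ignores per-message framing overhead, so the result is a lower bound on what a real buffer would need.

```java
/** Hypothetical sizing helper; ignores per-message header and alignment overhead. */
final class BufferSizing {
    private static final int MAX_POWER_OF_TWO = 1 << 30;

    /** Smallest power of two that is >= value, for 1 <= value <= 2^30. */
    static int nextPowerOfTwo(int value) {
        if (value < 1 || value > MAX_POWER_OF_TWO) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        int highest = Integer.highestOneBit(value);
        return highest == value ? value : highest << 1;
    }

    /** Bytes needed to absorb one burst, multiplied by a safety factor and rounded up. */
    static int bufferSizeFor(int msgsPerSecond, int burstMillis, int avgMsgBytes, int safetyFactor) {
        // long arithmetic so the intermediate product cannot overflow an int
        long required = (long) msgsPerSecond * burstMillis / 1000 * avgMsgBytes * safetyFactor;
        if (required > MAX_POWER_OF_TWO) {
            throw new IllegalArgumentException("burst too large for a single buffer: " + required);
        }
        return nextPowerOfTwo((int) Math.max(1, required));
    }
}
```

For 1,000,000 msgs/sec over a 100 ms burst at 256 bytes each, the burst is 100,000 messages or 25,600,000 bytes. Doubling for safety gives 51,200,000 bytes, and the next power of two is 67,108,864 bytes (64 MB).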
```java
// Check for backpressure
boolean written = buffer.write(msgType, data, 0, length);
if (!written) {
    // write() returns false when the buffer has insufficient capacity
    metrics.recordBackpressure();
    logger.warn("Ring buffer full! Consumer too slow or buffer too small");
}

// Periodically check buffer utilization
long capacity = buffer.capacity();
long used = buffer.producerPosition() - buffer.consumerPosition();
double utilization = (double) used / capacity * 100;
if (utilization > 80) {
    logger.warn("Ring buffer {}% full - approaching backpressure", utilization);
}
```
```java
// orderBook, handleBackpressure() and MSG_TYPE_MARKET_DATA are assumed to be defined elsewhere
public class MarketDataHandler {
    private final ManyToOneRingBuffer buffer;
    private final AtomicLong messagesProcessed = new AtomicLong();
    private volatile boolean running = true;

    public MarketDataHandler(int bufferSizeMB) {
        int bufferSize = bufferSizeMB * 1024 * 1024; // bufferSizeMB must be a power of 2
        this.buffer = new ManyToOneRingBuffer(new UnsafeBuffer(
            ByteBuffer.allocateDirect(bufferSize + RingBufferDescriptor.TRAILER_LENGTH)));

        // Consumer thread - pinned to CPU core
        Thread consumer = new Thread(this::consumeMessages);
        consumer.setName("market-data-consumer");
        // In production: pin to isolated CPU core
        consumer.start();
    }

    // Called by multiple network threads
    public void onMarketData(ByteBuffer data, int length) {
        boolean written = buffer.write(MSG_TYPE_MARKET_DATA, new UnsafeBuffer(data), 0, length);
        if (!written) {
            // Critical data - retry with exponential backoff
            handleBackpressure();
        }
    }

    private void consumeMessages() {
        while (running) {
            int messagesRead = buffer.read((msgType, buf, index, length) -> {
                // Fast parsing and processing
                long timestamp = buf.getLong(index);
                double price = buf.getDouble(index + 8);
                long volume = buf.getLong(index + 16);

                // Update in-memory order book (fast)
                orderBook.update(price, volume, timestamp);
                messagesProcessed.incrementAndGet();
            });

            if (messagesRead == 0) {
                Thread.onSpinWait(); // No messages, spin briefly
            }
        }
    }
}
```
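The handler above refers to retrying with exponential backoff without showing it. One way such a retry could look is sketched below, assuming the write is wrapped in a `BooleanSupplier` that returns `true` on success. `BackoffRetry` and its parameters are hypothetical, and suitable pause values depend heavily on the latency budget.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical retry helper: bounded attempts with a pause that doubles up to a cap. */
final class BackoffRetry {
    /**
     * Calls attempt until it returns true or maxAttempts calls have been made.
     * Between failed calls the thread parks for a pause that doubles each time,
     * capped at maxPauseNanos. Returns true if any call succeeded.
     */
    static boolean retry(BooleanSupplier attempt, int maxAttempts, long initialPauseNanos, long maxPauseNanos) {
        long pause = initialPauseNanos;
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts - 1) {
                LockSupport.parkNanos(pause);
                pause = Math.min(pause * 2, maxPauseNanos);
            }
        }
        return false; // caller decides whether to drop, log, or escalate
    }
}
```

A call site might look like `BackoffRetry.retry(() -> buffer.write(msgTypeId, src, 0, length), 10, 1_000, 1_000_000)`, with the caller deciding what to do if the retries are exhausted. Bounding the attempts matters: an unbounded retry in a network thread simply moves the backpressure upstream.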
Summary
Ring buffers are the foundation of Aeron's high performance:
✅ Pre-allocated - No runtime allocation, no GC
✅ Lock-free - Uses CAS for coordination
✅ Cache-friendly - Padding prevents false sharing
✅ Zero-copy - Direct memory access
✅ Multi-threaded - Scales to many producers
But require careful usage:
⚠️ Fixed size - Must size for peak load
⚠️ Backpressure - Handle buffer full scenarios
⚠️ Fast consumer - Slow consumer causes backpressure