Monday, 30 January 2012

Triple Buffering as a Concurrency Mechanism (II)

As a commentor on Reddit has pointed out, this implementation is flawed. The following steps illustrate how one of the buffers change be orphaned (0, 1 and 2 refer to the indices within the buffer array):

MacroLine tmp1tmp2CleanDirtySnap
(initial) 012
TRIPLE_BUFFER_FLIP_WRITERtmp1 = clean 0012
TRIPLE_BUFFER_NEW_SNAPtmp2 = clean 00012
TRIPLE_BUFFER_NEW_SNAPclean = snap 00212
TRIPLE_BUFFER_NEW_SNAPsnap = tmp2 00210
TRIPLE_BUFFER_FLIP_WRITERclean = dirty 00110
TRIPLE_BUFFER_FLIP_WRITERdirty = tmp1 0100

...and we can no longer access buffer two. The problem comes from trying to order two sets of three operations so that pretty much any combination leaves us in a valid state. My revised solution (which will fix the problem of double-snapping in the first post) will use a form of Optimistic Concurrency Control - basically in the two main macros we take a copy of the pointers, do the relevant swap and then atomically overwrite the three pointers only if nothing else has changed them since we took our copy. As three pointers are too big for a single atomic operation, we'll compact them down into three indices into the buffer array, encoded in three pairs of bits in an int:

/* bit flags are (unused) (new write) (2x dirty) (2x clean) (2x snap) */
#define TRIPLE_BUFFER_TYPE(TYPENAME, TYPE) \
  struct TYPENAME { \
    TYPE buffer[3]; \
    volatile uint_fast8_t flags; \
  }

Note that they are now volatile too - we don't want the compiler to optimise out checks to see if they've changed. I've also moved the to the end of the struct - I'm beginning to think about alignment and I'm assuming if the struct is aligned in a useful way I don't want to misalign the three buffers by the 8 bits of the flag array. Initialising is a bit more simple:

/* initially dirty = 0, clean = 1 and snap = 2 */
#define TRIPLE_BUFFER_NEW(NAME,TYPENAME) \
  struct TYPENAME NAME; \
  NAME.flags = 0x6;

We have to update our accessors to get the index of the clean or snap buffer out of the flags, and return a pointer to that element of the array:

#define TRIPLE_BUFFER_SNAP_PTR(BUFFER) &BUFFER.buffer[BUFFER.flags & 0x3]
#define TRIPLE_BUFFER_WRITE_PTR(BUFFER) &BUFFER.buffer[(BUFFER.flags & 0x30) >> 4]

Google helps with binary-to-hex conversions. The meat of this problem is still the flip and snap operations. I've added an extra bit flag that is set by the writer and cleared by the reader. The reader can use this to determine if there have been any writes since its last snap, and if not it won't take another snap. This means if the macro is called twice it won't snap and "un-snap" as the previous implementation did:

#define TRIPLE_BUFFER_NEW_SNAP(BUFFER) \
  do { \
    uint_fast8_t flagsNow; \
    uint_fast8_t newFlags; \
    do { \
      flagsNow = BUFFER.flags; \
      if((flagsNow & 0x40)==0) break; \
      newFlags = (flagsNow & 0x30) | ((flagsNow & 0x3) << 2) | ((flagsNow & 0xC) >> 2); \
    } while(!__sync_bool_compare_and_swap(&BUFFER.flags, flagsNow, newFlags)); \
  } while(0)

The I-have-written flag is in the seventh most significant bit (0x40) and the newFlags bit-shifting logic is just building an int by or'ing the extracted indices after shifting them to their new positions. The flip method is similar:

#define TRIPLE_BUFFER_FLIP_WRITER(BUFFER) \
  do { \
    uint_fast8_t flagsNow; \
    uint_fast8_t newFlags; \
    do { \
      flagsNow = BUFFER.flags; \
      newFlags = 0x40 | ((flagsNow & 0xC) << 2) | ((flagsNow & 0x30) >> 2) | (flagsNow & 0x3); \
    } while(!__sync_bool_compare_and_swap(&BUFFER.flags, flagsNow, newFlags)); \
  } while(0)

And you can add the following to the bottom of the unit test to prove the snap-unsnap problem's been fixed:

*TRIPLE_BUFFER_WRITE_PTR(it) = 7;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 8;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 8);
TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 8);

Sunday, 29 January 2012

Triple Buffering as a Concurrency Mechanism

Revised (less buggy) implementation in part II!

Triple Buffering is a way of passing data between a producer and a consumer running at different rates. It ensures that the consumer only sees complete data with minimal lag, but the consumer doesn't expect to see every piece of data. The technique is most commonly used between images produced by a graphics card and a monitor; the graphics card is free to render hundreds of frames per second while the monitor consumes a fixed 60 fps.

This technique is also applicable as a small-scale lock-free concurrency mechanism; many applications consume real-time data but want to operate on fixed snapshots, or alternatively the data-processing operation performed takes longer than the time between each new piece of input data (and missing input data is acceptable). The following C macros declare a generic triple-buffer data structure:

#define TRIPLE_BUFFER_TYPE(TYPENAME, TYPE) \
  struct TYPENAME { \
    TYPE* dirty; \
    TYPE* clean; \
    TYPE* snap; \
    TYPE buffer[3]; \
  }
#define TRIPLE_BUFFER_NEW(NAME,TYPENAME) \
  struct TYPENAME NAME; \
  NAME.dirty = &NAME.buffer[0]; \
  NAME.clean = &NAME.buffer[1]; \
  NAME.snap  = &NAME.buffer[2];

I've made the type declaration a macro so the buffer member of the struct will be the correct size - C++ implementations should use templates. If the producer and consumer are on different threads it makes it less obvious whether I should put the three buffers next to each other in memory. If the three buffers are on the same cache line when the writer writes to one, then this may invalidate the buffers in the cache of the reader's core unnecesarily. The writer needs to access the "dirty" buffer and the reader needs to access the "snap" buffer:

#define TRIPLE_BUFFER_SNAP_PTR(BUFFER) BUFFER.snap
#define TRIPLE_BUFFER_WRITE_PTR(BUFFER) BUFFER.dirty

The "dirty" pointer points to the buffer that the writer is currently writing to, and the "clean" pointer is the buffer with the most recent complete set of data. When the writer has finished modifying the dirty buffer it flips the two pointers with the following macro:

#ifdef _GLIBCXX_ATOMIC_BUILTINS
  #define TRIPLE_BUFFER_FLIP_WRITER(BUFFER) \
    do { \
      void* tmp = BUFFER.clean; \
      while(!__sync_bool_compare_and_swap( \
       &BUFFER.clean, \
       BUFFER.clean, \
       BUFFER.dirty)){}; \
      while(!__sync_bool_compare_and_swap( \
        &BUFFER.dirty, \
        BUFFER.dirty, \
        tmp)){}; \
    } while(0)
#else
  #error no atomic operations available!
#endif

This macro is lock-free (it uses GCC built-ins - it should be changed to use C11 atomic operations when compiler support for them is mature enough). The order of the two operations is important as it guarantees that the clean and snap points are complete, immutable sets of data. The clean pointer is set to the dirty pointer first (as this macro has been called it is assumed that the dirty pointer points to a now-complete set of data). The dirty pointer is then updated - but as the clean pointer is the one that is snapped by the reader only its consistency is important. Note that this macro doesn't memset or otherwise clear the new dirty buffer before returning, so writers should do this themselves if they only intend to write to parts of the buffer.

#ifdef _GLIBCXX_ATOMIC_BUILTINS
  #define TRIPLE_BUFFER_NEW_SNAP(BUFFER) \
    do { \
      void* tmp = BUFFER.clean; \
      while(!__sync_bool_compare_and_swap( \
        &BUFFER.clean, \
        BUFFER.clean, \
        BUFFER.snap)){}; \
      while(!__sync_bool_compare_and_swap( \
        &BUFFER.snap, \ 
        BUFFER.snap, \
        tmp)){}; \
    } while(0)
#else
  #error no atomic operations available!
#endif

The reader should call this macro when they want to take a snapshot of the current clean buffer. As above, this is lock-free and uses GCC built-ins. The order of the two atomic operations is important; switching the clean one first effectively isolates the current clean buffer from the writer, preserving its contents even if the writer continues to write and flip. I assume this method is called by the reader, as between the two atomic operations the snap pointer points to a changing set of data. However, the next atomic operation points the snap to the isolated buffer, and when control flow returns from the macro the snap pointer is pointing at what was the clean buffer when it was called. After calling this macro the clean buffer contains the last snap, so snapping twice without an intervening write-and-flip will just return old data. The triple buffer (in its current form) is therefore only useful if the writer always has higher frequency of writes than the reader has of reads. Also, there's no intrinsic mechanism to detect if a snap has changed. If the reader must know this, then the data must have a sequence number or timestamp (either from its producer or inserted by the writer) which the reader can compare with that of the previous snap.

The below unit test shows a typical use case: a writer streams integers into a triple buffer, and the reader periodically snaps them to do a computation.

TRIPLE_BUFFER_TYPE(TestStruct, int);
TRIPLE_BUFFER_NEW(it, TestStruct);

/* Test 1 */

*TRIPLE_BUFFER_WRITE_PTR(it) = 3;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);
assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 3);

/* Test 2 */

*TRIPLE_BUFFER_WRITE_PTR(it) = 4;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 5;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 6;
TRIPLE_BUFFER_FLIP_WRITER(it);

TRIPLE_BUFFER_NEW_SNAP(it);

*TRIPLE_BUFFER_WRITE_PTR(it) = 7;
TRIPLE_BUFFER_FLIP_WRITER(it);
*TRIPLE_BUFFER_WRITE_PTR(it) = 8;
TRIPLE_BUFFER_FLIP_WRITER(it);

assert(*TRIPLE_BUFFER_SNAP_PTR(it) == 6);

Please let me know if you find a use for this data structure or have any improvements...

Revised (less buggy) implementation in part II!