There are many ELFs, really PT_LOADs of ELFs, running around, preparing presents, grabbing tools, sewing, gluing, screwing, and what not. They also follow a free-floating workbench policy: an ELF uses a workbench only for a single task and then releases it to be used by other ELFs. Thereby, they maximize their output, as they get away with fewer idle workbenches. However, this requires coordination, since sometimes two ELFs want to use the same workbench for their next task. Luckily, there are some Linux primitives that can help us with this synchronization problem. However, they are rather low level, and a better abstraction, like a semaphore, would be great!
futex - Fast User Space Mutexes
Today, we will look at the important topic of inter-process communication (IPC) and, more specifically, at the topic of synchronization: how can we, as application developers, coordinate the work of multiple threads or processes? Of course, you can and usually should use the pthread library for mutexes, barriers, and condition variables. These primitives implement waiting for other threads, either until a mutex is unlocked, a barrier is reached by all threads, or a certain condition is met. But how does the maintainer of that library implement these high-level primitives? To answer this, we will look at the futex(2) system call, which is the root of all modern synchronization mechanisms on Linux.
Described by Franke and Russell, futexes assist us in building synchronization primitives that involve passive waiting. A thread waits actively (busy waiting) when it, in its own CPU time, repeatedly checks whether a condition has become true. In contrast to this, passive waiting is a technique to delegate this duty to the kernel while the process is sent to a sleep state. In the meantime, while our process sleeps, the kernel can schedule other processes onto the CPU.
With futexes, the Linux kernel takes a minimal approach to passive waiting: instead of supplying many primitives for all kinds of operations (semaphores, condition variables, barriers) as individual system calls, we only have a single system call that implements the minimally necessary semantic within the kernel: (1) with FUTEX_WAIT, we enqueue the current task into a wait queue, and (2) with FUTEX_WAKE, we wake up N threads from this wait queue. The reasons why and when threads invoke these primitives are fully up to user space. Thereby, the user can implement different high-level abstractions and waiting strategies on top of a single system call; a great example of a well-crafted kernel interface!
int futex(uint32_t *addr, int op, uint32_t val, ...);
Furthermore, the futex interface is very versatile and requires no explicit creation or destruction of in-kernel objects. As we see from the prototype, the futex system call takes a pointer as its first argument: the 32-bit memory cell referenced by this pointer is the futex. So we can wait on any 32-bit word in our address space! But how does this work? Internally, Linux creates the wait queues on demand and uses a hash table to map the user-space address to a specific wait queue. Within each wait queue, threads that wait on different addresses are mixed, and the kernel iterates over these lists, filtering out the actually referenced threads.
While futex(2) supports many more variants of wait and wake, we will now have a look at the two most basic variants:

- FUTEX_WAKE: Wake up N=val threads from the referenced wait queue. If there are not enough waiting threads, nothing happens and no state is recorded.
- FUTEX_WAIT: Enqueue the current thread into the wait queue, if(!) the condition *addr == val holds true.
The condition for the wait operation is necessary to actually give synchronization guarantees. For example, if you implement the semaphore (see tasks), you decrement the 32-bit word and sleep if the value reaches zero. But what happens when another process increments the value between decrementing and going to sleep? Then we must not sleep, but we would! So, in essence, we have to weld the check and the sleep operation together, executing them at the same time (atomically). And this can be done with the futex system call: futex(addr, FUTEX_WAIT, 0). Thereby, we avoid the described lost-wakeup problem.
Today's task comprises three parts:
- Create a semaphore implementation on top of the futex system call. For the user space part of the semaphore, you should use atomic instructions.
- Build a bounded buffer implementation that supports bb_put() and bb_get() and is synchronized with three of your semaphores. If the buffer is full, bb_put() should block; if the buffer is empty, bb_get() should block.
- Write a simple test program with two processes (use fork!) that uses the bounded buffer to transfer data from the child process to the parent process. To test your semaphore implementation, let the child process initialize the bounded buffer after waiting for a second.
Decrementing and incrementing counters is usually a non-atomic read-update-write cycle. However, for our semaphore, we require atomic instructions, which are luckily available in the modern C standard:

atomic_int counter;

// Retrieve the current counter
int current = atomic_load(&counter);

// Retrieve the current counter and increment it
int prev = atomic_fetch_add(&counter, 1);

// Atomically compare the counter and exchange its contents
// with 42 if it currently has the value 23
int value = 23;
atomic_compare_exchange_strong(&counter, &value, 42);