Select a Gift
Workload: 110 lines of code
Important System-Calls: select(2)
For a program to execute, all resources it requires must be available. Simple calculations only require the CPU, but more complex operations might need multiple resources. For example, if your program wants to parse a file in the next step, it needs not only CPU time to do the parsing; the file contents must also be loaded into memory, as CPU instructions only work on memory (and registers). So, we could draw a dependency graph between resources (CPUs, I/O requests, network messages) and pieces of code. Only if all of its dependencies (predecessors) are available or have finished can a piece of code actually execute.
Often, we express such dependencies with synchronous system calls, which block the thread continuation until the system call has completed:
    funcA();
    read(fd, buf, 4096);
    funcB(buf);
In this trivial example, the program issues the read(2) system call to express two things:
(1) please read up to 4096 bytes from the file descriptor, and
(2) continue the thread, which will then call funcB(), only after the buffer was filled.
So, synchronous system calls promise that the invoked functionality has completed before we continue.
For the second part, if the data is not yet available, read() will block the thread and exclude it from scheduling until the data has arrived.
With read(), we can only wait for data to become available on a single file descriptor.
But sometimes this is not enough: For example, if you are a server process that wants to handle multiple network clients simultaneously, you do not know which client socket will provide data first.
Therefore, you do not know on which socket you should invoke read().
While one solution is to spawn an execution thread for each client that blocks its execution on the client socket, this approach comes with high costs for servers with many clients (many threads, frequent thread switches).
Since Unix did not have proper thread support for a long time, the developers came up with the select(2) system call.
With this blocking syscall, the calling thread expresses: please block my execution until one of the given file descriptors becomes "ready".
Becoming ready means that a read() or write() call on that descriptor would surely not block.
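To make "ready" a bit more concrete, here is a minimal sketch that asks select() about a single descriptor without waiting. The helper name read_would_not_block() is made up for this illustration, and error handling is kept to a bare minimum.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a read() on fd would not block right
 * now, 0 if it would block, and -1 on error. */
int read_would_not_block(int fd)
{
    /* An fd_set can only hold descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* A zero timeout makes select() return immediately instead of blocking. */
    struct timeval no_wait = { 0, 0 };
    int rc = select(fd + 1, &readfds, NULL, NULL, &no_wait);
    if (rc < 0)
        return -1;

    /* After the call, the set only contains the descriptors that are ready. */
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```

Note that a descriptor at end-of-file also counts as ready: read() returns 0 immediately, so it does not block.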
With this, waiting for the next request in a multi-client server becomes easier: just "select" all client sockets and handle all sockets that became ready:
    while True:
        select(client_connections)
        for fd in client_connections:
            if fd.is_ready():
                request = fd.read()
                handle(request)
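In C, the select step of the loop above could look roughly like the following sketch. The function name wait_readable() and the ready[] output array are assumptions made for this example; they are not part of any given code.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one of fds[0..n-1] is readable.
 * Sets ready[i] to 1 for every readable descriptor and to 0 otherwise.
 * Returns the number of ready descriptors, or -1 on error. */
int wait_readable(const int *fds, int n, int *ready)
{
    fd_set readfds;
    int maxfd = -1;

    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        /* An fd_set can only hold descriptors below FD_SETSIZE. */
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* First argument: highest descriptor plus one.
     * NULL timeout: block until something becomes ready. */
    int rc = select(maxfd + 1, &readfds, NULL, NULL, NULL);
    if (rc < 0)
        return -1;

    /* select() has reduced the set to the ready descriptors. */
    for (int i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &readfds) ? 1 : 0;
    return rc;
}
```

A caller would then walk over ready[] and call read() on every descriptor that is marked with 1.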
While this looks quite easy at first sight, it opens a whole new box of problems, as we are now in the realm of event-based programming.
For example, not only can read() block, but write() can also block if the client cannot receive the answer fast enough.
However, for today, we will not think too much about this and only look at the read() side.
For more details on using select(2), please look at the man page and also at the tutorial man page for select (select_tut(2)), as we have no intention to reiterate that here.
Within the kernel, select() is reduced to the poll(2) infrastructure.
poll() is the newer system call, but we will, for now, stick to select().
At its core, do_select() is rather simple: It iterates over the given file-descriptor set, which is a fixed-size bit mask, gets the struct file object for each descriptor, and polls it with the vfs_poll() function, which finally calls the concrete file_operations.poll operation.
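The following is a user-space toy model of one such scan, meant only to illustrate the idea. It is not kernel code: struct toy_file, toy_pipe_poll(), toy_select_scan(), and the 64-descriptor limit are all invented for this sketch, and the real do_select() additionally sleeps and rescans until something is ready or the timeout expires.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 64   /* hypothetical limit: one bit per descriptor */

/* Stand-in for struct file: every "file" brings its own poll operation. */
struct toy_file {
    int (*poll)(const struct toy_file *self);  /* nonzero if ready */
    int pending;                               /* toy state: bytes waiting */
};

/* Example poll operation: a pipe-like file is ready if data is pending. */
int toy_pipe_poll(const struct toy_file *self)
{
    return self->pending > 0;
}

/* One scan over the requested bit mask: look up the file behind each set
 * bit, ask it through its poll operation, and record ready descriptors in
 * *ready. Returns the number of ready descriptors. */
int toy_select_scan(struct toy_file *table[], unsigned long long wanted,
                    unsigned long long *ready)
{
    int count = 0;

    *ready = 0;
    for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
        if (!(wanted & (1ULL << fd)))
            continue;               /* caller did not ask about this fd */
        const struct toy_file *f = table[fd];
        if (f == NULL)
            continue;               /* real select() fails with EBADF here */
        assert(f->poll != NULL);
        if (f->poll(f)) {
            *ready |= 1ULL << fd;
            count++;
        }
    }
    return count;
}
```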
With today's program we'll put select() to some use.
You'll write a program that acts as a pipe multiplexer filter: After spawning N filter processes, it reads from its standard input and copies its input to the filter processes.
On the filters' output side, our program uses select() to demultiplex the filters' stdout descriptors into its own stdout. An example output looks like this:
    $ make run
    seq 1 100 | ./select "grep 1[1-3]" "grep [1-3]2"
    [grep 1[1-3]] Started filter as pid 384425
    [grep [1-3]2] Started filter as pid 384426
    [grep 1[1-3]] 11
    [grep 1[1-3]] 12
    [grep 1[1-3]] 13
    [grep [1-3]2] 12
    [grep [1-3]2] 22
    [grep [1-3]2] 32
    [grep 1[1-3]] filter exited. exitcode=0
    [grep [1-3]2] filter exited. exitcode=0
- The given start_proc() function uses posix_spawn(3) to start a filter process. Use it, but also read the man page of posix_spawn() and be thankful that you do not have to do it yourself with fork(2) and exec(2).
- On the stdin side, you better use a thread that uses blocking system calls to multiplex the data into the filter processes.
- Implement a drain_proc() function that reads data from a process and prefixes it properly with the given label.
- Test everything without select() and only a single filter process, before you proceed.
- For each select() call, you have to reinitialize the file-descriptor sets, as select() overwrites them with its result.
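As a rough sketch of how the output side could fit together, the following loop rebuilds the fd_set before every select() call and copies labeled data until all inputs reach end-of-file. The names drain_once() and demux_until_eof() and their signatures are assumptions for this example, not the interface of the given code. For brevity, the sketch prefixes every chunk it reads rather than every line, and it does not retry select() after EINTR.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: read one chunk from fd and write it to out, prefixed
 * with "[label] ". Returns the bytes read, 0 on EOF, -1 on error. */
ssize_t drain_once(int fd, const char *label, FILE *out)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) {
        fprintf(out, "[%s] ", label);
        fwrite(buf, 1, (size_t)n, out);
    }
    return n;
}

/* Copy everything from fds[0..n-1] to out until all of them hit EOF.
 * Finished descriptors are closed and marked by setting fds[i] to -1.
 * Returns 0 on success, -1 on error. */
int demux_until_eof(int *fds, const char *const *labels, int n, FILE *out)
{
    for (;;) {
        /* select() overwrites the set with the ready descriptors,
         * so it is rebuilt from scratch before every call. */
        fd_set readfds;
        int maxfd = -1;
        FD_ZERO(&readfds);
        for (int i = 0; i < n; i++) {
            if (fds[i] < 0)
                continue;               /* already at EOF */
            assert(fds[i] < FD_SETSIZE);
            FD_SET(fds[i], &readfds);
            if (fds[i] > maxfd)
                maxfd = fds[i];
        }
        if (maxfd < 0)
            return 0;                   /* every input reached EOF */

        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
            return -1;

        for (int i = 0; i < n; i++) {
            if (fds[i] < 0 || !FD_ISSET(fds[i], &readfds))
                continue;
            ssize_t got = drain_once(fds[i], labels[i], out);
            if (got < 0)
                return -1;
            if (got == 0) {             /* EOF: stop watching this one */
                close(fds[i]);
                fds[i] = -1;
            }
        }
    }
}
```

In the actual exercise, fds[] would hold the read ends of the filters' stdout pipes and out would be stdout.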