Stop Asking for more Presents

Git-Repository: Template Solution Solution-Diff (Solution is posted at 18:00 CET)
Workload: 59 lines of code
Important System-Calls: seccomp(2)

Well, I do not know how to bring up this topic in an acceptable way, but there are some ELFs that are addicted to gifts. They are surrounded by so many gifts, so many temptations, that some of them just snap, go nuts, and ask for more and more gifts, which they then hoard in their private cave. As this problem can stay below the radar for a long time, the ELF elders, who want to help those ELFs, decided to create a detection scheme to identify ELFs in need. For this, it is sufficient to detect whether an ELF asks his supervisor for certain things that only gift-addicted ELFs use for storing gifts in their cave.


Software has security issues. Always. Every complex piece of software contains bugs and vulnerabilities that might be exploitable by an evil attacker. While it is good and noble to remove bugs from software and to use safer programming languages (like Rust), software will never be perfect. Therefore, it is necessary to also tackle the problem on the architectural level and limit the amount of damage that an attacker can do if they find an exploit. Since exploits, like shellcode, often try to spread in the system by issuing destructive system calls, we could restrict the system call interface of a specific program to the subset of operations it actually needs. Or: why should my in-memory key-value store be allowed more system calls than accept/read/write/close/mmap/munmap?

And this is where seccomp(2) comes into the picture. With this system call, a process restricts its own system call interface to the subset of required operations during its initialization. For example, a key-value store could ban everything besides [accept, read, write, close, mmap, munmap], so exploits cannot issue execve(2) to start a different executable. As the list of allowed system calls cannot be widened again, the kernel enforces a strict system call sandbox.

In the mode described above (SECCOMP_SET_MODE_FILTER), seccomp can not only reject forbidden system calls but also inspect their arguments. For example, one could allow accept only on a specific file descriptor. However, for this, the user has to write the filter as a BPF program (seccomp uses the classic BPF dialect, not the eBPF programs loaded via bpf(2)). The Berkeley Packet Filter is an in-kernel virtual machine that was originally designed to filter network packets efficiently. In essence, BPF allows us to execute small programs within the kernel without endangering the kernel itself.
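To give an impression of what such a filter looks like, here is a minimal sketch of an allow list written in classic BPF. The names install_allow_list, ALLOW, and errno_of_filtered_getppid are made up for this example, and the set of allowed system calls is arbitrary. A real filter should also check seccomp_data.arch before trusting the system call number; that check is omitted here. Instead of killing the process, this sketch lets forbidden calls fail with EPERM, which makes experimenting easier.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper macro: if the loaded syscall number equals `nr`,
 * return ALLOW; otherwise skip to the next check. */
#define ALLOW(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Install an allow list; everything else fails with EPERM.
 * Assumption: no architecture check, so treat this as a sketch only. */
static int install_allow_list(void) {
    struct sock_filter filter[] = {
        /* Load the system call number into the accumulator. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW(SYS_read),
        ALLOW(SYS_write),
        ALLOW(SYS_close),
        ALLOW(SYS_exit_group),
        BPF_STMT(BPF_RET | BPF_K,
                 SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    /* Unprivileged processes must set no_new_privs before adding a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}

/* Demo: call getppid() in a filtered child and return the errno it saw. */
static int errno_of_filtered_getppid(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (install_allow_list() != 0)
            _exit(255);
        errno = 0;
        long r = syscall(SYS_getppid);   /* not on the allow list */
        _exit(r == -1 ? errno : 0);      /* _exit uses exit_group: allowed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling errno_of_filtered_getppid() from an unfiltered parent should report EPERM, since getppid is not on the list. The filter is installed only in the forked child, so the parent keeps its full system call interface.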

Since understanding seccomp and BPF would be a little much for a single day, we will only experiment with the SECCOMP_SET_MODE_STRICT mode of seccomp. In this mode, a process restricts itself to a very small subset of system calls: read(2), write(2), _exit(2) (but not exit_group(2)), and sigreturn(2). Any other system call gets the process killed with SIGKILL. The core idea of today's exercise is to execute a given function within a separate seccomp-protected process, which returns its result via a pipe to the calling process.
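The effect of strict mode can be observed with a small experiment. The following sketch uses a made-up helper, strict_mode_outcome, that forks a child, enters strict mode, and then either exits cleanly or issues one forbidden system call. It assumes a Linux kernel with the seccomp(2) system call and headers that define SYS_seccomp.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo helper: fork a child, enter strict mode, then either
 * exit right away or misbehave by calling getpid. Returns 0 if the child
 * exited cleanly, the signal number if it was killed, -1 on other errors. */
static int strict_mode_outcome(int misbehave) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) != 0)
            _exit(1);                 /* still allowed: not yet in strict mode */
        if (misbehave)
            syscall(SYS_getpid);      /* forbidden: the kernel sends SIGKILL */
        /* glibc's _exit() issues exit_group(2), which strict mode forbids,
         * so the plain exit system call is invoked directly. */
        syscall(SYS_exit, 0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

With this helper, strict_mode_outcome(0) should report a clean exit, while strict_mode_outcome(1) should report SIGKILL. Note the direct syscall(SYS_exit, 0): returning from main or calling exit() would end in exit_group(2) and therefore also in SIGKILL.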


Since processes in seccomp's strict mode cannot create new file descriptors but can only use existing ones with read() and write(), we have to close all file descriptors except the write end of the pipe in our protected child process. Since Linux offers no direct system call to list a process's open file descriptors, we would have to iterate over all possible file descriptors and invoke close() on each of them:

for (int fd = 0; fd < INT_MAX; fd++)
    if (fd != write_fd) close(fd);  /* write_fd: write end of the pipe */

However, as INT_MAX is a large number and such a loop would take a long time, Linux gained the close_range(2) system call in version 5.9. With this system call, a process can close a whole range of file descriptors at once, which is very useful for our sandboxing and containerizing use cases.
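As a rough sketch of how this can look, the following helper closes every descriptor except one with at most two close_range calls. The names close_all_but and demo_close_all_but are made up for this example. The raw syscall() form is used on the assumption that the C library may not ship a close_range wrapper yet (glibc added one in 2.34), and the code assumes kernel headers that define SYS_close_range.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: close every file descriptor except `keep`.
 * Returns 0 on success, -1 on error (e.g. ENOSYS on kernels before 5.9). */
static int close_all_but(int keep) {
    /* Range below `keep`; skipped if `keep` is descriptor 0. */
    if (keep > 0 &&
        syscall(SYS_close_range, 0U, (unsigned)keep - 1, 0U) != 0)
        return -1;
    /* Range above `keep`, up to the largest possible descriptor. */
    return (int)syscall(SYS_close_range, (unsigned)keep + 1, ~0U, 0U);
}

/* Demo: a child opens a pipe, keeps only its write end, and reports via
 * its exit status whether exactly that descriptor survived. */
static int demo_close_all_but(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fds[2];
        if (pipe(fds) != 0)
            _exit(2);
        if (close_all_but(fds[1]) != 0)
            _exit(3);
        int kept   = fcntl(fds[1], F_GETFD) != -1;
        int closed = fcntl(fds[0], F_GETFD) == -1 && errno == EBADF
                  && fcntl(STDIN_FILENO, F_GETFD) == -1;
        _exit(kept && closed ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The demo runs in a forked child so that the caller's own descriptors stay open. An exit status of 0 means that only the kept descriptor was still valid afterwards.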


secure_func_t spawn_secure(void (*func)(void*, int), void* arg);
int complete_secure(secure_func_t f, char *buf, size_t buflen);

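Complete these two functions:

1. spawn_secure creates a pipe and forks the current process. The child closes all file descriptors but the write end of the pipe with close_range, enables seccomp's strict mode, and calls the given function. Closing has to come first, because close_range itself is no longer permitted once strict mode is active.
2. complete_secure reads from the read end of the pipe into buf and waits for the child to complete.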
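One possible shape of these two functions is sketched below. This is not the official solution. The template's definition of secure_func_t is not shown here, so the struct layout is a guess, and demo_ok and demo_fail are made-up test functions. Error handling is kept to a minimum.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: the real secure_func_t may look different. */
typedef struct {
    pid_t pid;   /* sandboxed child, or -1 on error */
    int   pipe;  /* read end of the result pipe */
} secure_func_t;

secure_func_t spawn_secure(void (*func)(void *, int), void *arg) {
    secure_func_t ret = { .pid = -1, .pipe = -1 };
    int fds[2];
    if (pipe(fds) != 0)
        return ret;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return ret;
    }
    if (pid == 0) {
        unsigned wfd = (unsigned)fds[1];
        /* Close everything but the write end BEFORE entering strict mode.
         * Return values are ignored in this sketch. */
        if (wfd > 0)
            syscall(SYS_close_range, 0U, wfd - 1, 0U);
        syscall(SYS_close_range, wfd + 1, ~0U, 0U);
        if (syscall(SYS_seccomp, SECCOMP_SET_MODE_STRICT, 0, NULL) != 0)
            syscall(SYS_exit, 1);
        func(arg, (int)wfd);
        syscall(SYS_exit, 0);  /* plain exit: exit_group would be fatal */
    }
    close(fds[1]);  /* parent keeps only the read end, so EOF can arrive */
    ret.pid = pid;
    ret.pipe = fds[0];
    return ret;
}

/* Returns the number of bytes read, or -1 if the child did not exit cleanly. */
int complete_secure(secure_func_t f, char *buf, size_t buflen) {
    size_t total = 0;
    ssize_t n;
    while (total < buflen &&
           (n = read(f.pipe, buf + total, buflen - total)) > 0)
        total += (size_t)n;
    close(f.pipe);
    int status;
    if (waitpid(f.pid, &status, 0) != f.pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (int)total;
}

/* Hypothetical well-behaved function: only uses write() on the pipe. */
static void demo_ok(void *arg, int fd) {
    (void)arg;
    if (write(fd, "Hallo", 5) != 5)
        syscall(SYS_exit, 1);
}

/* Hypothetical misbehaving function: getpid is forbidden in strict mode. */
static void demo_fail(void *arg, int fd) {
    (void)arg;
    (void)fd;
    syscall(SYS_getpid);
}
```

A caller would pass the result of spawn_secure(demo_ok, NULL) to complete_secure together with a buffer. For demo_fail, the child is killed by the kernel, so complete_secure reports -1.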

The output of the program should look like:

$ ./seccomp 
ok: Hallo
fail failed: -1