The Endless Advent Calendar

Git-Repository: Template Solution Solution-Diff (Solution is posted at 18:00 CET)
Workload: 61 lines of code
Important System-Calls: prctl(2), sigaction(2)

It is finally done. The ELFs loaded all the gifts onto Santa's sled and waved to him one last time as he disappeared, completely overloaded, into the winter clouds. A big sigh went through the whole ELF community. "Finally, the old man has gone." The bottles of mulled wine were unpacked, the feet went up on the tables faster than the gifts had been counted the day before, gingerbread was eaten, and the ELFs rolled cigars from the children's letters. Finally done.

To prevent this story from happening again in the same way next year, the Council of the ELFs decided on that very day that the children should build their own system calls in the future. "If you give them gingerbread, they will be full for one evening, but if you show them how to build their own system calls, they will be happy forever." Such words, or words much like them, were proclaimed in the speeches that day.

Syscall User Dispatch

With Linux 5.11, the kernel learned a new feature on x86: Syscall User Dispatch, which allows the user to intercept all system calls that originate from a certain thread. Originally, this feature was introduced in Linux for Valve to allow for faster Wine emulation. With this feature, Wine can install a system call interceptor to interpret system calls within Windows binaries that are not supported by the Linux kernel directly (most of them) and which cannot be hooked easily.

This interface hides behind the prctl(2) system call, which allows various manipulations of the execution environment of the calling thread, and its PR_SET_SYSCALL_USER_DISPATCH option:

prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
      code_ptr, length, &flag)

With this call, the calling thread enables the user space syscall dispatcher: Whenever a system call is issued from a place outside of the region [code_ptr, code_ptr+length), the kernel sends a SIGSYS signal to the thread, which should then handle the system call. Furthermore, the prctl() call installs a pointer to a char-sized flag (char flag) in user space, which allows user space to enable and disable the filter without issuing a system call (which would be rather impractical, since that call could itself be intercepted).
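The decision described above can be modelled in a few lines of C. This is only a sketch of the rule, not kernel code: the function name would_dispatch and the enum names are made up for illustration, the values 0 and 1 are meant to mirror the kernel's SYSCALL_DISPATCH_FILTER_ALLOW and SYSCALL_DISPATCH_FILTER_BLOCK constants, and the region is treated as half-open, which is how the kernel documentation describes it. Invalid flag values, which the kernel treats as an error, are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are assumed to mirror
 * SYSCALL_DISPATCH_FILTER_ALLOW and SYSCALL_DISPATCH_FILTER_BLOCK. */
enum { FILTER_ALLOW = 0, FILTER_BLOCK = 1 };

/* Model of the rule: returns true if a system call issued at the
 * instruction pointer `ip` would be turned into a SIGSYS signal. */
static bool would_dispatch(uintptr_t ip, uintptr_t code_ptr,
                           uintptr_t length, char flag)
{
    /* Unsigned arithmetic: if ip < code_ptr, the subtraction wraps to a
     * huge value, so one comparison covers both ends of the region. */
    if (ip - code_ptr < length)
        return false;              /* inside the excluded region */
    return flag == FILTER_BLOCK;   /* outside: the flag decides */
}
```

For example, with code_ptr = 0x1000 and length = 0x10, a call from 0x100f is never dispatched, while a call from 0x1010 is dispatched as long as the flag is set to block.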

Therefore, the following pattern will write Hello World, every time somebody tries to execute a system call:

void usyscall_signal(int signum, siginfo_t *info, void *context) {
   flag = false;                     // let our own system calls through
   write(1, "Hello World\n", 12);
   flag = true;                      // dispatch system calls again
}

Again, similar to rseq(2), we see that registering a memory region with the kernel allows us to communicate information passively between user space and kernel space without invoking a system call. And I'm sure that this is a pattern that we will see more often in the future.


Hint: Returning from a signal handler requires the system call sigreturn(2), which glibc invokes in __restore_rt. With GDB and its disassemble command, we can also look at this function:

(gdb) disassemble __restore_rt
Dump of assembler code for function __restore_rt:
   0x00007ffff7c3daa0 <+0>: mov    $0xf,%rax
   0x00007ffff7c3daa7 <+7>: syscall 
   0x00007ffff7c3daa9 <+9>: nopl   0x0(%rax)
End of assembler dump.

As our user space syscall dispatcher would intercept this system call (rax=0xf), we would end up in an endless loop. Therefore, we have to instruct the kernel to exclude this code region from the user space dispatching mechanism. This can be done by invoking prctl() a second time in the signal handler to set the ignored region to [__restore_rt, __restore_rt+9]. As the __restore_rt symbol is not exported by glibc, we have to deduce its address by extracting the address that our signal handler will return to via the GCC intrinsic __builtin_return_address.


At this point, I want to thank you for participating in the System-Call Advent Calendar. Have a great Christmas! Enjoy your time off, and Happy New Year!