Let the Children Speak up!

Git-Repository: Template Solution Solution-Diff (Solution is posted at 18:00 CET)
Workload: 155 lines of code
Important System-Calls: sendmsg(2), cmsg(3), unix(7)

As you can imagine, ELFs have a very high opinion of children and their experiences. However, since children are often not yet big enough to reach the tools on the upper shelves, an adult ELF has to fetch the tool from the shelf and then pass it on to the little ones. Unfortunately, little ELFs are so small that even the adult's arm length is often not sufficient, and external help is needed to hand over the tool. Only in this way can the little ELFs benefit from the use of the tool. Today, we want to explore this tool mediation and experiment with how Linux helps us here.

Sending File Descriptors

On the very first day of this Advent, we took a look at struct file and the file descriptor table, and discussed that in modern times we have to think of file descriptors more as "capabilities" than as mere open files. In the previous days, we have encountered this notion of "more than a file" in different tasks (e.g., inotify, epoll, or io_uring). There, we came to the point that we have to see a file descriptor as a pointer to some object in kernel space: we say 8, but we mean the io_uring object that was created within the kernel.

Until now, the creator of these objects and their user were always the same process: we created the object, and we also used it. For our ELF scenario, the tool the ELF picked up from the shelf is the file descriptor that we want to hand to the little ELF, which is, for example, our child process. We want to pass a file descriptor to a child process!

Our very first, naïve idea, to just transmit the file descriptor number, of course does not work. The child process has a different file descriptor table (see clone(2)), so it would get a different struct file when indexing into its own FD table with our index. After all, this way we only transmit an integer. Therefore, there has to be an explicit mechanism.
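To make the problem tangible, here is a minimal sketch (the function name is hypothetical and not part of the exercise template). A child opens /dev/null after the fork and sends only the resulting number through a pipe. The parent then looks that number up in its own FD table and checks whether it finds /dev/null there.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the number sent by the child names a different object (or
 * nothing at all) in the parent, 0 if it names /dev/null, -1 on setup errors. */
int naive_fd_number_is_useless(void)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: open a file of its own and transmit only the integer. */
        close(pipefd[0]);
        int fd = open("/dev/null", O_WRONLY);
        if (fd < 0 || write(pipefd[1], &fd, sizeof(fd)) != sizeof(fd))
            _exit(1);
        _exit(0);
    }

    /* Parent: receive the integer ... */
    close(pipefd[1]);
    int number = -1;
    ssize_t n = read(pipefd[0], &number, sizeof(number));
    waitpid(pid, NULL, 0);

    /* ... and index into its OWN file descriptor table with it. */
    struct stat devnull, mine;
    int result = -1;
    if (n == (ssize_t)sizeof(number) && stat("/dev/null", &devnull) == 0) {
        result = 1;
        if (fstat(number, &mine) == 0 &&
            mine.st_dev == devnull.st_dev && mine.st_ino == devnull.st_ino)
            result = 0;
    }
    close(pipefd[0]);
    return result;
}
```

In this setup the child's new descriptor reuses the number of the pipe's read end, which the child closed, so the parent finds its pipe under that number and not /dev/null.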

UNIX Domain Sockets

In modern Linux, it is possible to "steal"/grab a file descriptor from another process with pidfd_getfd(2). The only problem is that this method requires the receiver to have elevated rights (PTRACE_MODE_ATTACH_REALCREDS) over the "sender" process, which does not fit our intended use case. Therefore, we will explore the traditional UNIX way with UNIX domain sockets today, which we have already encountered in the Postbox exercise.

With UNIX domain sockets, two processes can connect to each other and establish a bi-directional channel between them. Just like a TCP connection, but only between local processes! Usually, the OS is not interested in the data that the sender transmits to the receiver; the socket is just a byte-oriented connection between two processes. In our scenario, however, the OS has to get involved, as we want to manipulate the file descriptor table of the receiver and insert a new file descriptor.

As sending the FD in-band with the data would break the byte-stream abstraction, UNIX goes a different route and introduces out-of-band control messages that are attached to the in-band data. If we think of the data we send as a letter, then the control message is a short note of auxiliary data that we write onto the envelope.

More concretely, you will implement these functions:

void sendfd(int sockfd, void *buf, size_t buflen, int fd)
int recvfd(int sock_fd, char *buf, size_t buflen, int *fd)

On the sender side sendfd sends a message (buf, buflen) into the connected UNIX socket sockfd and attaches the file descriptor to the message as an SCM_RIGHTS control message. On the receiver side, recvfd reads the in-band message into buf and extracts the file descriptor from the auxiliary data and writes it to *fd.
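One possible shape of these two functions is sketched below. It is not the official solution. The error handling, the return value of recvfd (assumed here to be the in-band length, or -1 on error), the helper union fd_control, and the demo function sendfd_roundtrip_demo are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer for exactly one int; the union provides cmsghdr alignment. */
union fd_control {
    char bytes[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

void sendfd(int sockfd, void *buf, size_t buflen, int fd)
{
    struct iovec iov = { .iov_base = buf, .iov_len = buflen };
    union fd_control control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.bytes,
        .msg_controllen = sizeof(control.bytes),
    };

    /* The "note on the envelope": one SCM_RIGHTS message carrying one fd. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    if (sendmsg(sockfd, &msg, 0) < 0) {
        perror("sendmsg");
        exit(EXIT_FAILURE);
    }
}

/* Assumption: returns the in-band length or -1 on error; *fd stays -1 if
 * the message carried no descriptor. */
int recvfd(int sock_fd, char *buf, size_t buflen, int *fd)
{
    struct iovec iov = { .iov_base = buf, .iov_len = buflen };
    union fd_control control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.bytes,
        .msg_controllen = sizeof(control.bytes),
    };

    *fd = -1;
    ssize_t len = recvmsg(sock_fd, &msg, 0);
    if (len < 0)
        return -1;

    /* Walk the control messages and pick the first SCM_RIGHTS entry. */
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
            memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return (int)len;
}

/* Hypothetical round trip: pass the write end of a pipe over a socketpair. */
int sendfd_roundtrip_demo(void)
{
    int sv[2], pipefd[2], received = -1;
    char text[] = "STDOUT", buf[16], byte = 0;

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0 || pipe(pipefd) < 0)
        return -1;
    sendfd(sv[0], text, strlen(text), pipefd[1]);
    int len = recvfd(sv[1], buf, sizeof(buf), &received);

    /* Writing through the received descriptor must reach our pipe. */
    int ok = len == 6 && received >= 0 &&
             write(received, "x", 1) == 1 &&
             read(pipefd[0], &byte, 1) == 1 && byte == 'x';
    close(received); close(sv[0]); close(sv[1]);
    close(pipefd[0]); close(pipefd[1]);
    return ok ? 1 : 0;
}
```

The demo uses socketpair(2) inside a single process only to keep the sketch short; in the exercise, the two ends live in different processes.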

On the server side, the strace of this communication will look like this:

socket(AF_UNIX, SOCK_SEQPACKET, 0)      = 3
bind(3, {sa_family=AF_UNIX, sun_path="./socket"}, 110) = 0
listen(3, 10)                           = 0
accept(3, NULL, NULL)                   = 4
sendmsg(4, {msg_name=NULL, msg_namelen=0, 
            msg_iov=[{iov_base="STDOUT", iov_len=6}], 
            msg_controllen=24, msg_flags=0}, 0) = 6
close(4)                                = 0

In the exchange, the server sends one file descriptor ([1]) to the receiver with SCM_RIGHTS. For more details, please look at unix(7). You may also want to read the man page for cmsg(3) carefully!
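The value msg_controllen=24 in the trace can be related to the size macros from cmsg(3). The following sketch wraps them in two hypothetical helper functions, purely for illustration:

```c
#include <stddef.h>
#include <sys/socket.h>

/* Bytes that one control message with `payload` bytes of data occupies in
 * the control buffer, including trailing alignment padding. This is what a
 * sender with a single control message puts into msg_controllen. */
size_t control_buffer_size(size_t payload)
{
    return CMSG_SPACE(payload);
}

/* Value for cmsg_len: header plus payload, without the trailing padding. */
size_t control_message_len(size_t payload)
{
    return CMSG_LEN(payload);
}
```

On a typical x86-64 Linux system with glibc, control_message_len(sizeof(int)) is 20 and control_buffer_size(sizeof(int)) is 24, which would match a message that carries exactly one file descriptor.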

How is this implemented?

When the sender initiates the transfer (__scm_send()), the kernel resolves the file descriptor number to a struct file * with fget_raw() in scm_fp_copy() and saves it in an scm_fp_list, which is then attached to the message. The kernel has to resolve the FD at exactly this moment, while sending, because the mapping between file descriptor number and struct file could change before the message is received. Resolving it later would be dangerous and could leak data that the sender wanted to keep private.

The scm_fp_list then stays with the message and waits until the receiver asks for it. At that moment, scm_detach_fds() is called and the receiver installs the received descriptors in its own FD table with receive_fd_user().
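The effect of resolving the descriptor at send time can be observed from user space. In the sketch below (all function names are hypothetical), the sender closes its descriptor right after sendmsg(2) and reuses the number for /dev/null before the message is received. The receiver should nevertheless obtain the original pipe.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

union one_fd_control {
    char bytes[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one dummy byte with `fd` attached as SCM_RIGHTS; 0 on success. */
static int send_one_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union one_fd_control c;
    memset(&c, 0, sizeof(c));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = c.bytes,
                          .msg_controllen = sizeof(c.bytes) };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one message and return the attached descriptor, or -1. */
static int recv_one_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union one_fd_control c;
    memset(&c, 0, sizeof(c));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = c.bytes,
                          .msg_controllen = sizeof(c.bytes) };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns 1 if the in-flight descriptor still refers to the pipe. */
int fd_resolved_at_send_time(void)
{
    int sv[2], pipefd[2];
    char in = 0;
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0 || pipe(pipefd) < 0)
        return -1;
    if (send_one_fd(sv[0], pipefd[1]) < 0)
        return -1;

    /* The sender changes the meaning of the number before the receive. */
    close(pipefd[1]);
    int reused = open("/dev/null", O_WRONLY);

    int received = recv_one_fd(sv[1]);
    int ok = received >= 0 &&
             write(received, "k", 1) == 1 &&
             read(pipefd[0], &in, 1) == 1 && in == 'k';
    close(received); close(reused);
    close(sv[0]); close(sv[1]); close(pipefd[0]);
    return ok ? 1 : 0;
}
```

The byte written through the received descriptor arrives at the read end of the pipe, although the sender's old number now names /dev/null.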

Security Considerations

Like giving a tool to a child, handing a file descriptor to someone else is a dangerous thing. As UNIX performs access checks only when a descriptor is created, sending a descriptor from a privileged process to an unprivileged process is risky! The privileged process thereby grants a less privileged actor access to a certain resource, and that actor might do harm with it.
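That the access check happens at creation time can be seen even without sockets. The following sketch (the function name and the /tmp path are assumptions) removes all permissions from a file after opening it and then still reads through the old descriptor:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if a descriptor opened before chmod(path, 0) is still usable. */
int fd_outlives_permissions(void)
{
    char path[] = "/tmp/fd-demo-XXXXXX";
    char buf[4] = { 0 };

    /* mkstemp(3) creates the file with mode 0600 and opens it read/write. */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ok = write(fd, "tool", 4) == 4 &&
             /* Mode 000: a fresh open() by an unprivileged user fails now. */
             chmod(path, 0) == 0 &&
             lseek(fd, 0, SEEK_SET) == 0 &&
             /* The existing descriptor is not re-checked and still works. */
             read(fd, buf, sizeof(buf)) == 4 &&
             memcmp(buf, "tool", 4) == 0;

    close(fd);
    unlink(path);
    return ok ? 1 : 0;
}
```

A process that receives such a descriptor via SCM_RIGHTS is in the same position: it can use the descriptor even if it could never have opened the file itself.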

Nevertheless, we could make this a central part of our security architecture: Imagine a system where all files except one UNIX domain socket are owned by root, and we install a file descriptor access daemon that hands out descriptors to unprivileged users if they pass a certain check. With this technique, we could enforce our very own file-access policy besides acl(5) and the standard UNIX file mode (see inode(7)).

However, you should then also consider that some system calls perform operations on file descriptors that you might find surprising. For example, with fchdir(2) and an open directory handle (see Last Christmas I Gave you my Letter), a process can change its current working directory without specifying a path, which allows processes to break out of chroot(2) jails.