On Christmas, They Come and Go!
Workload: 115 lines of code
Important System-Calls: netlink(7), netlink(3)
Christmas is just around the corner and it has gotten quite busy in the Christmas village. It has become a big mess, everything is full of ELFs. With all the coming and going, it is really easy to lose track of the ELFs. Since the Council of ELFs is rather on the conservative side and would therefore like to keep track of its members, a registration office was established this year, where all ELFs that come and go have to register. This booth reports directly to the Council.
Today, we will look at a part of the kernel interface that is very important but also rather opaque and weird: netlink(7). In essence, netlink is a general-purpose communication layer to transfer data between the kernel and the user space. So, unlike yesterday's UNIX domain sockets, where both ends were a process, netlink connects a process to a kernel subsystem.
Today, the most common usage for netlink is the configuration of the networking subsystem in Linux. Every time you add a new route, add a firewall rule, or change your IP address, a user process connects via netlink to the kernel and sends a message to change the routing information, append a firewall rule, or modify a device's IP address. You can also see netlink in action if you try to change the network configuration without having the necessary permissions:
$ id
uid=1000(user) gid=1000(user) ....
$ ip a add 22.214.171.124/23 dev lo :(
RTNETLINK answers: Operation not permitted
But netlink is not only for sending configuration packets to the kernel; we can also receive data from the kernel on a netlink channel. For example, with NETLINK_NETFILTER we can passively monitor the network state of our kernel. Oleg has a wonderful blog entry, including an example, on this.
While netlink is a powerful interface, it is in essence a networking protocol and it even has an RFC that describes it. Therefore, working with netlink feels really awkward considering that you are only trying to talk to your local kernel. In particular, the messages that we send to the kernel are prepared as if we were sending them to a machine on the other side of the world. In my opinion, this is more than weird. But it's the reality.
Netlink Connector and cn_proc
Coming back to our original problem, we want to build that registration booth. More specifically, we want to register with our kernel so that it informs us about every fork(2), execve(2), and exit(2) in the whole system. We want to have an overview of everything! And guess what, there is an interface for exactly this task! What a coincidence!
Let's look at a special kind of netlink socket: the Netlink Connector. This abstraction set out to make it easy to communicate between the user space and a kernel module. Kernel modules can add a callback with cn_add_callback to a certain cb_id. At the moment, there are eleven pre-defined/well-known callback slots or registered connector types. So, given that this connector was introduced in 2004, we cannot say that it has gained widespread adoption in the kernel.
Nevertheless, with cn_proc, there is a connector to monitor process-state changes in the whole system. However, as it is netlink, it is quite tricky to get this beast to work.
The central idea is to create a new socket and bind it to CN_IDX_PROC. On this socket, we will receive our fork/exec/exit messages. However, we first have to enable these messages by sending a message to cn_proc: The message is a proc_cn_mcast_op, wrapped in a struct cn_msg, wrapped in a nlmsg that we write to the socket.
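To make the nesting a bit more tangible, here is a minimal sketch of how the three layers could be packed into one buffer and sent. It assumes the uapi headers linux/connector.h and linux/cn_proc.h are installed; the helper names (build_mcast_msg, open_proc_connector, send_mcast_op) are made up for this illustration, and using getpid() as the netlink port id is simply a common convention, not a requirement I can vouch for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Size of the full control message: nlmsghdr + cn_msg + proc_cn_mcast_op. */
#define MCAST_MSG_LEN \
    ((size_t)NLMSG_LENGTH(sizeof(struct cn_msg) + sizeof(enum proc_cn_mcast_op)))

/* Fill buf with the nested message; buf must be NLMSG_ALIGNTO-aligned.
 * Returns the number of bytes to send, or 0 if buf is too small. */
size_t build_mcast_msg(void *buf, size_t buflen,
                       enum proc_cn_mcast_op op, uint32_t port_id)
{
    assert(buf != NULL && (uintptr_t)buf % NLMSG_ALIGNTO == 0);
    if (buflen < MCAST_MSG_LEN)
        return 0;
    memset(buf, 0, MCAST_MSG_LEN);

    struct nlmsghdr *nlh = buf;            /* outer layer: netlink header */
    nlh->nlmsg_len = MCAST_MSG_LEN;
    nlh->nlmsg_type = NLMSG_DONE;
    nlh->nlmsg_pid = port_id;

    struct cn_msg *cn = NLMSG_DATA(nlh);   /* middle layer: connector header */
    cn->id.idx = CN_IDX_PROC;
    cn->id.val = CN_VAL_PROC;
    cn->len = sizeof(op);

    memcpy(cn->data, &op, sizeof(op));     /* inner layer: LISTEN or IGNORE */
    return MCAST_MSG_LEN;
}

/* Create the connector socket and join the CN_IDX_PROC multicast group. */
int open_proc_connector(void)
{
    int sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
    if (sock < 0)
        return -1;

    struct sockaddr_nl addr;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = CN_IDX_PROC;
    addr.nl_pid = (uint32_t)getpid();

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Send PROC_CN_MCAST_LISTEN or PROC_CN_MCAST_IGNORE on a bound socket. */
int send_mcast_op(int sock, enum proc_cn_mcast_op op)
{
    union {                                /* the header member forces alignment */
        struct nlmsghdr hdr;
        char bytes[MCAST_MSG_LEN];
    } buf;
    size_t len = build_mcast_msg(&buf, sizeof(buf), op, (uint32_t)getpid());
    if (len == 0)
        return -1;
    return send(sock, &buf, len, 0) == (ssize_t)len ? 0 : -1;
}
```

A caller would do something like `int s = open_proc_connector(); send_mcast_op(s, PROC_CN_MCAST_LISTEN);` and then recv() on `s`. Depending on your kernel, the bind() may only succeed with elevated privileges.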
And really, if you cannot figure this out on your own, don't be ashamed. The documentation for this interface is less than stellar, and I may as well link directly to the solution on StackOverflow, as you will search for it anyway. But perhaps, by crafting your own solution and looking at cn_proc.c in the kernel, you will get some understanding of what is happening there. One starting point is to look at the call sites of the functions in cn_proc.c that generate these events.
Also, as an example, the following happens on my machine when I run
./cn_proc and type
sleep in another terminal:
$ ./cn_proc
.....
fork(): /bin/zsh (1451658, 1451658) -> (1451730, 1451730)
exec(): /bin/sleep (1451730, 1451730)
exit(): (1451730, 1451730) -> rc=0
We clearly see that my shell (zsh) forks itself in order to directly exec /bin/sleep, which exits after some time. Please note that the tuples in my output are the task-group id (pid) and the actual thread id (tid). As all involved processes are single threaded, the pid is equal to the tid.
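For the receiving side, the following sketch shows one way to peel a struct proc_event out of a received datagram and to print the (tgid, tid) tuples. The helper names extract_event and format_event are hypothetical, the output format is only loosely modelled on the listing above, and I print the raw exit_code field as delivered rather than trying to interpret it. Looking up the executable path is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/connector.h>
#include <linux/cn_proc.h>

/* Copy the proc_event out of a received datagram (nlmsghdr | cn_msg | event).
 * memcpy is used so that the 8-byte aligned timestamp is never read through
 * a misaligned pointer. Returns 0 on success, -1 on a malformed message. */
int extract_event(const void *buf, size_t len, struct proc_event *out)
{
    const unsigned char *p = buf;
    struct cn_msg cn;

    assert(buf != NULL && out != NULL);
    if (len < NLMSG_HDRLEN + sizeof(cn))
        return -1;
    memcpy(&cn, p + NLMSG_HDRLEN, sizeof(cn));
    if (cn.id.idx != CN_IDX_PROC || cn.id.val != CN_VAL_PROC)
        return -1;

    size_t avail = len - NLMSG_HDRLEN - sizeof(cn);
    if (cn.len > avail)
        return -1;

    /* The kernel's struct may be larger or smaller than ours: copy the overlap. */
    size_t n = cn.len < sizeof(*out) ? cn.len : sizeof(*out);
    memset(out, 0, sizeof(*out));
    memcpy(out, p + NLMSG_HDRLEN + sizeof(cn), n);
    return 0;
}

/* Render one event as text, with tuples in (tgid, tid) order. */
int format_event(const struct proc_event *ev, char *out, size_t n)
{
    switch (ev->what) {
    case PROC_EVENT_FORK:
        return snprintf(out, n, "fork(): (%d, %d) -> (%d, %d)",
                        (int)ev->event_data.fork.parent_tgid,
                        (int)ev->event_data.fork.parent_pid,
                        (int)ev->event_data.fork.child_tgid,
                        (int)ev->event_data.fork.child_pid);
    case PROC_EVENT_EXEC:
        return snprintf(out, n, "exec(): (%d, %d)",
                        (int)ev->event_data.exec.process_tgid,
                        (int)ev->event_data.exec.process_pid);
    case PROC_EVENT_EXIT:
        return snprintf(out, n, "exit(): (%d, %d) -> exit_code=%u",
                        (int)ev->event_data.exit.process_tgid,
                        (int)ev->event_data.exit.process_pid,
                        (unsigned)ev->event_data.exit.exit_code);
    default:
        return snprintf(out, n, "other event 0x%x", (unsigned)ev->what);
    }
}
```

In a receive loop, each recv() on the connector socket would be followed by extract_event() on the received bytes and then format_event() on the result.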
Another curiosity that I encountered during the preparation for today is that cn_proc is a stateful interface. If you enable this event stream once with PROC_CN_MCAST_LISTEN, it stays enabled and messages will be created even if nobody listens to them. So, if you forget to send the LISTEN the next time, the events will still come anyway. Also, there is no safe way to disable this interface by sending PROC_CN_MCAST_IGNORE, as the subsystem will decrement the variable proc_event_num_listeners, which could then underflow! All in all, this interface is broken and I don't see how somebody could fix it. But, hey, perhaps this makes it a good match for netlink ;-)
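To illustrate why a single global counter is fragile, here is a tiny user-space toy model. It is not the kernel code: the names are invented, and the assumptions that the counter is a plain signed integer and that events are only generated while it is at least 1 come from my reading of cn_proc.c, so take it with a grain of salt.

```c
#include <assert.h>

/* Hypothetical stand-ins for PROC_CN_MCAST_LISTEN / PROC_CN_MCAST_IGNORE. */
enum model_op { MODEL_LISTEN = 1, MODEL_IGNORE = 2 };

/* Toy counterpart of proc_event_num_listeners: one global signed counter
 * that is shared by every process talking to the interface. */
static int model_num_listeners = 0;

/* Apply one control message. Nothing ties the message to its sender. */
void model_mcast_ctl(enum model_op op)
{
    assert(op == MODEL_LISTEN || op == MODEL_IGNORE);
    if (op == MODEL_LISTEN)
        model_num_listeners++;
    else
        model_num_listeners--;   /* no check that this caller ever listened */
}

/* Assumption: events are generated only while the counter is at least 1. */
int model_events_enabled(void)
{
    return model_num_listeners >= 1;
}

int model_listener_count(void)
{
    return model_num_listeners;
}
```

In this model, one LISTEN followed by two IGNOREs leaves the counter at -1, so the next honest LISTEN only brings it back to 0 and that listener gets no events. Conversely, a listener that exits without sending IGNORE leaves the counter at 1 forever.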