For Christmas, Checks are Extended

Git-Repository: Template Solution Solution-Diff (Solution is posted at 18:00 CET)
Workload: 65 lines of code
Important System-Calls: getxattr(2), setxattr(2), listxattr(2)
Recommended Reads:

Oh noes! Yesterday, we built a utility to sort the wishlists of all children on the entire planet. However, it would be a shame if the hardware that stores this pile of wishlists had a disk error, and little Helen received a set of "bleeding crooks" instead of the desired "building bricks" because of a few bit flips on the disk. That would be quite disappointing and not at all joyful and happy. Therefore, the ELF team decided to add checksums to the sorting office.

Extended File Attributes

A checksum over the content of a file is metadata that is closely tied to that file. We could store the wishlist and its checksum in two different files (e.g., helen and helen.sha256), but it would be better to keep data and metadata closer together. With the two-file solution, for example, every time we rename one or more wishlists, we have to remember to also rename the checksum files. Not to mention those poor children whose names end in .sha256... looking at you, little Bobby Tables.

On modern Linux systems, we can solve this problem with extended file attributes (xattr(7)). Extended attributes let us attach key-value pairs to a file. The OS stores them together with the file, so when we rename or move the file within a file system, its attributes go with it. Unlike NTFS alternate data streams on Windows, Linux restricts the size of attributes: the VFS limits a single value to 64 KiB, and on ext4 all attributes of a file have to fit into a single file-system block by default (typically 4 KiB). For the short checksum we want to attach to a wishlist, this is more than enough.

If you want to play around with extended attributes, you should install the xattr(1) tool:

$ touch my-file
$ xattr -w user.checksum 1235 my-file
$ xattr -l my-file
user.checksum: 1235

Interestingly, extended attributes are a fairly basic mechanism in Linux and serve as the foundation for other well-known mechanisms. For example, on ext4, access control lists (acl(5)) are stored as serialized objects within extended attributes:

$ setfacl -m "u:stettberger:rwx" my-file
$ xattr -l my-file
user.checksum: 1235
0000   02 00 00 00 01 00 06 00 FF FF FF FF 02 00 07 00    ................
0010   78 27 00 00 04 00 04 00 FF FF FF FF 10 00 07 00    x'..............
0020   FF FF FF FF 20 00 04 00 FF FF FF FF                .... .......

With these ACLs, users can control access to a file in a much more fine-grained way than with the classic owner/group/other permission bits. For this to work, setfacl(1) stores the serialized ACL in the system.posix_acl_access attribute, which the kernel interprets whenever the file is accessed. To distinguish between such different uses of extended attributes, the attribute key is a stringly-typed name whose prefix selects a namespace (security., system., trusted., or user.). To play by these rules, we have to prefix our checksum attribute with user., which gives us user.checksum.


Write a small utility program that takes a file name, maps the file's contents into its address space with mmap(2), and uses calc_checksum to derive a checksum over the file contents. If the given file has no user.checksum attribute yet, store the newly calculated checksum; otherwise, compare the stored checksum with the newly calculated one and print an error message if they do not match.