Send a Letter to Santa. Fast.
Workload: 44 lines of code
Important System-Calls: sendfile(2), memfd_create(2)
- Dec. 1: The cat on the tip of the iceberg file 45 lines [open(2), pread(2), close(2)]
- Dec. 10: Pipes Full of Gravy ipc 131 lines [epoll_create(2), epoll_ctl(2), epoll_wait(2), pipe(7), splice(2)]
Children have a huge artistic potential and often express themselves graphically when preparing their wishlist. The ELFs know that and have adjusted their digital letter-processing pipeline for those colorful letters: The incoming letters are scanned, saved as JPEGs, and the original paper is used for firing the cookie ovens. The digital letter is then OCR'ed, and a neural network identifies and normalizes the drawings into simplified vector graphics, which are fed together with the extracted wishlist text into a gift-classifier network that produces a vector of possible gifts for each child. In batches of a few thousand letters, the pipeline creates an ILP problem to optimize the happiness of all children under the limited time budget. After all, ELFs cannot invoke magic; they have but a million tiny hands to craft teddies, ponies, and toy tanks.
Although this pipeline already sounds quite sophisticated, the ELFs identified some bottlenecks related to copying the large scanned letters around. To investigate this, it is your task to find the fastest way to copy a file.
Yesterday, we learned about splice(2), which we can use to splice data from or into a pipe without copying it to user space first. But what happens if we invoke splice() with two regular open files?
The man page's ERRORS section is quite disappointing about this use case, which would be exactly what our task needs:
EINVAL Neither of the file descriptors refers to a pipe.
But Linux has not just one system call to copy data between file descriptors; it has several, each for different use cases:
- splice(2) -- move data if source or destination is a pipe
- tee(2) -- like splice(), but copies the data at the source instead of consuming it
- vmsplice(2) -- a specialized writev() that splices a scattered user-space buffer into a pipe
- sendfile(2) -- move data between arbitrary file descriptors
- copy_file_range(2) -- like splice(), but both file descriptors have to refer to files on the same file system
While I do not understand why Linux requires 5(!) different system calls to move/copy data between file descriptors, sendfile() is the one with the fewest usage constraints. Therefore, we want to look at it more closely in today's task and measure how much faster we can copy a file with sendfile().
What is a file?
Is a file one of those "things" on the hard disk or SSD that continue to exist when I turn off the computer? Or are files those things that I can invoke open(2) on? What is the essence of a file, the essence that remains if we strip away everything that we do not consider essential?
Let's start then: (1) With a tmpfs(5), we can mount a file system whose data lives only in volatile RAM; so not every file is persistent. (2) On Unix, if you open a file and unlink(2) the name from the file system, the file can no longer be opened, but you can use the descriptor as long as you don't close it; so having a name is not essential for a file either. What is essential is that we can read from a file, write to it, and seek around in its open file descriptors. One could say that a file is a linear stream of bytes that is addressed at the system-call interface with a file descriptor.
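A small sketch of point (2), the unlink trick; the path /tmp/anonymous.txt is a made-up example:

```c
#include <fcntl.h>
#include <unistd.h>

/* Demonstrates that a file's name is not essential: the data stays
   reachable through the descriptor after unlink(2). */
int read_back_after_unlink(char *buf, size_t len) {
    int fd = open("/tmp/anonymous.txt", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;

    /* Remove the name: a second open(2) would now fail with ENOENT,
       but our already-open descriptor keeps working. */
    unlink("/tmp/anonymous.txt");

    write(fd, "still here", 10);
    lseek(fd, 0, SEEK_SET);

    int n = read(fd, buf, len);   /* reads back "still here" */
    close(fd);
    return n;
}
```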
And sometimes this is exactly what you want: an anonymous, in-RAM file that you have a descriptor for. And for exactly this use case, Linux 3.17 learned the memfd_create(2) system call which can be used, without a mounted
tmpfs, to create a new anonymous file in memory. With that you could, for example, simulate a std::stringstream in C:
int fd = memfd_create("name-for-debugging", 0);
FILE *f = fdopen(fd, "w+");
int bytes = fprintf(f, "Hello to %d ELFs!", elf_count);
fflush(f);                 /* flush stdio's buffer into the memfd */
pread(fd, buf, bytes, 0);
fclose(f);
Today's task is to write a benchmark program that compares the performance of sendfile() against a regular read()/write() loop. To make the results comparable, we use a memfd file as the target of the copy operation.
In the template code, we already give you a benchmarking harness that invokes memfd_create for you. If you think this is too boring, you can just skip the template and start with an empty file for this exercise. Really, today is all about observing the effects of sendfile().
As a reference, I observed the following numbers on my machine:
$ ROUNDS=100 ./sendfile test.jpg
....
[read/write] copied with 1456.69 MiB/s (in 0.01 s, 139 syscalls)
[ sendfile] copied with 2213.98 MiB/s (in 0.00 s, 2 syscalls)
[read/write] copied with 1889.88 MiB/s (in 0.00 s, 139 syscalls)
[ sendfile] copied with 2013.52 MiB/s (in 0.00 s, 2 syscalls)
[read/write] copied with 2095.71 MiB/s (in 0.00 s, 139 syscalls)
[ sendfile] copied with 2278.85 MiB/s (in 0.00 s, 2 syscalls)
[read/write] copied with 1489.59 MiB/s (in 0.01 s, 139 syscalls)
[ sendfile] copied with 2102.52 MiB/s (in 0.00 s, 2 syscalls)
[read/write] copied with 1927.53 MiB/s (in 0.00 s, 139 syscalls)
[ sendfile] copied with 2020.50 MiB/s (in 0.00 s, 2 syscalls)
[read/write] copied with 1940.49 MiB/s (in 0.00 s, 139 syscalls)
[ sendfile] copied with 2275.93 MiB/s (in 0.00 s, 2 syscalls)
[read/write] copied with 1445.30 MiB/s (in 0.01 s, 139 syscalls)
sendfile: 1705.70 MiB/s, read/write: 1560.81 MiB/s
- write() and sendfile() can return short. Therefore, you have to loop over them until all bytes are written.
- For read/write, you should play a little bit with the buffer size. As a reference, GNU cp uses a 128 KiB buffer to copy data between file descriptors. What do you observe with smaller buffers? What do you observe with larger ones?
- Count the number of system calls by placing increment operations at the neuralgic points (read/write/sendfile) to get an idea of the complexity of the different strategies.