StuBS
A3: IPC, fork and COW

Goal of this exercise is to allow communication between the isolated processes via message passing. Furthermore, you should implement an efficient system call to generate new processes, similar to fork(2) on Unix systems.

Synchronous Inter Process Communication

Your isolated processes lack efficient primitives for Inter Process Communication (IPC). One such mechanism for synchronous notification is provided by send-receive-reply, which has a clear hierarchy and therefore avoids deadlock situations with two threads sending each other at the same time.

You have to implement the following system calls:

  • void send(int pid, const void *sbuffer, size_t ssize, void *rbuffer, size_t rsize);
    void send(uint8_t destination, uint8_t vector)
    Send an Inter-Processor Interrupt (IPI)
    Definition: lapic_ipi.cc:195
    sends the message in sbuffer with the length ssize synchronously to the process with the PID pid. This operation blocks until the recipient has received the message with recv(), processed and sent an answer using reply(). This response is stored in rbuffer, with the maximum buffer size rsize.
  • int recv(void *buffer, size_t size);
    blocks the current process until it receives a message. The message is stored in buffer up to a maximum length of size bytes. The return value contains the process identification (PID) of the sender.
  • void reply(int pid, const void *buffer, size_t size);
    sends a response message to the process identified by pid. This function should not block and only success if the target process has already performed a corresponding send() to this process and is waiting for its completion.

The buffers for send(), recv() und reply() should be aligned to page boundaries to ease your implementation. To achieve this for global variables, you can use the GCC-specific C attribute __attribute__ ((aligned (x))).

Process Management

You have to implement the following system calls (which are quite similar to their Unix equivalents):

  • int fork();
    Create a new process duplicating the calling process, including the process state: On activation, the new child process shall start at the same point its parent will resume. The only difference is the return value of the call – the parent will get the process ID (PID) of its child, while the child will receive pid of its parent.
  • int getpid();
    Retrieve the process ID of the calling process. The PID has to be a unique thread identifier in the whole system.

All page frames belonging to the calling process have to be copied in fork. The fork system call's stub in user space is allowed to contain additional code to save and restore the contents of the non-scratch registers. This way you have the ability to re-use your existing infrastructure for launching a user space thread.

Test Application

At this point, you should extensively test your implementation. The fork system call can be an important assistant for this purpose.

Nevertheless, we provide an additional test case for you:

char __attribute__((aligned(4096))) sbuf[8194];
char __attribute__((aligned(4096))) rbuf[8194];
void main() {
fork();
fork();
int ppid = getpid();
int other = fork();
if (ppid == other) {
/* child */
sbuf[0] = 3;
sbuf[8192] = getppid();
sbuf[8193] = 1;
send(other, sbuf, 4097, rbuf, 4097);
char msg[] = "REPLY: AA\n";
msg[7] += rbuf[0] + sbuf[8193];
msg[8] += 4 + ppid;
write(0, msg, 10);
} else {
/* parent */
int X = recv(rbuf, 8193);
rbuf[0] = rbuf[0] + rbuf[8192];
rbuf[8193] = 7;
reply(X, rbuf, 8193);
}
exit();
}
void exit()
Deinitialize this CPU core.
Definition: core.cc:53
int main()
Kernels main function.
Definition: main.cc:56

Due to the fork() calls, this application will split into four pairs of threads. In each pair, one thread (parent) will wait for a message while the other thread (child) sends the message. The parent thread then calculates a value from the message and sends the buffer back.

On correct implementation of the system calls above, the output of this test application should be similar to:

REPLY: DD
REPLY: GG
REPLY: EE
REPLY: FF

The lines can appear in any order. The two letters following REPLY: depend on your implementation of the process identification assignment. However, on a single line those two letters must be equal!

Copy-on-Write

Especially fork requires copying a lot of pages – although some of them will stay the same on both parent and child and therefore can be shared. This sharing not only saves memory but improves the performance as well.

Pages to be copied while using fork or send-receive-reply should be displayed (mapped) in the other address space unless a write access occurs – only then they will be physically copied.

To detect a write access, the pages concerned have to be mapped as read-only. Thus, the page fault handler (IRQ 14) is triggered on a write operation. To track the number of read-only mappings for a physical page, you will need a mechanism to count the number of references on COW pages. On a write operation, the reference gets decremented and the page needs to be copied (unless there is only this one reference left). Additionally, the page mapping of the process needs to be adjusted (don't forget to invalidate those pages in the translation lookaside buffer (TLB)).

Note
Pages originally marked as read-only should (obviously) not become writable after such a copy operation. However, reference counting might be required for such pages as well (for example, if the parent process exits, the text segment has to be available still until the child exits).

For send-recv or reply-send the contents of the buffers can only be copied using COW if the message size is at least equal to the page size (leftovers have to be copied physically) and the alignment of both buffers are identical. To fulfill the latter requirement, the buffers are best placed at page borders. This can be achieved using the C++ specifier alignas(x) (or the GCC/LLVM extension __attribute__((aligned (x)))), whereas the constant x defines the alignment in bytes.

Stress-Test

At this point, you should reuse the test-program from above to stress test your implementation. The program should produce the same output.

Also, demonstrate the advantages of COW regarding the memory usage by printing the available page frames:

You should output the number of available page frames in the kernel area and the user area. When starting with QEMU with 128 MiB of RAM, our solution prints the following memory layout:

module 0x0x118000-0x0x11f000 user/build/initrd.img
kernel 0x100000-0x116010
(available) 0x100000 0x7ffdfff
 \-> use! 0x100000-0xfffff 0 pages
 \-> use! 0x117000-0x117fff 1 pages
 \-> use! 0x121000-0x7ffdfff 32477 pages
(reserved) 0x7ffe000 0x7ffffff
(reserved) 0xfffc0000 0xffffffff

At Scheduler::schedule(), our system requires 49 page frames:

allocator: 49 pages used
allocator: 28669 free userpages; 111 MByte
allocator: 3760 free kernelpages; 14 MByte

At IdleThread::action(), our system requires 42 page frames:

The reduced page-frame usage is rooted in the fact that the user-space applications require 7 pages from the initial ramdisk for code and data.