IIW-Project: Content-Based Caching of Build Artifacts and Test Results

E-Mail: christian.dietrich@tuhh.de
This IIW project is worked on by 8 students from the Master's course Informatik-Ingenieurwesen.


TL;DR: Build an object-file and test-result cache as a network service and integrate it into the build system of an existing software project. As a result, incremental compilation and testing become lightning fast.

With each (git) commit that a developer pushes to a continuous-integration system, the whole project is rebuilt in a clean environment and all test cases are scheduled for re-execution in order to detect newly introduced bugs early on and to blame the responsible developer. However, not every commit influences every compilation task, and different commits affect different test cases. Therefore, techniques like incremental compilation and regression-test selection have already been widely explored by researchers.

However, most of these approaches are change-based: they compare two versions of the same program and derive the steps that have to be executed from the detected difference. For example, the ~make~ tool re-executes a step only if at least one input dependency has a newer timestamp than the output artifact. Thus, ~make~ does not look at the content of the input, but only at the Boolean information whether it has changed.
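The timestamp rule described above can be sketched in a few lines of Python. This is only an illustration of the change-based decision, not the actual implementation of ~make~; the function name `needs_rebuild` is a hypothetical helper chosen for this example.

```python
import os

def needs_rebuild(output_path, input_paths):
    """Change-based check (hypothetical helper): rebuild if the output
    artifact is missing or any input has a newer timestamp."""
    if not os.path.exists(output_path):
        return True
    output_mtime = os.path.getmtime(output_path)
    # Only modification times are compared; file contents are never read.
    return any(os.path.getmtime(p) > output_mtime for p in input_paths)
```

Note that merely touching an input file, without changing a single byte, makes this check return `True`. This is exactly the weakness that the content-based approach addresses.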

With the content-based approach, the build system summarizes the input files with a hash function and uses the resulting hash as the key into a build-artifact cache to avoid costly recompilation. An example of such a content-based tool is ccache.
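A minimal sketch of this idea, assuming SHA-256 as the hash function and a plain in-memory dictionary as the cache. The names `cache_key` and `ArtifactCache` are hypothetical and do not reflect how ccache is implemented internally.

```python
import hashlib

def cache_key(command, input_contents):
    """Derive a key from the command line and the *content* of all inputs.
    Assumption: SHA-256; any collision-resistant hash would do."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(command.encode()).digest())
    for content in input_contents:
        # Hash each input separately so that input boundaries stay unambiguous.
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()

class ArtifactCache:
    """In-memory stand-in for a build-artifact cache (illustration only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_build(self, key, build):
        """Return the cached artifact for key, or run build() and store it."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        artifact = build()
        self._store[key] = artifact
        return artifact
```

Because the key depends only on the command and the bytes of the inputs, timestamps play no role: an unchanged file that was merely touched maps to the same key and therefore to the same cached artifact.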

In this project, the students should restructure the build and test system of an existing software project (e.g., OpenSSL) from the change-based to the content-based paradigm. In this process, they should design a centralized build-artifact and test-result cache as a network service (e.g., on the basis of memcached) and integrate it with the build system. Ideally, in the end, a purely cosmetic change to a central source file should not lead to a re-execution of the compiler, the linker, or any test case.
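To illustrate why a cosmetic change can be filtered out along the whole chain, consider the following toy pipeline. It is a sketch under strong assumptions: `normalize` is a crude stand-in for a real preprocessor (it would also cut `//` inside string literals), the "compiler" and the "test" are dummies, and the class `ToyPipeline` is hypothetical. The point is only that each step is keyed by the content of its input, so an unchanged intermediate result stops the re-execution of all later steps.

```python
import hashlib

def digest(data):
    return hashlib.sha256(data.encode()).hexdigest()

def normalize(source):
    """Toy stand-in for a preprocessor: drop // comments and blank lines."""
    lines = []
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()
        if code:
            lines.append(code)
    return "\n".join(lines)

class ToyPipeline:
    """Hypothetical two-step pipeline with a content-keyed result cache."""

    def __init__(self):
        self.cache = {}
        self.compile_runs = 0
        self.test_runs = 0

    def _cached(self, step, content, action):
        # The key is the step name plus the hash of the step's input content.
        key = (step, digest(content))
        if key not in self.cache:
            self.cache[key] = action(content)
        return self.cache[key]

    def _compile(self, normalized):
        self.compile_runs += 1
        return "object-code-for:" + digest(normalized)[:12]

    def _test(self, obj):
        self.test_runs += 1
        return "passed"

    def build_and_test(self, source):
        obj = self._cached("compile", normalize(source), self._compile)
        return self._cached("test", obj, self._test)
```

If two source versions differ only in comments or blank lines, they normalize to the same text, so the compile step is a cache hit. Since the object code is then identical as well, the test step is also a hit and nothing is re-executed.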

Furthermore, due to the content-based nature and the desired distributed design of the result cache, build and test results can be reused between developers and across different development branches. The students should develop adequate benchmark scenarios and quantify the end-to-end savings of their build-system adaptations.

For this project, the supervisor will supply the project group with existing content-based compilation-avoidance and regression-test selection methods, as well as with some existing automated evaluation scripts.