Automatic Derivation of Instruction-Local Fault-Propagation Behavior

Typ der Arbeit: Masterarbeit
Status der Arbeit: offen
Betreuer: Christian Dietrich

This is an explorative topic and probably a quite challenging thesis.

Testing fault tolerance mechanisms is commonly done by performing extensive fault-injection experiments on a system that try to mimic physical causes of radiation effects like soft errors/bit flips and then observing the system's behaviour. There are different strategies to perform such injections: In the most basic variant, one would inject every bit in every cycle, which spans a so-called fault space. However, this strategy is not an viable option for longer running programs that operate on large data sets.

Therefore, fault-space pruning tecniques where developed that subsume multiple faults of that basic injection model into an equivalence set, where an injection into any member of that set will yield the exact same result. Therefore, we only have to inject one fault of each set, the fault-injection pilot, and backproject the result onto all members of the set. One basic method to form such equivalence sets is the so-called Def/Use Pruning method, which subsumes possible faults of the same memory location between an read/write events. In a nutshell, as long as a faulty bit is stored passively in memory, it cannot have any influence on the system.

In previous work, we developed an extension to the classic def/use-pruning technique: data-flow sensitive fault pruning. This technique does not only use the read/write semantic of executed assembler instructions but also their semantic. For example, if you flip a bit in one argument of an XOR-instruction this is exactly equivalent to flip the same bit position in the output of the instruction. Other instruction-specific fault propagations that are easy to understand include MOV, AND, OR instructions.

Besides the general pruning algorithm, this technique requires instruction-local fault-space propagation rules, which describe how a one-bit fault at on of its inputs propagates to the output bits.These rules are specific for each architecture, which calls for an automatic derivation! With this thesis, it is your task to develop a method for deriving such propagation rules based on a formal description of the instruction-set architecture.

Tasks🔗

Review literature on fault-equivalence collapsing in logical circuits
Extract model (e.g., SAT or SMT) of individual instructions from the formal ISA model
Adapt and apply fault-collapsing technique from the hardware domain

Automatic Derivation of Instruction-Local Fault-Propagation Behavior

Tasks🔗

Further Reading🔗