Summary: | Computers are increasingly being placed in scenarios where a computer error
could result in the loss of human life or significant financial loss. Fault-tolerant
techniques must be employed to prevent an error from resulting in a failure that
causes such losses. Two types of errors that are common in real-time and
embedded systems are soft errors (i.e., data bit corruption) and timing errors,
such as missed deadlines. Purely software-based techniques for addressing these
errors have the advantage of not requiring specialized hardware, so they can run
on readily available commercial off-the-shelf hardware. Timing errors are
addressed using Adaptive Mixed-Criticality, a scheduling technique in which
higher-criticality tasks are given precedence over lower-criticality ones when
the schedulability of all tasks cannot be guaranteed.
While mixed-criticality scheduling has gained attention in recent years, most
approaches assume a periodic task model and a single system-wide criticality
level that dictates the budget available to all tasks. In practice, these
assumptions do not hold: different types of tasks are better served by
different scheduling approaches, and only a subset of high-criticality tasks
might require additional capacity to meet their deadlines. The latter case
arises when a process has experienced a fault and requires additional capacity
to perform recovery.
In this thesis, soft errors are addressed using a novel real-time fault
tolerance method based on a virtualized separation kernel. Instead of executing
redundant copies of an application on separate machines, the copies are
consolidated onto a single multi-core processor and partitioned using hardware
virtualization extensions. This consolidation makes it possible to explore new
recovery schemes. In addition, the maximum recovery time is bounded tightly
enough to ensure that recovery completes in a timely manner without disrupting
the normal execution of the application. Combining the virtualized separation
kernel with Adaptive Mixed-Criticality techniques yields a fault-tolerant system
that predictably detects and recovers from both timing errors and soft errors.
|