Distributed real-time fault tolerance in a virtualized separation kernel

Computers are increasingly being placed in scenarios where a computer error could result in the loss of human life or significant financial loss. Fault tolerant techniques must be employed to prevent an error from resulting in a fault causing such losses. Two types of errors that are common in...

Full description

Bibliographic Details
Main Author: Missimer, Eric
Language:en_US
Published: 2018
Subjects:
Online Access:https://hdl.handle.net/2144/27371
Description
Summary:Computers are increasingly being placed in scenarios where a computer error could result in the loss of human life or significant financial loss. Fault tolerant techniques must be employed to prevent an error from resulting in a fault causing such losses. Two types of errors that are common in real-time and embedded system are soft errors, i.e. data bit corruption, and timing errors, such as missed deadlines. Purely software based techniques to address these types of errors have the advantage of not requiring specialized hardware and are able to use more readily available commercial off-the-shelf hardware. Timing errors are addressed using Adaptive Mixed-Criticality, a scheduling technique where higher criticality tasks are given precedence over those of lower criticality when it is impossible to guarantee the schedulability of all tasks. While mixed-criticality scheduling has gained attention in recent years, most approaches assume a periodic task model and that the system has a single criticality level which dictates the available budget to all tasks. In practice these assumptions do not hold: different types of tasks are better served by different scheduling approaches and only a subset of high critical tasks might require additional capacity to meet deadlines. In the latter case, this occurs when a process has experienced a fault and requires additional capacity to perform the recovery. In this thesis, soft errors are addressed using a novel real-time fault tolerance method based on a virtualized separation kernel. Instead of executing redundant copies of an application on separate machines, the applications are consolidated onto one multi-core processor and use hardware virtualization extensions to partition the applications. This allows new recovery schemes to be explored. In addition, the maximum recovery time is sufficiently bounded to ensure recovery occurs in a timely manner without affecting the normal execution of the application. A virtualized separation kernel in combination with Adaptive Mixed-Criticality techniques creates a fault tolerant system that predictably detects and recovers from timing and soft errors.