NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors

Multi-core and many-core processors are a promising way to achieve high performance while maintaining lower power consumption. However, their degree of miniaturization makes them more sensitive to soft errors. To improve system reliability, this work proposes a fault-tolerance approach based o...


Bibliographic Details
Main Authors: Vanessa Vargas, Pablo Ramos, Jean-Francois Méhaut, Raoul Velazco
Format: Article
Language: English
Published: MDPI AG 2018-03-01
Series: Applied Sciences
Subjects: fault tolerance; many-core; multi-core; partitioning; redundancy; reliability; fault injection
Online Access: http://www.mdpi.com/2076-3417/8/3/465
id doaj-72b03c12abae416a941b2bb53281605f
record_format Article
spelling doaj-72b03c12abae416a941b2bb53281605f
2020-11-25T00:59:56Z
eng
MDPI AG
Applied Sciences, ISSN 2076-3417
2018-03-01, vol. 8, no. 3, art. 465
doi:10.3390/app8030465 (app8030465)
NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors
Vanessa Vargas, Universidad de las Fuerzas Armadas ESPE, DEEE, Avenida General Rumiñahui S/N, 171-5-231B, Sangolqui, Ecuador
Pablo Ramos, Universidad de las Fuerzas Armadas ESPE, DEEE, Avenida General Rumiñahui S/N, 171-5-231B, Sangolqui, Ecuador
Jean-Francois Méhaut, LIG Labs., Université Grenoble-Alpes, 3 Parvis Louis Néel, 38054 Grenoble, France
Raoul Velazco, TIMA Labs., Université Grenoble-Alpes, Avenue Félix Viallet, 38000 Grenoble, France
collection DOAJ
language English
format Article
sources DOAJ
author Vanessa Vargas
Pablo Ramos
Jean-Francois Méhaut
Raoul Velazco
spellingShingle Vanessa Vargas
Pablo Ramos
Jean-Francois Méhaut
Raoul Velazco
NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors
Applied Sciences
fault tolerance
many-core
multi-core
partitioning
redundancy
reliability
fault injection
author_facet Vanessa Vargas
Pablo Ramos
Jean-Francois Méhaut
Raoul Velazco
author_sort Vanessa Vargas
title NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors
title_sort nmr-mpar: a fault-tolerance approach for multi-core and many-core processors
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2018-03-01
description Multi-core and many-core processors are a promising way to achieve high performance while maintaining lower power consumption. However, their degree of miniaturization makes them more sensitive to soft errors. To improve system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles, called N-Modular Redundancy and M-Partitions (NMR-MPar). By combining both principles, this approach allows multi-/many-core processors to perform critical functions in mixed-criticality systems. Benefiting from the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, it is proposed that N partitions with the same configuration participate in an N-modular redundancy system. To validate the approach, a case study is implemented on the KALRAY Multi-Purpose Processing Array (MPPA)-256 many-core processor running two parallel benchmark applications. The traveling salesman problem and matrix multiplication applications were selected to exercise different resources of the device. The effectiveness of NMR-MPar is assessed by software-implemented fault injection. For evaluation purposes, the system is assumed to be intended for use in avionics. Results show that implementing NMR-MPar improves application reliability by two orders of magnitude. Finally, this work opens the possibility of using massive parallelism for dependable applications in embedded systems.
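The N-modular redundancy idea in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the replicas here are plain Python function calls standing in for partitions, and the names `majority_vote`, `run_nmr`, and `checksum`, as well as the simulated bit flip, are hypothetical choices made only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def run_nmr(task: Callable[[int], Hashable], n: int) -> Optional[Hashable]:
    """Run `task` once per hypothetical partition (index 0..n-1) and vote."""
    return majority_vote([task(partition) for partition in range(n)])


def checksum(partition: int) -> int:
    """Toy critical function; partition 1 suffers a simulated single bit flip."""
    value = sum(range(100))  # 4950 on a healthy partition
    return value ^ 0b100 if partition == 1 else value


voted = run_nmr(checksum, n=3)
print(voted)
```

With N = 3, the faulty result from one replica is outvoted by the two healthy ones. When no strict majority exists, the voter returns `None`, so the caller can treat the error as detected but uncorrected.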
topic fault tolerance
many-core
multi-core
partitioning
redundancy
reliability
fault injection
url http://www.mdpi.com/2076-3417/8/3/465