Automatically Parallelizing Embedded Legacy Software on Soft-Core SoCs

Nowadays, embedded systems are utilized in many areas and have become omnipresent, making people's lives more comfortable. Embedded systems have to handle more and more functionality in many products. To maintain the often required low energy consumption, multi-core systems provide high performance...


Bibliographic Details
Main Author: Heid, Kris
Format: Others
Language: en
Published: 2019
Online Access: https://tuprints.ulb.tu-darmstadt.de/9020/1/2019_08_23_Heid_Kris.pdf
Heid, Kris <http://tuprints.ulb.tu-darmstadt.de/view/person/Heid=3AKris=3A=3A.html> : Automatically Parallelizing Embedded Legacy Software on Soft-Core SoCs. Technische Universität, Darmstadt [Ph.D. Thesis], (2019)
id ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-9020
record_format oai_dc
collection NDLTD
sources NDLTD
description Embedded systems are now used in many areas and have become omnipresent, making people's lives more comfortable. In many products they have to handle more and more functionality. Multi-core systems help to meet the low energy budgets these products often require, since they provide high performance at moderate energy consumption. Development started with dual-core processors and has today reached many-core designs with dozens or even hundreds of processor cores. However, existing applications can barely leverage the potential of that many cores. Legacy applications are usually written sequentially and therefore typically use only one processor core, so they do not benefit from the advantages of modern many-core systems. Rewriting those applications to use multiple cores requires new skills from developers and is also time-consuming and highly error-prone. Dozens of languages, APIs and compilers have been presented over the past decades to help users parallelize applications. Fully automatic parallelizing compilers are seen as the holy grail, since user effort is kept minimal. However, automatic parallelizers often cannot extract parallelism as well as user-aided approaches. Most of these parallelization tools are designed for desktop and high-performance systems and are thus not tuned for, or applicable to, low-performance embedded systems. To improve this situation, this work presents an automatic parallelizer for embedded systems that in most cases delivers better results than user-aided approaches and, where it does not, allows easy manual fine-tuning. Parallelization tools extract concurrently executable tasks from an application. These tasks can then be executed on different processor cores. Parallelization tools, and automatic parallelizers in particular, often struggle to map the extracted parallelism efficiently to an existing multi-core processor.
This work uses soft-core processors on FPGAs, which makes it possible to realize custom multi-core designs in hardware within a few minutes. This allows the multi-core processor to be adapted to the characteristics of the extracted parallelism. In particular, the interconnects between cores can be optimized to fit the communication pattern of the parallel application. Embedded applications are often structured as follows: receive input data, process the data in one or more steps, output the data. The processing steps are often realized as consecutive, loosely coupled transformations, which naturally model the structure of a processing pipeline. The goal of this work is to extract this kind of pipeline parallelism from an application and map it to multiple cores to increase the overall throughput of the system. Multiple cores forming a chain with direct communication channels fit this pattern ideally. Pipeline parallelism as described here is barely addressed in most parallelization tools. Also, current multi-core designs often do not offer the hardware flexibility that the soft-cores targeted in this approach provide. The main contribution of this work is an automatic parallelizer that maps different processing steps from the source code of a sequential application to different cores in a multi-core pipeline. Users only specify the required processing speed after parallelization. The developed tool then tries to derive a matching parallelized software design, along with a custom multi-core design, from a sequential embedded legacy application. The automatically created multi-core system already contains the peripherals used by the application, as extracted from the source code, and is ready to use. The presented parallelizer implements multi-objective optimization to generate a minimal hardware design that just fulfills the user-defined requirement.
To the best of my knowledge, generating such a multi-core pipeline from the demands of the parallelized software has never been presented before. The approach is implemented for two soft-core processors, and the evaluation shows high speedups of 12x and more for both targets at a reasonable hardware overhead. Compared to other automatic parallelizers, which mainly focus on speedups through latency reduction, significantly higher speedups can be achieved, depending on the structure of the given application.
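The pipeline mapping described in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm: the thesis uses multi-objective optimization, whereas this example assumes a simple greedy grouping of consecutive processing steps, ignores communication cost between cores, and uses made-up cycle counts. The names `partition_pipeline` and `steady_state_speedup` are hypothetical.

```python
from typing import List, Optional


def partition_pipeline(step_cycles: List[int],
                       max_cycles_per_item: int) -> Optional[List[List[int]]]:
    """Group consecutive steps into as few pipeline stages (cores) as possible.

    Each stage's summed cycle count must stay within max_cycles_per_item,
    which stands in for the user-specified processing speed.
    Returns None if a single step alone already exceeds the bound.
    """
    stages: List[List[int]] = []
    current: List[int] = []
    load = 0
    for cycles in step_cycles:
        if cycles > max_cycles_per_item:
            return None  # this step would have to be split further
        if current and load + cycles > max_cycles_per_item:
            # The current core is full: close the stage and start a new one.
            stages.append(current)
            current, load = [], 0
        current.append(cycles)
        load += cycles
    if current:
        stages.append(current)
    return stages


def steady_state_speedup(step_cycles: List[int],
                         stages: List[List[int]]) -> float:
    """Throughput gain over sequential execution once the pipeline is full.

    A full pipeline emits one result per bottleneck-stage time, while the
    sequential program needs the sum of all step times per result.
    """
    bottleneck = max(sum(stage) for stage in stages)
    return sum(step_cycles) / bottleneck


if __name__ == "__main__":
    steps = [40, 30, 50, 20, 60]  # hypothetical cycles per processing step
    stages = partition_pipeline(steps, max_cycles_per_item=70)
    print(stages)
    print(steady_state_speedup(steps, stages))
```

Under these assumptions the number of stages corresponds to the number of cores in the chain, and the slowest stage bounds the throughput. This is why a pipeline can raise throughput even though the latency of a single data item does not shrink.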
author Heid, Kris
title Automatically Parallelizing Embedded Legacy Software on Soft-Core SoCs
spelling ndltd-tu-darmstadt.de-oai-tuprints.ulb.tu-darmstadt.de-9020 2019-09-04T03:31:53Z http://tuprints.ulb.tu-darmstadt.de/9020/ 2019 Ph.D. Thesis NonPeerReviewed text CC-BY-NC-ND 4.0 International - Creative Commons, Attribution Non-commercial, No-derivatives en info:eu-repo/semantics/doctoralThesis info:eu-repo/semantics/openAccess