Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks

Abstract A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. The proposed design reduces t...

Bibliographic Details
Main Authors: Labson Koloko, Takahiro Matsumoto, Hitoshi Obara
Format: Article
Language: English
Published: Wiley 2021-06-01
Series: The Journal of Engineering
Online Access: https://doi.org/10.1049/tje2.12037
id doaj-0411d56afe854f269b48e7506f73a262
record_format Article
spelling doaj-0411d56afe854f269b48e7506f73a262; 2021-06-23T07:49:33Z; eng; Wiley; The Journal of Engineering; 2051-3305; 2021-06-01; 2021; 6; 312; 320; 10.1049/tje2.12037
Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
Labson Koloko (0), Takahiro Matsumoto (1), Hitoshi Obara (2)
(0) Graduate School of Engineering Science, Akita University, Tegata Gakuen, Akita, Japan
(1) Graduate School of Engineering Science, Akita University, Tegata Gakuen, Akita, Japan
(2) Graduate School of Engineering Science, Akita University, Tegata Gakuen, Akita, Japan
Abstract A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks, based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. Owing to its distributed architecture, the proposed design reduces the hardware complexity of PEs from O(N^2) to O(N(log_2 N)^2). Asynchronous operation is introduced in parts of the design to reduce the time complexity per PE stage to O(1) for N within a certain range, whereas conventional algorithms take O(log_2 N) time per PE stage. A prototype was constructed, with parallel PEs distributed in a field-programmable gate array, to investigate performance for switch sizes of N = 4 to 32. The experimental results demonstrate that the proposed design outperforms a recent method by at least several times in terms of hardware and processing time complexities.
https://doi.org/10.1049/tje2.12037
collection DOAJ
language English
format Article
sources DOAJ
author Labson Koloko
Takahiro Matsumoto
Hitoshi Obara
title Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
publisher Wiley
series The Journal of Engineering
issn 2051-3305
publishDate 2021-06-01
description Abstract A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks, based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. Owing to its distributed architecture, the proposed design reduces the hardware complexity of PEs from O(N^2) to O(N(log_2 N)^2). Asynchronous operation is introduced in parts of the design to reduce the time complexity per PE stage to O(1) for N within a certain range, whereas conventional algorithms take O(log_2 N) time per PE stage. A prototype was constructed, with parallel PEs distributed in a field-programmable gate array, to investigate performance for switch sizes of N = 4 to 32. The experimental results demonstrate that the proposed design outperforms a recent method by at least several times in terms of hardware and processing time complexities.
url https://doi.org/10.1049/tje2.12037