Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks

Abstract A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. The proposed design reduces t...

Bibliographic Details
Main Authors:	Labson Koloko, Takahiro Matsumoto, Hitoshi Obara
Format:	Article
Language:	English
Published:	Wiley 2021-06-01
Series:	The Journal of Engineering
Online Access:	https://doi.org/10.1049/tje2.12037

id	doaj-0411d56afe854f269b48e7506f73a262
record_format	Article
spelling	doaj-0411d56afe854f269b48e7506f73a262 2021-06-23T07:49:33Z eng Wiley The Journal of Engineering 2051-3305 2021-06-01 2021 6 312 320 10.1049/tje2.12037 Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks Labson Koloko 0 Takahiro Matsumoto 1 Hitoshi Obara 2 Graduate School of Engineering Science Akita University Tegata Gakuen Akita Japan Graduate School of Engineering Science Akita University Tegata Gakuen Akita Japan Graduate School of Engineering Science Akita University Tegata Gakuen Akita Japan Abstract A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. The proposed design reduces the hardware complexity of PEs from O(N²) to O(N(log₂N)²) due to a distributed architecture. In the proposed design, asynchronous operation was introduced in parts to reduce the time complexity per PE stage down to O(1) within a certain N, while it takes O(log₂N) time per PE stage in conventional algorithms. A prototype of the parallel and distributed PEs was constructed in a field-programmable gate array to investigate performance for switch sizes of N = 4 to 32. The experimental results demonstrate that the proposed design outperforms a recent method by at least several times in terms of hardware and processing time complexities. https://doi.org/10.1049/tje2.12037
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Labson Koloko Takahiro Matsumoto Hitoshi Obara
spellingShingle	Labson Koloko Takahiro Matsumoto Hitoshi Obara Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks The Journal of Engineering
author_facet	Labson Koloko Takahiro Matsumoto Hitoshi Obara
author_sort	Labson Koloko
title	Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
title_short	Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
title_full	Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
title_fullStr	Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
title_full_unstemmed	Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
title_sort	design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in beneš networks
publisher	Wiley
series	The Journal of Engineering
issn	2051-3305
publishDate	2021-06-01
description	Abstract A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. The proposed design reduces the hardware complexity of PEs from O(N²) to O(N(log₂N)²) due to a distributed architecture. In the proposed design, asynchronous operation was introduced in parts to reduce the time complexity per PE stage down to O(1) within a certain N, while it takes O(log₂N) time per PE stage in conventional algorithms. A prototype of the parallel and distributed PEs was constructed in a field-programmable gate array to investigate performance for switch sizes of N = 4 to 32. The experimental results demonstrate that the proposed design outperforms a recent method by at least several times in terms of hardware and processing time complexities.
url	https://doi.org/10.1049/tje2.12037
work_keys_str_mv	AT labsonkoloko designandimplementationoffastandhardwareefficientparallelprocessingelementstosetfullandpartialpermutationsinbenesnetworks AT takahiromatsumoto designandimplementationoffastandhardwareefficientparallelprocessingelementstosetfullandpartialpermutationsinbenesnetworks AT hitoshiobara designandimplementationoffastandhardwareefficientparallelprocessingelementstosetfullandpartialpermutationsinbenesnetworks
_version_	1721362340319854592

Similar Items