Design and implementation of fast and hardware‐efficient parallel processing elements to set full and partial permutations in Beneš networks
Main Authors: Labson Koloko, Takahiro Matsumoto, Hitoshi Obara
Affiliation (all authors): Graduate School of Engineering Science, Akita University, Tegata Gakuen, Akita, Japan
Format: Article
Language: English
Published: Wiley, 2021-06-01
Series: The Journal of Engineering
ISSN: 2051-3305
Online Access: https://doi.org/10.1049/tje2.12037
Abstract:
A new design for parallel and distributed processing elements (PEs) is proposed to configure Beneš networks, based on a novel parallel algorithm that can realise full and partial permutations in a unified manner with very little overhead time and extra hardware. Owing to its distributed architecture, the proposed design reduces the hardware complexity of the PEs from $O(N^2)$ to $O(N(\log_2 N)^2)$. In addition, asynchronous operation is introduced in parts of the design to reduce the time complexity per PE stage to $O(1)$ for N up to a certain size, whereas conventional algorithms take $O(\log_2 N)$ time per PE stage. A prototype with parallel and distributed PEs was implemented in a field programmable gate array (FPGA) to investigate performance for switch sizes from N = 4 to 32. The experimental results demonstrate that the proposed design outperforms a recent method by at least several times in terms of hardware and processing time complexities.
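To make concrete what "setting" a permutation in a Beneš network involves, here is a minimal sketch of the classic sequential looping algorithm. This is a conventional method, not the parallel, distributed PE design proposed in this article. The sketch assumes N is a power of two, a recursive layout in which first-stage switch i feeds input i of the upper and lower N/2 subnetworks (giving 2 log2 N - 1 stages of N/2 switches), and a 0 = straight / 1 = cross switch encoding. Partial permutations are handled here by an assumed, simple approach: idle inputs are given unused outputs before routing. The function names `complete_partial`, `route` and `simulate` are hypothetical.

```python
from typing import List, Optional

Stage = List[int]  # one setting per 2x2 switch: 0 = straight, 1 = cross


def complete_partial(partial: List[Optional[int]]) -> List[int]:
    """Assumed approach: give each idle input (None) an unused output, in
    ascending order, so a partial permutation becomes a full one."""
    used = {o for o in partial if o is not None}
    free = iter(o for o in range(len(partial)) if o not in used)
    return [o if o is not None else next(free) for o in partial]


def route(perm: List[int]) -> List[Stage]:
    """Looping algorithm: switch settings so input x reaches output perm[x]."""
    n = len(perm)
    if n == 2:
        return [[0 if perm[0] == 0 else 1]]
    inv = [0] * n
    for x, o in enumerate(perm):
        inv[o] = x
    color = [-1] * n  # 0 = through upper subnetwork, 1 = through lower
    for start in range(0, n, 2):
        x = start
        while color[x] == -1:  # walk one cycle of the constraint graph
            color[x], color[x ^ 1] = 0, 1  # switch partners must split
            # x^1 goes lower, so the partner of its output must be fed from upper
            x = inv[perm[x ^ 1] ^ 1]
    first = [color[2 * i] for i in range(n // 2)]      # cross if even input goes lower
    last = [color[inv[2 * j]] for j in range(n // 2)]  # cross if output 2j fed from lower
    up, low = [0] * (n // 2), [0] * (n // 2)
    for x in range(n):
        # subnetwork input x // 2 must reach subnetwork output perm[x] // 2
        (up if color[x] == 0 else low)[x // 2] = perm[x] // 2
    middle = [u + l for u, l in zip(route(up), route(low))]
    return [first] + middle + [last]


def simulate(stages: List[Stage], inputs: List[int]) -> List[int]:
    """Push labels through the network; result[k] is the label at output k."""
    n = len(inputs)
    if n == 2:
        return inputs if stages[0][0] == 0 else inputs[::-1]
    upper, lower = [], []
    for i, s in enumerate(stages[0]):
        a, b = inputs[2 * i], inputs[2 * i + 1]
        if s:
            a, b = b, a
        upper.append(a)
        lower.append(b)
    q = n // 4  # switches per middle stage belonging to each subnetwork
    up_out = simulate([st[:q] for st in stages[1:-1]], upper)
    low_out = simulate([st[q:] for st in stages[1:-1]], lower)
    out = [0] * n
    for j, s in enumerate(stages[-1]):
        a, b = up_out[j], low_out[j]
        if s:
            a, b = b, a
        out[2 * j], out[2 * j + 1] = a, b
    return out


perm = complete_partial([2, None, 0, None])  # inputs 1 and 3 are idle
stages = route(perm)
out = simulate(stages, list(range(4)))
assert all(out[perm[x]] == x for x in range(4))
```

Each recursion level visits every input once, so this sequential sketch runs in $O(N \log N)$ time overall. It does not model the per-stage timing of parallel PEs that the abstract compares.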