On the Parallelization of Stream Compaction on a Low-Cost SDC Cluster

Many highly parallel algorithms generate large volumes of data containing both valid and invalid elements, and high-performance solutions to the stream compaction problem prove extremely important in such scenarios. Although parallel stream compaction has been extensively studied in GPU-bas...

Bibliographic Details
Main Authors:	Gregorio Bernabé, Manuel E. Acacio
Format:	Article
Language:	English
Published:	Hindawi Limited 2018-01-01
Series:	Scientific Programming
Online Access:	http://dx.doi.org/10.1155/2018/2037272

id	doaj-cde4ef28f2804428b2c7d49550fc1536
record_format	Article
spelling	doaj-cde4ef28f2804428b2c7d49550fc1536; 2021-07-02T02:57:17Z; eng; Hindawi Limited; Scientific Programming; 1058-9244; 1875-919X; 2018-01-01; 2018; 10.1155/2018/2037272; 2037272; On the Parallelization of Stream Compaction on a Low-Cost SDC Cluster; Gregorio Bernabé 0; Manuel E. Acacio 1; Computer Engineering Department, University of Murcia, Murcia, Spain; Computer Engineering Department, University of Murcia, Murcia, Spain; http://dx.doi.org/10.1155/2018/2037272
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Gregorio Bernabé, Manuel E. Acacio
author_sort	Gregorio Bernabé
title	On the Parallelization of Stream Compaction on a Low-Cost SDC Cluster
title_sort	on the parallelization of stream compaction on a low-cost sdc cluster
publisher	Hindawi Limited
series	Scientific Programming
issn	1058-9244 1875-919X
publishDate	2018-01-01
description	Many highly parallel algorithms generate large volumes of data containing both valid and invalid elements, and high-performance solutions to the stream compaction problem prove extremely important in such scenarios. Although parallel stream compaction has been extensively studied on GPU-based platforms and, more recently, on the Intel Xeon Phi platform, no study has yet considered its parallelization on a low-cost computing cluster, even though general-purpose single-board computing devices are gaining popularity in the scientific community due to their high performance per dollar and per watt. In this work, we consider the case of an extremely low-cost cluster composed of four Odroid C2 single-board computers (SDCs), showing that stream compaction can also obtain significant speedups on this kind of platform. To do so, we derive two parallel implementations of the stream compaction problem using MPI. We then evaluate them with varying numbers of processes and/or SDCs, as well as different input sizes. In general, we see that unless the number of elements in the stream is too small, the best results are obtained when eight MPI processes are distributed among the four SDCs that make up the cluster. To add value to these results, we also run the two parallel implementations on a very high-performance but power-hungry 18-core Intel Xeon E5-2695 v4 multicore processor, finding that the Odroid C2 SDC cluster constitutes a much more efficient alternative when both execution time and required energy are taken into account. Finally, we also implement and evaluate a parallel version of the stream split problem, which additionally stores the invalid elements after the valid ones. Our implementation shows good scalability on the Odroid C2 SDC cluster and a more balanced computation/communication ratio than the stream compaction problem.
url	http://dx.doi.org/10.1155/2018/2037272

Similar Items