Efficient reduction over threads
The increasing number of cores in both desktops and servers creates a demand for efficient parallel algorithms. This project focuses on the fundamental collective operation reduce, which merges several arrays into one by applying a binary operation element-wise. Several reduce algorithms are evalua...
Main Author: | Falkman, Patrik |
---|---|
Format: | Others |
Language: | English |
Published: | KTH, Teoretisk fysik, 2011 |
Subjects: | Computer and Information Sciences; Data- och informationsvetenskap |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49818 |
id |
ndltd-UPSALLA1-oai-DiVA.org-kth-49818 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-UPSALLA1-oai-DiVA.org-kth-49818 · 2018-01-13T05:15:38Z · Efficient reduction over threads · eng · Falkman, Patrik · KTH, Teoretisk fysik · 2011 · Computer and Information Sciences · Data- och informationsvetenskap · Student thesis · info:eu-repo/semantics/bachelorThesis · text · http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49818 · Trita-FYS, 0280-316X ; 57 · application/pdf · info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Computer and Information Sciences; Data- och informationsvetenskap |
description |
The increasing number of cores in both desktops and servers creates a demand for efficient parallel algorithms. This project focuses on the fundamental collective operation reduce, which merges several arrays into one by applying a binary operation element-wise. Several reduce algorithms are evaluated in terms of performance and scalability, and a novel algorithm is introduced that takes advantage of shared memory and exploits load imbalance. To that end, the concept of dynamic pair generation is introduced: a binary reduce tree is constructed dynamically, based on the order in which threads arrive, and pairs are formed in a lock-free manner. We conclude that the dynamic algorithm, given enough spread in the arrival times, can outperform the reference algorithms for some or all array sizes. |
author |
Falkman, Patrik |
title |
Efficient reduction over threads |
publisher |
KTH, Teoretisk fysik |
publishDate |
2011 |
url |
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49818 |
work_keys_str_mv |
AT falkmanpatrik efficientreductionoverthreads |
_version_ |
1718608424526151680 |
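
The description in this record outlines dynamic pair generation only in words. The following is a minimal sketch of the idea, not the thesis's implementation: the function name `dynamic_pair_reduce` and the shared `state` dictionary are hypothetical, and a `threading.Lock` stands in for the lock-free pairing the description mentions, since the record does not say how that pairing is realized. Each thread arrives with one array; the first of two arrivals leaves its array behind, the second claims it and combines the two element-wise, then arrives again with the partial result, so the reduce tree takes its shape from the order of arrival.

```python
import threading
from operator import add


def dynamic_pair_reduce(arrays, op=add):
    """Reduce equal-length lists element-wise, pairing threads in arrival order.

    `op` should be associative and commutative, because the pairing order
    depends on thread timing.
    """
    state = {"waiting": None, "remaining": len(arrays), "result": None}
    lock = threading.Lock()  # assumption: stand-in for lock-free pair formation

    def worker(data):
        while True:
            with lock:
                if state["remaining"] == 1:
                    # Only one partial result is left: it is the final answer.
                    state["result"] = data
                    return
                if state["waiting"] is None:
                    # First of a pair: leave the data for the next arrival.
                    state["waiting"] = data
                    return
                # Second of a pair: claim the waiting partner's data.
                partner, state["waiting"] = state["waiting"], None
                state["remaining"] -= 1
            # Combine outside the lock so other pairs can form meanwhile.
            data = [op(a, b) for a, b in zip(partner, data)]

    threads = [threading.Thread(target=worker, args=(a,)) for a in arrays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["result"]
```

With the default addition, `dynamic_pair_reduce([[1, 2], [10, 20], [100, 200]])` returns `[111, 222]` regardless of which threads happen to pair up. A thread that arrives early does not wait for a fixed partner, which is how a scheme of this kind can benefit from spread in the arrival times.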