Efficient reduction over threads

The increasing number of cores in both desktops and servers leads to a demand for efficient parallel algorithms. This project focuses on the fundamental collective operation reduce, which merges several arrays into one by applying a binary operation element-wise. Several reduce algorithms are evalua...
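
As an illustration of the reduce operation described above, the following is a minimal sequential sketch in Python. The function name `reduce_arrays` and the default operation (addition) are assumptions made for this example and do not come from the thesis; the arrays are assumed to have equal length.

```python
from functools import reduce
from operator import add


def reduce_arrays(arrays, op=add):
    """Merge equally long arrays into one by applying `op` element-wise.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    def combine(left, right):
        # Apply the binary operation position by position.
        return [op(a, b) for a, b in zip(left, right)]

    # Fold the arrays together from left to right.
    return reduce(combine, arrays)


merged = reduce_arrays([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
```

Here three arrays of length three are merged into a single array of length three. A parallel version would distribute the `combine` steps over threads, typically arranged as a binary tree.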

Bibliographic Details
Main Author: Falkman, Patrik
Format: Others
Language: English
Published: KTH, Teoretisk fysik 2011
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49818
id ndltd-UPSALLA1-oai-DiVA.org-kth-49818
record_format oai_dc
spelling ndltd-UPSALLA1-oai-DiVA.org-kth-49818 2018-01-13T05:15:38Z Efficient reduction over threads eng Falkman, Patrik KTH, Teoretisk fysik 2011 Computer and Information Sciences Data- och informationsvetenskap The increasing number of cores in both desktops and servers leads to a demand for efficient parallel algorithms. This project focuses on the fundamental collective operation reduce, which merges several arrays into one by applying a binary operation element wise. Several reduce algorithms are evaluated in terms of performance and scalability and a novel algorithm is introduced that takes advantage of shared memory and exploits load imbalance. To do so, the concept of dynamic pair generation is introduced which implies constructing a binary reduce tree dynamically based on the order of thread arrival, where pairs are formed in a lock-free manner. We conclude that the dynamic algorithm, given enough spread in the arriving times, can outperform the reference algorithms for some or all array sizes. Student thesis info:eu-repo/semantics/bachelorThesis text http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49818 Trita-FYS, 0280-316X ; 57 application/pdf info:eu-repo/semantics/openAccess
collection NDLTD
language English
format Others
sources NDLTD
topic Computer and Information Sciences
Data- och informationsvetenskap
spellingShingle Computer and Information Sciences
Data- och informationsvetenskap
Falkman, Patrik
Efficient reduction over threads
description The increasing number of cores in both desktops and servers leads to a demand for efficient parallel algorithms. This project focuses on the fundamental collective operation reduce, which merges several arrays into one by applying a binary operation element-wise. Several reduce algorithms are evaluated in terms of performance and scalability, and a novel algorithm is introduced that takes advantage of shared memory and exploits load imbalance. To do so, the concept of dynamic pair generation is introduced, which means constructing a binary reduce tree dynamically based on the order of thread arrival, with pairs formed in a lock-free manner. We conclude that the dynamic algorithm, given enough spread in the arrival times, can outperform the reference algorithms for some or all array sizes.
author Falkman, Patrik
author_facet Falkman, Patrik
author_sort Falkman, Patrik
title Efficient reduction over threads
title_short Efficient reduction over threads
title_full Efficient reduction over threads
title_fullStr Efficient reduction over threads
title_full_unstemmed Efficient reduction over threads
title_sort efficient reduction over threads
publisher KTH, Teoretisk fysik
publishDate 2011
url http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-49818
work_keys_str_mv AT falkmanpatrik efficientreductionoverthreads
_version_ 1718608424526151680
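
The description in this record mentions dynamic pair generation, where threads are paired in the order they arrive. The following Python sketch illustrates that general idea only and is not the author's implementation. The names `dynamic_reduce`, `worker`, and `state` are hypothetical, and a `threading.Lock` stands in for the lock-free pairing the thesis describes (Python has no portable compare-and-swap primitive). The operation is assumed to be associative and commutative, since the pairing order depends on thread timing.

```python
import threading
from operator import add


def dynamic_reduce(arrays, op=add):
    """Reduce arrays element-wise, pairing threads in order of arrival.

    Illustrative sketch: a lock replaces the lock-free pairing of the thesis.
    """
    n = len(arrays)
    lock = threading.Lock()
    # slot: array left by a thread waiting for a partner
    # merges: number of pairs claimed so far (n - 1 means the tree is complete)
    state = {"slot": None, "merges": 0, "result": None}

    def worker(arr):
        current = list(arr)
        while True:
            with lock:
                partner = state["slot"]
                if partner is None:
                    if state["merges"] == n - 1:
                        # Every pair has been claimed: this is the final array.
                        state["result"] = current
                    else:
                        # No partner yet: leave the array for a later arrival.
                        state["slot"] = current
                    return
                # A partner is waiting: claim it and count the pair.
                state["slot"] = None
                state["merges"] += 1
            # Combine outside the lock so that merges can overlap.
            current = [op(a, b) for a, b in zip(partner, current)]

    threads = [threading.Thread(target=worker, args=(a,)) for a in arrays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["result"]


total = dynamic_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
```

A thread that finds no partner deposits its array and leaves, so early arrivals never wait for late ones; a thread that finds a partner merges and then re-enters as a new arrival, which builds the reduce tree according to arrival order.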