Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit

Bibliographic Details
Main Author: Delorme, Michael Christopher
Other Authors: Abdelrahman, Tarek S.
Language: en_ca
Published: 2013
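Subjects:
Parallel sorting
Radix sort
Heterogeneous computing
GPU
GPGPU
AMD Fusion
Llano
APU
Accelerated Processing Unit
OpenCL
Fusion Sort
GPU computing
0984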
Online Access: http://hdl.handle.net/1807/35116

We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data in memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU’s integrated memory system to share data throughout the sort. Both implementations outperform the current state-of-the-art GPU radix sort from NVIDIA, demonstrating that the CPU can be used efficiently to speed up radix sort on the APU. Our fine-grained implementation slightly outperforms our coarse-grained implementation, which demonstrates the benefit of the APU’s integrated architecture. This benefit is, however, hindered by limitations in the APU’s architecture and programming model. We believe the performance benefit will increase once these limitations are addressed in future generations of the APU.
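The abstract contrasts two ways of splitting a radix sort between the CPU and GPU. The Python sketch below is a host-only model of that contrast, not the thesis's OpenCL implementation: two threads stand in for the GPU and CPU, and every name here (`radix_sort`, `coarse_grained_sort`, `fine_grained_sort`, `gpu_fraction`) is hypothetical. The 75/25 split, the 4-bit digit width and the restriction to unsigned 32-bit keys are illustrative assumptions, not values from the thesis. In the coarse-grained model the workers share data only twice (split at the start, merge at the end); in the fine-grained model both workers histogram and scatter their own chunk on every digit pass into one shared buffer, with a global prefix sum as the synchronization point.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Illustrative parameters (assumptions, not values from the thesis).
BITS_PER_PASS = 4            # digit width per radix pass
KEY_BITS = 32                # keys are assumed to be unsigned 32-bit integers
MASK = (1 << BITS_PER_PASS) - 1


def histogram(keys, lo, hi, shift):
    """Count how often each digit value occurs in keys[lo:hi]."""
    counts = [0] * (MASK + 1)
    for k in keys[lo:hi]:
        counts[(k >> shift) & MASK] += 1
    return counts


def scatter(src, dst, lo, hi, offsets, shift):
    """Stably copy src[lo:hi] into dst, advancing each digit's offset."""
    for k in src[lo:hi]:
        d = (k >> shift) & MASK
        dst[offsets[d]] = k
        offsets[d] += 1


def radix_sort(keys, bounds, pool=None):
    """LSD radix sort in which each (lo, hi) chunk in `bounds` is histogrammed
    and scattered by its own worker; the global scan is the sync point."""
    src = list(keys)
    run = pool.map if pool is not None else map
    for shift in range(0, KEY_BITS, BITS_PER_PASS):
        hists = list(run(lambda b: histogram(src, b[0], b[1], shift), bounds))
        # Exclusive scan, digit-major then chunk-minor, so the pass is stable.
        offsets = [[0] * (MASK + 1) for _ in bounds]
        total = 0
        for d in range(MASK + 1):
            for c in range(len(bounds)):
                offsets[c][d] = total
                total += hists[c][d]
        dst = [0] * len(src)  # shared output buffer for this pass
        list(run(lambda c: scatter(src, dst, *bounds[c], offsets[c], shift),
                 range(len(bounds))))
        src = dst
    return src


def coarse_grained_sort(keys, gpu_fraction=0.75):
    """Share data only twice: split at the start, merge at the end."""
    split = int(len(keys) * gpu_fraction)
    parts = [keys[:split], keys[split:]]  # stand-ins for the GPU and CPU shares
    with ThreadPoolExecutor(max_workers=2) as pool:
        done = list(pool.map(lambda p: radix_sort(p, [(0, len(p))]), parts))
    return list(heapq.merge(*done))


def fine_grained_sort(keys, gpu_fraction=0.75):
    """Share data on every pass: both workers fill one buffer per digit."""
    split = int(len(keys) * gpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        return radix_sort(keys, [(0, split), (split, len(keys))], pool)


if __name__ == "__main__":
    data = [170, 45, 75, 90, 802, 24, 2, 66]
    print(coarse_grained_sort(data))  # [2, 24, 45, 66, 75, 90, 170, 802]
    print(fine_grained_sort(data))    # [2, 24, 45, 66, 75, 90, 170, 802]
```

In this model the two variants differ only in where the workers synchronize: once at the end (a merge) versus once per digit pass (a scan over both workers' histograms). The fine-grained form assumes both workers can read and write the same buffers, which is the role the abstract assigns to the APU's integrated memory system. Python threads will not reproduce the speedups reported in the thesis; the sketch only shows the data flow.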