Efficient graph algorithm execution on data-parallel architectures

Mechanisms for improving the execution efficiency of graph algorithms on data-parallel architectures were identified and proposed. Execution of graph algorithms on GPGPU architectures, the prevalent data-parallel architectures, was considered. Irregular and data-dependent accesses in graph algorithms were found to cause significant idle cycles in GPGPU cores. A prefetching mechanism that reduces the number of idle cycles by prefetching a data-dependent access pattern found in graph algorithms was proposed; the mechanism was shown to be more effective when prefetched data was stored in unused spare registers in addition to the cache. The design of the cache hierarchy for graph algorithms was also explored. First, an exclusive cache hierarchy was shown to be beneficial at the cost of increased traffic; a region-based exclusive cache hierarchy was shown to perform similarly to an exclusive cache hierarchy while reducing on-chip traffic. Second, bypassing cache blocks at both the level-one and level-two caches was shown to be beneficial. Third, the use of fine-grained memory accesses (or cache sub-blocking) was shown to be beneficial. The combination of cache bypassing and fine-grained memory accesses was shown to be more beneficial than applying either mechanism individually. Finally, the impact of different implementation strategies on the performance of the breadth-first search algorithm was evaluated using different input graphs, and heuristics for identifying the best-performing implementation for a given input graph were discussed.

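As a point of reference, the sketch below (illustrative only; the kernel name, parameters, and CSR layout are assumptions, not code from the dissertation) shows one level-synchronous breadth-first search step in CUDA. The loads of the row offsets, adjacency list, and per-vertex levels are the kind of irregular, data-dependent accesses the abstract identifies as the source of idle cycles on GPGPU cores.

    // Illustrative sketch, not from the dissertation: one level-synchronous BFS step
    // over a graph in CSR form. Each thread expands one vertex of the current frontier.
    __global__ void bfs_step(const int *row_offsets,      // CSR row pointers, length num_nodes + 1
                             const int *col_indices,      // CSR adjacency lists
                             int *levels,                 // per-vertex BFS level, -1 if unvisited
                             int num_nodes,
                             int current_level,
                             int *frontier_not_empty)     // set to 1 if any vertex is newly visited
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= num_nodes || levels[v] != current_level)
            return;

        // The neighbor range depends on a value that was just loaded: a data-dependent access.
        int start = row_offsets[v];
        int end   = row_offsets[v + 1];

        for (int e = start; e < end; ++e) {
            int u = col_indices[e];          // irregular, graph-dependent address
            if (levels[u] == -1) {           // another data-dependent load
                levels[u] = current_level + 1;   // benign race: all writers store the same level
                *frontier_not_empty = 1;
            }
        }
    }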

Bibliographic Details
Main Author: Bangalore Lakshminarayana, Nagesh
Other Authors: Kim, Hyesoon
Format: Others
Language: en_US
Published: Georgia Institute of Technology 2015
Subjects: Graph algorithms; Data-parallel architectures; GPGPU architectures; Prefetching; Cache hierarchy; Inclusion property; Cache bypass; Fine-grained accesses; BFS characterization
Online Access:http://hdl.handle.net/1853/53058