Accelerating BFS via Data Structure-Aware Prefetching on GPU

Breadth First Search (BFS) is a key graph traversal algorithm for many graph analytics applications. As graph analytics problems have grown in scale over recent decades, interest in accelerating graph traversal on GPUs has grown with them. However, due to the irregular memory...

Bibliographic Details
Main Authors:	Hui Guo, Libo Huang, Yashuai Lu, Jianqiao Ma, Cheng Qian, Sheng Ma, Zhiying Wang
Format:	Article
Language:	English
Published:	IEEE 2018-01-01
Series:	IEEE Access
Subjects:	Accelerator architectures; breadth first search; data structure aware; GPGPU computing; prefetching mechanism; irregular memory access
Online Access:	https://ieeexplore.ieee.org/document/8493153/

id	doaj-72747cd207e14dc8a25a142ed0ebf335
record_format	Article
spelling	doaj-72747cd207e14dc8a25a142ed0ebf335; 2021-03-29T21:32:58Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2018-01-01; vol. 6, pp. 60234-60248; DOI 10.1109/ACCESS.2018.2876201; IEEE Xplore document 8493153; Accelerating BFS via Data Structure-Aware Prefetching on GPU; Authors: Hui Guo (https://orcid.org/0000-0001-5131-0437), Libo Huang, Yashuai Lu, Jianqiao Ma, Cheng Qian, Sheng Ma (https://orcid.org/0000-0003-1710-4060), Zhiying Wang; Affiliations, in record order: National University of Defense Technology, Changsha, China; National University of Defense Technology, Changsha, China; Space Engineering University, Beijing, China; National University of Defense Technology, Changsha, China; National University of Defense Technology, Changsha, China; National University of Defense Technology, Changsha, China; National University of Defense Technology, Changsha, China
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Hui Guo, Libo Huang, Yashuai Lu, Jianqiao Ma, Cheng Qian, Sheng Ma, Zhiying Wang
author_sort	Hui Guo
title	Accelerating BFS via Data Structure-Aware Prefetching on GPU
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2018-01-01
description	Breadth First Search (BFS) is a key graph traversal algorithm for many graph analytics applications. As graph analytics problems have grown in scale over recent decades, interest in accelerating graph traversal on GPUs has grown with them. However, due to the irregular memory access pattern of BFS, a large number of divergent memory accesses severely harm GPU efficiency. Data prefetching can bring useful data into on-chip memory in advance, reducing the latency of off-chip memory accesses, but traditional GPU prefetching techniques cannot handle irregular memory accesses efficiently. By analyzing BFS algorithms for GPUs, we find an opportunity to design an efficient prefetching mechanism that uses explicit information about the graph data structure. In this paper, we propose DSAP, a data structure-aware prefetcher for GPUs that generates prefetch requests based on the well-defined data structure access pattern of BFS. We also introduce adaptive fine-grained prefetching management, which dynamically adjusts the prefetching granularity according to the utilization of the prefetched data, balancing cache resource contention against the benefit of prefetching. We implement DSAP in the GPGPU-Sim simulator and evaluate it on six data sets from three different kinds of applications. DSAP achieves a geometric mean IPC improvement of 28%, and up to 48.4%, over a GPU with no prefetching, whereas a stride-based global history buffer prefetcher has no effect on BFS performance for these data sets. We also use GPUWattch to estimate power consumption: power increases by 8.3% on average and by up to 11.8%, but total energy consumption drops by 15.1% on average.
topic	Accelerator architectures; breadth first search; data structure aware; GPGPU computing; prefetching mechanism; irregular memory access
url	https://ieeexplore.ieee.org/document/8493153/
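
Illustrative note on the description field above: the record does not say how the paper stores its graphs or how DSAP is built. As a rough sketch only, assuming the common compressed sparse row (CSR) layout, the chain of dependent loads that makes BFS memory accesses irregular might look like the following Python. The names bfs_csr, row_offsets and col_indices and the 5-vertex graph are hypothetical and are not taken from the paper.

```python
from typing import List


def bfs_csr(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous BFS over a CSR graph (assumed layout).

    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    num_vertices = len(row_offsets) - 1
    levels = [-1] * num_vertices
    levels[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        next_frontier = []
        for v in frontier:                # sequential read of the frontier
            start = row_offsets[v]        # indirect read: address depends on v
            end = row_offsets[v + 1]
            for e in range(start, end):   # contiguous within one edge list
                n = col_indices[e]
                if levels[n] == -1:       # indirect read: address depends on n
                    levels[n] = depth + 1
                    next_frontier.append(n)
        frontier = next_frontier
        depth += 1
    return levels


# Hypothetical graph: edges 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
row_offsets = [0, 2, 3, 4, 4, 4]
col_indices = [1, 2, 3, 3]
print(bfs_csr(row_offsets, col_indices, 0))
```

In this sketch each address is computed from a value loaded one step earlier (frontier entry, then row offset, then neighbor id), so the accesses follow no fixed stride even though the structure of the chain is known in advance. That is one plausible reading of the "well-defined data structure access pattern" the abstract refers to.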

Similar Items