Accelerating BFS via Data Structure-Aware Prefetching on GPU

Bibliographic Details
Main Authors: Hui Guo, Libo Huang, Yashuai Lu, Jianqiao Ma, Cheng Qian, Sheng Ma, Zhiying Wang
Affiliations: National University of Defense Technology, Changsha, China (Hui Guo, Libo Huang, Jianqiao Ma, Cheng Qian, Sheng Ma, Zhiying Wang); Space Engineering University, Beijing, China (Yashuai Lu)
ORCID iDs: Hui Guo https://orcid.org/0000-0001-5131-0437; Sheng Ma https://orcid.org/0000-0003-1710-4060
Format: Article
Language: English
Published: IEEE, 2018-01-01
Series: IEEE Access, vol. 6, pp. 60234-60248
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2018.2876201
Subjects: Accelerator architectures; breadth first search; data structure aware; GPGPU computing; prefetching mechanism; irregular memory access
Online Access: https://ieeexplore.ieee.org/document/8493153/

Abstract
Breadth-first search (BFS) is a key graph traversal algorithm for many graph analytics applications. In recent decades, as graph analytics problems have grown ever larger, accelerating graph traversal on GPUs has attracted considerable interest. However, the irregular memory access pattern of BFS produces a large number of divergent memory accesses, which dramatically degrade GPU efficiency. Data prefetching can bring useful data into on-chip memory in advance and so reduce the latency of off-chip memory accesses, but traditional GPU prefetching techniques cannot handle irregular memory accesses efficiently. By analyzing BFS algorithms for GPUs, we identify an opportunity to design an efficient prefetching mechanism that uses explicit information about the graph data structure. In this paper, we propose DSAP, a data structure-aware prefetcher for GPUs that generates prefetch requests based on the well-defined data structure access pattern of BFS. We also introduce adaptive fine-grain prefetching management, which dynamically adjusts the prefetching granularity according to the utilization of the prefetched data, balancing cache resource contention against data prefetching. We implement DSAP in the GPGPU-Sim simulator and evaluate it on six data sets from three different kinds of applications. Compared with a GPU without prefetching, DSAP achieves a geometric mean IPC improvement of 28% (up to 48.4%), whereas a stride-based global history buffer prefetching mechanism does not improve BFS performance on these data sets. Power consumption, estimated with GPUWattch, increases by 8.3% on average (up to 11.8%), but total energy consumption drops by 15.1% on average.
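
The abstract does not spell out how DSAP derives its requests, so what follows is only a hedged illustration. It is a hypothetical Python software model, not the authors' hardware design, and it rests on assumptions of ours: the graph is stored in CSR form (the array names row_offsets, col_indices, and labels are chosen here), a tiny fully associative LRU cache stands in for on-chip memory, prefetches run a fixed lookahead of two frontier entries ahead, and the controller's doubling/halving policy with 0.25/0.75 utilization thresholds is invented. It illustrates the two ideas named in the abstract. Once a frontier vertex is known, the row offsets, neighbor IDs, and neighbor labels it will touch are predictable from the data structure. In addition, the number of edges prefetched per vertex can be adapted according to how many prefetched entries are actually used.

```python
from collections import OrderedDict

# Hypothetical software model of a data structure-aware prefetcher for
# CSR-based BFS; names, cache model, and thresholds are assumptions.

def prefetch_targets(v, row_offsets, col_indices, max_edges):
    """(array, index) entries whose addresses follow from frontier vertex v."""
    start = row_offsets[v]
    end = min(row_offsets[v + 1], start + max_edges)  # granularity limit
    targets = [("row_offsets", v), ("row_offsets", v + 1)]
    targets += [("col_indices", e) for e in range(start, end)]
    # Indirect step: label addresses depend on the neighbor IDs just fetched.
    targets += [("labels", col_indices[e]) for e in range(start, end)]
    return targets


class GranularityController:
    """Doubles the per-vertex edge budget when most prefetches were used,
    halves it when most were wasted (illustrative thresholds)."""

    def __init__(self, edges=4, min_edges=1, max_edges=64, low=0.25, high=0.75):
        self.edges, self.min_edges, self.max_edges = edges, min_edges, max_edges
        self.low, self.high = low, high

    def update(self, issued, used):
        if issued == 0:
            return self.edges  # no feedback in this interval
        utilization = used / issued
        if utilization >= self.high:
            self.edges = min(self.edges * 2, self.max_edges)
        elif utilization < self.low:
            self.edges = max(self.edges // 2, self.min_edges)
        return self.edges


class LRUCache:
    """Tiny fully associative LRU cache keyed by (array, index)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # key -> True while prefetched but unused
        self.prefetch_issued = self.prefetch_hits = 0

    def _insert(self, key, prefetched):
        self.lines[key] = prefetched
        self.lines.move_to_end(key)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used entry

    def prefetch(self, key):
        if key not in self.lines:  # drop prefetches for entries already cached
            self.prefetch_issued += 1
            self._insert(key, True)

    def access(self, key):
        hit = key in self.lines
        if hit and self.lines[key]:
            self.prefetch_hits += 1  # first demand use of a prefetched entry
        self._insert(key, False)
        return hit


def bfs_prefetch_sketch(n, row_offsets, col_indices, source, cache_lines=64, lookahead=2):
    """Level-synchronous BFS that prefetches `lookahead` frontier entries ahead."""
    labels = [-1] * n
    labels[source] = 0
    frontier, level = [source], 0
    cache, ctrl = LRUCache(cache_lines), GranularityController()

    def prefetch_vertex(w):
        for key in prefetch_targets(w, row_offsets, col_indices, ctrl.edges):
            cache.prefetch(key)

    while frontier:
        issued0, hits0 = cache.prefetch_issued, cache.prefetch_hits
        for w in frontier[:lookahead]:  # warm-up for the first frontier entries
            prefetch_vertex(w)
        next_frontier = []
        for i, v in enumerate(frontier):
            if i + lookahead < len(frontier):
                prefetch_vertex(frontier[i + lookahead])
            cache.access(("row_offsets", v))
            cache.access(("row_offsets", v + 1))
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                cache.access(("col_indices", e))
                cache.access(("labels", u))
                if labels[u] == -1:
                    labels[u] = level + 1
                    next_frontier.append(u)
        # Per-level feedback: prefetch hits vs. prefetches issued during this level.
        ctrl.update(cache.prefetch_issued - issued0, cache.prefetch_hits - hits0)
        frontier, level = next_frontier, level + 1
    return labels, cache
```

For example, with row_offsets = [0, 2, 3, 4, 4] and col_indices = [1, 2, 3, 3] (edges 0→1, 0→2, 1→3, 2→3), bfs_prefetch_sketch(4, row_offsets, col_indices, 0) returns labels [0, 1, 1, 2], and with the default 64-entry cache all 12 issued prefetches are used. Passing a small cache_lines value can cause prefetched entries to be evicted before use. That is the kind of cache contention the adaptive granularity is meant to balance.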