Accelerating BFS via Data Structure-Aware Prefetching on GPU
Breadth-first search (BFS) is a key graph traversal algorithm in many graph analytics applications. As graph analytics problems have grown in scale over recent decades, there has been considerable interest in accelerating graph traversal on GPUs. However, due to the irregular memory...
Main Authors: | Hui Guo, Libo Huang, Yashuai Lu, Jianqiao Ma, Cheng Qian, Sheng Ma, Zhiying Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2018-01-01 |
Series: | IEEE Access |
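Subjects: | Accelerator architectures, breadth first search, data structure aware, GPGPU computing, prefetching mechanism, irregular memory access |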
Online Access: | https://ieeexplore.ieee.org/document/8493153/ |
id |
doaj-72747cd207e14dc8a25a142ed0ebf335 |
---|---|
record_format |
Article |
spelling |
doaj-72747cd207e14dc8a25a142ed0ebf335; 2021-03-29T21:32:58Z; eng; IEEE; IEEE Access; 2169-3536; 2018-01-01; 6; 60234; 60248; 10.1109/ACCESS.2018.2876201; 8493153; Accelerating BFS via Data Structure-Aware Prefetching on GPU; Hui Guo (0, https://orcid.org/0000-0001-5131-0437, National University of Defense Technology, Changsha, China); Libo Huang (1, National University of Defense Technology, Changsha, China); Yashuai Lu (2, Space Engineering University, Beijing, China); Jianqiao Ma (3, National University of Defense Technology, Changsha, China); Cheng Qian (4, National University of Defense Technology, Changsha, China); Sheng Ma (5, https://orcid.org/0000-0003-1710-4060, National University of Defense Technology, Changsha, China); Zhiying Wang (6, National University of Defense Technology, Changsha, China); Breadth-first search (BFS) is a key graph traversal algorithm in many graph analytics applications. As graph analytics problems have grown in scale over recent decades, there has been considerable interest in accelerating graph traversal on GPUs. However, the irregular memory access pattern of BFS produces a large number of divergent memory accesses, which sharply reduce GPU efficiency. Data prefetching can bring useful data into on-chip memory in advance and so reduce the latency of off-chip memory accesses, but traditional GPU prefetching techniques cannot handle irregular memory accesses efficiently. By analyzing BFS algorithms for GPUs, we find an opportunity to design an efficient prefetching mechanism that uses the explicit information in the graph data structure. In this paper, we propose DSAP, a data structure-aware prefetcher for GPUs that generates prefetch requests based on the well-defined data structure access pattern of BFS. We also introduce adaptive fine-grained prefetch management, which dynamically adjusts the prefetching granularity based on the utilization of prefetched data in order to balance cache resource contention against the benefit of prefetching. We implement DSAP in the GPGPU-Sim simulator and evaluate six data sets from three different kinds of applications. DSAP achieves a geometric mean IPC improvement of 28%, and up to 48.4%, over a GPU with no prefetching, whereas a stride-based global history buffer prefetcher has no effect on BFS performance for these data sets. We also use GPUWattch to estimate power consumption: power increases by 8.3% on average and by up to 11.8%, but total energy drops by 15.1% on average.; https://ieeexplore.ieee.org/document/8493153/; Accelerator architectures; breadth first search; data structure aware; GPGPU computing; prefetching mechanism; irregular memory access |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Hui Guo Libo Huang Yashuai Lu Jianqiao Ma Cheng Qian Sheng Ma Zhiying Wang |
spellingShingle |
Hui Guo Libo Huang Yashuai Lu Jianqiao Ma Cheng Qian Sheng Ma Zhiying Wang Accelerating BFS via Data Structure-Aware Prefetching on GPU IEEE Access Accelerator architectures breadth first search data structure aware GPGPU computing prefetching mechanism irregular memory access |
author_facet |
Hui Guo Libo Huang Yashuai Lu Jianqiao Ma Cheng Qian Sheng Ma Zhiying Wang |
author_sort |
Hui Guo |
title |
Accelerating BFS via Data Structure-Aware Prefetching on GPU |
title_short |
Accelerating BFS via Data Structure-Aware Prefetching on GPU |
title_full |
Accelerating BFS via Data Structure-Aware Prefetching on GPU |
title_fullStr |
Accelerating BFS via Data Structure-Aware Prefetching on GPU |
title_full_unstemmed |
Accelerating BFS via Data Structure-Aware Prefetching on GPU |
title_sort |
accelerating bfs via data structure-aware prefetching on gpu |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2018-01-01 |
description |
Breadth-first search (BFS) is a key graph traversal algorithm in many graph analytics applications. As graph analytics problems have grown in scale over recent decades, there has been considerable interest in accelerating graph traversal on GPUs. However, the irregular memory access pattern of BFS produces a large number of divergent memory accesses, which sharply reduce GPU efficiency. Data prefetching can bring useful data into on-chip memory in advance and so reduce the latency of off-chip memory accesses, but traditional GPU prefetching techniques cannot handle irregular memory accesses efficiently. By analyzing BFS algorithms for GPUs, we find an opportunity to design an efficient prefetching mechanism that uses the explicit information in the graph data structure. In this paper, we propose DSAP, a data structure-aware prefetcher for GPUs that generates prefetch requests based on the well-defined data structure access pattern of BFS. We also introduce adaptive fine-grained prefetch management, which dynamically adjusts the prefetching granularity based on the utilization of prefetched data in order to balance cache resource contention against the benefit of prefetching. We implement DSAP in the GPGPU-Sim simulator and evaluate six data sets from three different kinds of applications. DSAP achieves a geometric mean IPC improvement of 28%, and up to 48.4%, over a GPU with no prefetching, whereas a stride-based global history buffer prefetcher has no effect on BFS performance for these data sets. We also use GPUWattch to estimate power consumption: power increases by 8.3% on average and by up to 11.8%, but total energy drops by 15.1% on average. |
topic |
Accelerator architectures breadth first search data structure aware GPGPU computing prefetching mechanism irregular memory access |
url |
https://ieeexplore.ieee.org/document/8493153/ |
work_keys_str_mv |
AT huiguo acceleratingbfsviadatastructureawareprefetchingongpu AT libohuang acceleratingbfsviadatastructureawareprefetchingongpu AT yashuailu acceleratingbfsviadatastructureawareprefetchingongpu AT jianqiaoma acceleratingbfsviadatastructureawareprefetchingongpu AT chengqian acceleratingbfsviadatastructureawareprefetchingongpu AT shengma acceleratingbfsviadatastructureawareprefetchingongpu AT zhiyingwang acceleratingbfsviadatastructureawareprefetchingongpu |
_version_ |
1724192767366660096 |
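
The description above attributes the difficulty of GPU BFS to irregular memory accesses and says that DSAP exploits the well-defined access pattern of the graph data structure. As a rough illustration only, the following Python sketch runs a level-synchronous BFS over a graph stored in compressed sparse row (CSR) form. The CSR layout, the function name `bfs_csr`, and the array names are assumptions made for this example: the record does not state which graph representation or BFS variant the paper uses, and the sketch is sequential CPU code, not the authors' GPU kernel or prefetcher.

```python
from typing import List


def bfs_csr(row_offsets: List[int], col_indices: List[int], source: int) -> List[int]:
    """Level-synchronous BFS over a CSR graph (assumed layout, for illustration).

    Returns the BFS level of every vertex, or -1 for unreachable vertices.
    """
    num_vertices = len(row_offsets) - 1
    levels = [-1] * num_vertices
    levels[source] = 0
    frontier = [source]
    depth = 0

    while frontier:
        next_frontier = []
        for v in frontier:
            # Step 1: the frontier entry v selects where to read in row_offsets.
            start, end = row_offsets[v], row_offsets[v + 1]
            # Step 2: row_offsets selects a contiguous slice of col_indices.
            for e in range(start, end):
                neighbor = col_indices[e]
                # Step 3: each neighbor id selects a scattered entry of levels.
                # This data-dependent lookup is the irregular access.
                if levels[neighbor] == -1:
                    levels[neighbor] = depth + 1
                    next_frontier.append(neighbor)
        frontier = next_frontier
        depth += 1

    return levels


# Hypothetical 6-vertex graph: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
row_offsets = [0, 2, 3, 4, 5, 5, 5]
col_indices = [1, 2, 3, 3, 4]
print(bfs_csr(row_offsets, col_indices, 0))  # prints [0, 1, 1, 2, 3, -1]
```

Each address in this chain is computed from the value loaded in the previous step, so a stride-based prefetcher has no fixed stride to lock onto, while the chain itself (frontier, then offsets, then neighbor list, then per-vertex state) is fixed by the data structure. That fixed chain is the kind of structural information a data structure-aware prefetcher could, in principle, follow ahead of the computation.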