A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computing. Compressed sparse row (CSR) is the most frequently used format for storing sparse matrices. However, CSR-based SpMV kernels on graphics processing units (GPUs), such as CSR-scalar and CSR-vector, usually perform poorly because of irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV for the GPU, called PCSR. PCSR uses two kernels and, by introducing a middle array, accesses the CSR arrays in a fully coalesced manner, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR outperforms CSR-scalar, CSR-vector, and the CSRMV and HYBMV routines in the vendor-tuned CUSPARSE library, and is comparable with CSR-Adaptive, a recently proposed CSR-based algorithm. Furthermore, we extend PCSR from a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not inter-GPU communication is counted, PCSR on multiple GPUs achieves good performance and high parallel efficiency.
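
To make the CSR layout and the coalescing problem concrete, the listing below is a minimal CUDA sketch of the classic CSR-scalar baseline (one thread per row) that the abstract refers to. It is not the authors' PCSR algorithm; the small example matrix, kernel name, and launch configuration are illustrative assumptions.

// Minimal CSR-scalar SpMV sketch (one thread per row), shown only to
// illustrate the CSR layout and the baseline that PCSR improves upon.
// The example matrix and launch parameters are illustrative assumptions,
// not taken from the paper.
#include <cstdio>
#include <cuda_runtime.h>

// y = A * x, with A stored in CSR: row_ptr (n+1), col_idx (nnz), vals (nnz).
__global__ void csr_scalar_spmv(int n, const int *row_ptr, const int *col_idx,
                                const double *vals, const double *x, double *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        double sum = 0.0;
        // Each thread walks its own row segment of col_idx/vals; neighboring
        // threads therefore read addresses far apart, so loads are rarely
        // coalesced -- the deficiency the abstract attributes to CSR-scalar.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

int main() {
    // 3x3 example: [10 0 2; 0 3 0; 1 0 4]
    const int n = 3, nnz = 5;
    int    h_row_ptr[] = {0, 2, 3, 5};
    int    h_col_idx[] = {0, 2, 1, 0, 2};
    double h_vals[]    = {10, 2, 3, 1, 4};
    double h_x[]       = {1, 1, 1}, h_y[3];

    int *d_row_ptr, *d_col_idx;
    double *d_vals, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, (n + 1) * sizeof(int));
    cudaMalloc(&d_col_idx, nnz * sizeof(int));
    cudaMalloc(&d_vals, nnz * sizeof(double));
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_y, n * sizeof(double));
    cudaMemcpy(d_row_ptr, h_row_ptr, (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals, nnz * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, n * sizeof(double), cudaMemcpyHostToDevice);

    csr_scalar_spmv<<<1, 128>>>(n, d_row_ptr, d_col_idx, d_vals, d_x, d_y);
    cudaMemcpy(h_y, d_y, n * sizeof(double), cudaMemcpyDeviceToHost);
    printf("y = [%g %g %g]\n", h_y[0], h_y[1], h_y[2]);  // expected: [12 3 5]

    cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_vals);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}

Because each thread traverses its own contiguous slice of col_idx and vals, this baseline yields the "rare coalescing" behavior noted above; the abstract states that PCSR avoids this by adding a middle array and splitting the work across two kernels.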

Bibliographic Details
Main Authors: Guixia He (Zhijiang College, Zhejiang University of Technology, Hangzhou 310024, China); Jiaquan Gao (College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China)
Format: Article
Language: English
Published: Hindawi Limited, 2016-01-01
Series: Mathematical Problems in Engineering
ISSN: 1024-123X, 1563-5147
Online Access: http://dx.doi.org/10.1155/2016/8471283