Summary: | Master's === Chang Gung University === Department of Computer Science and Information Engineering === 101 === Sparse matrices are used in many important scientific codes, such as molecular dynamics, finite-element methods, and climate modeling. Many studies have proposed techniques to improve the performance of sparse matrix operations on the GPU. However, there is no efficient method for compressing sparse matrices on the GPU. Hence, in this paper, we design a method that compresses sparse matrices efficiently on the GPU using three data distribution strategies: SFCg, CFSg, and EDg. Different data distribution strategies may lead to different performance, so selecting a distribution scheme is an important issue. With these strategies, we found that compressing a sparse matrix on the device encounters prefix sum (PS) problems under the SIMT architecture. We therefore propose two further types of prefix sum, horizontal prefix sum (HPS) and vertical prefix sum (VPS), to solve the compression problem on the GPU. Both theoretical analysis and experimental tests were conducted. In the theoretical analysis, we analyze the time costs of PS, HPS, and VPS, and we analyze the SFCg, CFSg, and EDg strategies in terms of CPU computing time, data transfer time, and GPU computing time. In the experimental tests, we implemented the three strategies on a Tesla C2050. The experimental results show that the SFCg strategy outperforms the CFSg and EDg strategies and achieves a speedup of about 14x over the compression method on the CPU.
|
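To illustrate why a prefix sum appears when a sparse matrix is compressed in parallel, the following is a minimal sequential sketch in Python. It is not the thesis's SFCg, CFSg, or EDg strategy, nor its HPS or VPS variants, which the abstract does not spell out. It assumes the common compressed row storage (CRS) layout, and the function name `compress_crs` is hypothetical. Each row's nonzero count can be computed independently, and a prefix sum over those counts tells every row where to write its output.

```python
from itertools import accumulate


def compress_crs(dense):
    """Compress a dense row-major matrix into CRS arrays.

    Hypothetical helper for illustration only; returns
    (values, col_idx, row_ptr).
    """
    # Step 1: count the nonzeros in each row. Rows are independent,
    # so on a GPU each row could be handled by its own thread.
    counts = [sum(1 for x in row if x != 0) for row in dense]

    # Step 2: a prefix sum over the counts gives each row's write
    # offset into the output arrays (row_ptr has one extra entry
    # holding the total number of nonzeros).
    row_ptr = [0] + list(accumulate(counts))

    # Step 3: every row writes its nonzeros starting at its own
    # offset, so no two rows touch the same output slot.
    nnz = row_ptr[-1]
    values = [0] * nnz
    col_idx = [0] * nnz
    for r, row in enumerate(dense):
        pos = row_ptr[r]
        for c, x in enumerate(row):
            if x != 0:
                values[pos] = x
                col_idx[pos] = c
                pos += 1
    return values, col_idx, row_ptr


if __name__ == "__main__":
    matrix = [
        [5, 0, 0],
        [0, 0, 0],
        [0, 3, 7],
    ]
    print(compress_crs(matrix))
```

Step 2 is the only part that couples the rows together, which is why the cost of the prefix sum matters once steps 1 and 3 run in parallel on the device.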