CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System

Master's === National Taiwan University === Graduate Institute of Mathematics === 100 === Solving large-scale sparse linear systems is at the heart of various scientific and engineering computations. Among various direct methods, we focus on the multifrontal method in particular. A multifrontal method uses an elimination tree (column elimination tree fo...

Bibliographic Details
Main Authors: Chenhan D. Yu, 余承翰
Other Authors: Weichung Wang
Format: Others
Language: en_US
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/01116809690497788866
id ndltd-TW-100NTU05479012
record_format oai_dc
spelling ndltd-TW-100NTU05479012 2015-10-13T21:50:16Z http://ndltd.ncl.edu.tw/handle/01116809690497788866 CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System 使用多子矩陣法結合中央處理器和圖形處理器解決大型稀疏線性系統 Chenhan D. Yu 余承翰 Master's National Taiwan University Graduate Institute of Mathematics 100 Weichung Wang 王偉仲 2012 thesis 105 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Mathematics === 100 === Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among direct methods, we focus on the multifrontal method. A multifrontal method uses an elimination tree (a column elimination tree for unsymmetric problems) to transform a large sparse linear system into many smaller dense frontal operations, which are well suited to hybrid CPU-GPU systems. We review the multifrontal method and analyze it from both algorithmic and implementation perspectives to see how one or more GPUs can be used to accelerate the computations. We study symmetric positive definite (SPD), symmetric indefinite, and unsymmetric problems. For SPD problems, we carry out an implementation of the multifrontal method that comes close to the peak performance of a dense BLAS3 routine on the GPU, MAGMA’s Cholesky; because symmetric indefinite problems share the same symmetry, their implementation and performance are similar. Unsymmetric problems, however, are harder to implement because runtime column pivoting splits 2 BLAS3 routines into several BLAS2 routines, and extra communication is inevitable. Because communication between the CPU and GPU can easily slow down performance, we provide several strategies, for all kinds of multifrontal problems, that reduce communication and accelerate the process. Furthermore, we analyze the total execution time of the SPD problem and provide a nearly optimal workload distribution for hybrid CPU-GPU cooperation. For all kinds of problems, we provide scalable algorithms that can use more GPUs, up to 4. We do not analyze cluster-network communication, whose speed is on a different scale from that of PCI-E, and instead focus on adapting the optimization strategies to a single box server. The extension and analysis of the optimization model for the new parallel scheme is much the same as for the single-GPU model: new factorization and communication costs for multiple GPUs are added to the model, and the performance bound and properties still hold.
author2 Weichung Wang
author_facet Weichung Wang
Chenhan D. Yu
余承翰
author Chenhan D. Yu
余承翰
spellingShingle Chenhan D. Yu
余承翰
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
author_sort Chenhan D. Yu
title CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
title_sort cpu-gpu hybrid approaches in multifrontal methods for large and sparse linear system
publishDate 2012
url http://ndltd.ncl.edu.tw/handle/01116809690497788866
work_keys_str_mv AT chenhandyu cpugpuhybridapproachesinmultifrontalmethodsforlargeandsparselinearsystem
AT yúchénghàn cpugpuhybridapproachesinmultifrontalmethodsforlargeandsparselinearsystem
AT chenhandyu shǐyòngduōzijǔzhènfǎjiéhézhōngyāngchùlǐqìhétúxíngchùlǐqìjiějuédàxíngxīshūxiànxìngxìtǒng
AT yúchénghàn shǐyòngduōzijǔzhènfǎjiéhézhōngyāngchùlǐqìhétúxíngchùlǐqìjiějuédàxíngxīshūxiànxìngxìtǒng
_version_ 1718068552243609600