CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System

Master's === National Taiwan University === Institute of Mathematics === 100 === Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among the direct methods, we focus on the multifrontal method. A multifrontal method uses an elimination tree (column elimination tree fo...


Bibliographic Details
Main Authors:	Chenhan D. Yu, 余承翰
Other Authors:	Weichung Wang
Format:	Others
Language:	en_US
Published:	2012
Online Access:	http://ndltd.ncl.edu.tw/handle/01116809690497788866

id	ndltd-TW-100NTU05479012
record_format	oai_dc
spelling	ndltd-TW-100NTU05479012 2015-10-13T21:50:16Z http://ndltd.ncl.edu.tw/handle/01116809690497788866 CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System 使用多子矩陣法結合中央處理器和圖形處理器解決大型稀疏線性系統 (Solving Large Sparse Linear Systems Using the Multifrontal Method Combined with CPUs and GPUs) Chenhan D. Yu 余承翰 Master's, National Taiwan University, Institute of Mathematics, 100 Weichung Wang 王偉仲 2012 thesis 105 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Taiwan University === Institute of Mathematics === 100 === Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among the direct methods, we focus on the multifrontal method. A multifrontal method uses an elimination tree (a column elimination tree for unsymmetric problems) to transform a large sparse linear system into many smaller dense frontal operations, which are well suited to hybrid CPU-GPU systems. We review the multifrontal method and analyze it from both algorithmic and implementation perspectives to see how one or more GPUs can accelerate the computations. The problems studied range from symmetric positive definite (SPD) and symmetric indefinite to unsymmetric cases. We successfully carry out the ideal implementation of the SPD multifrontal method, which delivers nearly the peak performance of dense BLAS3 routines on the GPU (MAGMA's Cholesky); the shared symmetry accounts for the similar implementation and performance for symmetric indefinite problems. Unsymmetric problems, however, can be hard to implement because runtime column pivoting splits 2 BLAS3 routines into several BLAS2 routines; extra communication is also inevitable. Because communication between the CPU and GPU can easily slow down performance, this work provides several strategies, for all kinds of multifrontal problems, to reduce communication and accelerate the process. Furthermore, we analyze the total execution time of the SPD problem and provide a nearly optimal workload distribution for hybrid CPU-GPU cooperation. For all kinds of problems, scalable algorithms are provided to accommodate more GPUs, up to 4. Setting aside the analysis of cluster-network communication, which operates at a different scale from PCI-E communication speed, we focus on adapting the optimization strategies to a single box server.
The extension and analysis of the optimization model for the new parallel scheme are much the same as for the single-GPU model. New factorization and communication costs for multiple GPUs are added to the model, and the performance bound and properties still hold.
author2	Weichung Wang
author_facet	Weichung Wang Chenhan D. Yu 余承翰
author	Chenhan D. Yu 余承翰
spellingShingle	Chenhan D. Yu 余承翰 CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
author_sort	Chenhan D. Yu
title	CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
title_short	CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
title_full	CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
title_fullStr	CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
title_full_unstemmed	CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
title_sort	cpu-gpu hybrid approaches in multifrontal methods for large and sparse linear system
publishDate	2012
url	http://ndltd.ncl.edu.tw/handle/01116809690497788866
work_keys_str_mv	AT chenhandyu cpugpuhybridapproachesinmultifrontalmethodsforlargeandsparselinearsystem AT yúchénghàn cpugpuhybridapproachesinmultifrontalmethodsforlargeandsparselinearsystem AT chenhandyu shǐyòngduōzijǔzhènfǎjiéhézhōngyāngchùlǐqìhétúxíngchùlǐqìjiějuédàxíngxīshūxiànxìngxìtǒng AT yúchénghàn shǐyòngduōzijǔzhènfǎjiéhézhōngyāngchùlǐqìhétúxíngchùlǐqìjiějuédàxíngxīshūxiànxìngxìtǒng
_version_	1718068552243609600
