CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System
Master's === National Taiwan University === Graduate Institute of Mathematics === 100 === Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among the direct methods, we focus on the multifrontal method. A multifrontal method uses an elimination tree (column elimination tree fo...
Main Authors: | Chenhan D. Yu, 余承翰 |
---|---|
Other Authors: | Weichung Wang |
Format: | Others |
Language: | en_US |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/01116809690497788866 |
id |
ndltd-TW-100NTU05479012 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100NTU05479012 2015-10-13T21:50:16Z http://ndltd.ncl.edu.tw/handle/01116809690497788866 CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System 使用多子矩陣法結合中央處理器和圖形處理器解決大型稀疏線性系統 Chenhan D. Yu 余承翰 Master's National Taiwan University Graduate Institute of Mathematics 100 Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among the direct methods, we focus on the multifrontal method. A multifrontal method uses an elimination tree (a column elimination tree for unsymmetric problems) to transform a large sparse linear system into many smaller dense frontal operations, which are well suited to hybrid CPU-GPU systems. We review the multifrontal method and analyze it from both algorithmic and implementation perspectives to see how one or more GPUs can accelerate the computations. We study symmetric positive definite (SPD), symmetric indefinite, and unsymmetric problems. For SPD problems, we carry out the ideal multifrontal implementation, which delivers performance close to the peak of dense BLAS3 routines on the GPU, such as MAGMA’s Cholesky factorization; because symmetric indefinite problems share the same symmetry, their implementation and performance are similar. Unsymmetric problems, however, are harder to implement: runtime column pivoting breaks two BLAS3 routines into several BLAS2 routines, and extra communication is unavoidable. Because communication between the CPU and GPU can easily degrade performance, we provide several strategies, applicable to all of these multifrontal problem types, that reduce communication and accelerate the process. Furthermore, we analyze the total execution time of the SPD problem and derive a nearly optimal workload distribution for hybrid CPU-GPU cooperation. For all problem types, we provide scalable algorithms that accommodate additional GPUs, up to four.
We set aside the analysis of cluster-network communication, whose speed is on a different scale from PCI-E, and focus on adapting the optimization strategies to a single server box. The extension and analysis of the optimization model for the new parallel scheme closely follow the single-GPU model: new factorization and communication costs for multiple GPUs are added to the model, and the performance bound and its properties still hold. Weichung Wang 王偉仲 2012 thesis 105 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Mathematics === 100 === Solving large-scale sparse linear systems is at the heart of many scientific and engineering computations. Among the direct methods, we focus on the multifrontal method. A multifrontal method uses an elimination tree (a column elimination tree for unsymmetric problems) to transform a large sparse linear system into many smaller dense frontal operations, which are well suited to hybrid CPU-GPU systems. We review the multifrontal method and analyze it from both algorithmic and implementation perspectives to see how one or more GPUs can accelerate the computations. We study symmetric positive definite (SPD), symmetric indefinite, and unsymmetric problems.
For SPD problems, we carry out the ideal multifrontal implementation, which delivers performance close to the peak of dense BLAS3 routines on the GPU, such as MAGMA’s Cholesky factorization; because symmetric indefinite problems share the same symmetry, their implementation and performance are similar. Unsymmetric problems, however, are harder to implement: runtime column pivoting breaks two BLAS3 routines into several BLAS2 routines, and extra communication is unavoidable. Because communication between the CPU and GPU can easily degrade performance, we provide several strategies, applicable to all of these multifrontal problem types, that reduce communication and accelerate the process. Furthermore, we analyze the total execution time of the SPD problem and derive a nearly optimal workload distribution for hybrid CPU-GPU cooperation.
For all problem types, we provide scalable algorithms that accommodate additional GPUs, up to four. We set aside the analysis of cluster-network communication, whose speed is on a different scale from PCI-E, and focus on adapting the optimization strategies to a single server box. The extension and analysis of the optimization model for the new parallel scheme closely follow the single-GPU model: new factorization and communication costs for multiple GPUs are added to the model, and the performance bound and its properties still hold.
|
author2 |
Weichung Wang |
author_facet |
Weichung Wang Chenhan D. Yu 余承翰 |
author |
Chenhan D. Yu 余承翰 |
spellingShingle |
Chenhan D. Yu 余承翰 CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System |
author_sort |
Chenhan D. Yu |
title |
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System |
title_short |
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System |
title_full |
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System |
title_fullStr |
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System |
title_full_unstemmed |
CPU-GPU Hybrid Approaches in Multifrontal Methods for Large and Sparse Linear System |
title_sort |
cpu-gpu hybrid approaches in multifrontal methods for large and sparse linear system |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/01116809690497788866 |
work_keys_str_mv |
AT chenhandyu cpugpuhybridapproachesinmultifrontalmethodsforlargeandsparselinearsystem AT yúchénghàn cpugpuhybridapproachesinmultifrontalmethodsforlargeandsparselinearsystem AT chenhandyu shǐyòngduōzijǔzhènfǎjiéhézhōngyāngchùlǐqìhétúxíngchùlǐqìjiějuédàxíngxīshūxiànxìngxìtǒng AT yúchénghàn shǐyòngduōzijǔzhènfǎjiéhézhōngyāngchùlǐqìhétúxíngchùlǐqìjiějuédàxíngxīshūxiànxìngxìtǒng |
_version_ |
1718068552243609600 |
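The abstract states that a multifrontal method uses an elimination tree to break a sparse system into smaller dense frontal operations. As a rough illustration only, the sketch below computes the elimination tree of a symmetric sparsity pattern with the standard ancestor-compression algorithm (due to Liu). It is not taken from the thesis: the function name `elimination_tree` and the input layout `upper_cols` are hypothetical choices made for this example.

```python
def elimination_tree(n, upper_cols):
    """Return parent[] of the elimination tree for a symmetric n x n pattern.

    upper_cols[j] lists the row indices i < j with A[i, j] != 0
    (strict upper triangle, column by column). This input layout is an
    assumption of this sketch, not something specified in the abstract.
    parent[j] == -1 marks a root.
    """
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut toward the current root

    for j in range(n):
        for i in upper_cols[j]:
            # Climb from i toward the root of its current subtree,
            # redirecting every visited node's shortcut to j.
            while i != -1 and i < j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt == -1:
                    # i was a root so far; j becomes its parent.
                    parent[i] = j
                i = nxt
    return parent


if __name__ == "__main__":
    # Arrow-shaped pattern: only the last column couples to the others.
    print(elimination_tree(4, [[], [], [], [0, 1, 2]]))  # [3, 3, 3, -1]
```

In this arrow-shaped example, nodes 0, 1, and 2 are independent leaves under node 3, so their frontal matrices could in principle be processed in any order or concurrently before node 3 is assembled. That independence between subtrees is the kind of structure a CPU-GPU work distribution can exploit.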