Performance Prediction Model on HSA-Compatible General-Purpose GPU System

Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 104 === In this thesis, we present the memory subsystem of a customized general-purpose GPU architecture. For fast development, the C++ simulated architecture should be kept lightweight while remaining timing-accurate. Since most parts of benchmark simulation...

Bibliographic Details
Main Authors:	Kuan-Chieh Hsu, 許冠傑
Other Authors:	Chung-Ho Chen
Format:	Others
Language:	en_US
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/pja3jx

id	ndltd-TW-104NCKU5652043
record_format	oai_dc
spelling	ndltd-TW-104NCKU5652043 2019-05-15T22:54:11Z http://ndltd.ncl.edu.tw/handle/pja3jx Performance Prediction Model on HSA-Compatible General-Purpose GPU System HSA繪圖處理器之效能預測模型 Kuan-Chieh Hsu 許冠傑 Chung-Ho Chen 陳中和 2016 學位論文 ; thesis 66 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 104 === In this thesis, we present the memory subsystem of a customized general-purpose GPU architecture. For fast development, the C++ simulated architecture should be kept lightweight while remaining timing-accurate. Most of the benchmark simulation time comes from latencies related to the memory subsystem. For example, a level-one cache miss triggers Network-on-Chip (NoC) traffic, and the cache coherence and memory controller scheduling policies also affect the latency seen by the streaming multiprocessors in this GPGPU architecture. We also discuss memory space partitioning methods, covering both coarse-grained and fine-grained partitioning. For the NoC module, we adopt previous research and discuss the geometric features of the chosen topology, a mesh structure selected for its robustness. Another contribution of this work is the use of two machine learning models to predict architecture performance and to depict the performance trend across a large number of hardware configuration settings. We aim to estimate a reasonable peak value on the performance surface by the following procedure. First, the k-means algorithm groups the training benchmarks into a predetermined number of clusters. A multi-class Support Vector Machine (SVM) model is then trained on memory-related features only. During the validation phase, the peak performance values of the test benchmarks are predicted from the results of the training phase. With eight clusters, 46.48% of the predicted cycle counts across all tested benchmarks are within 10% error of the real performance values. By varying the number of clusters, up to 57.97% of the points are within 10% error. We also show that peak performance does not necessarily occur under maximum hardware resources. Further discussion points out the memory traffic issues that significantly slow down the execution of certain benchmark access patterns. Taken together, these contributions aim to provide a reliable, accurate, and efficient early-stage simulation platform for future IC chip implementation.
author2	Chung-Ho Chen
author_facet	Chung-Ho Chen Kuan-Chieh Hsu 許冠傑
author	Kuan-Chieh Hsu 許冠傑
title	Performance Prediction Model on HSA-Compatible General-Purpose GPU System
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/pja3jx
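
The abstract in this record describes clustering the training benchmarks with k-means and then reporting the share of predictions that fall within 10% of the real performance values. The following is a minimal sketch of those two steps only, not the thesis implementation: the function names, the two-dimensional feature tuples, and all numbers are hypothetical, and the multi-class SVM stage is omitted because it would normally rely on an external library.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over equal-length feature tuples.

    Returns (centroids, labels). Hypothetical helper, not from the thesis.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


def fraction_within(predicted, actual, tol=0.10):
    """Share of predictions whose relative error is at most tol."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tol * abs(a))
    return hits / len(actual)


if __name__ == "__main__":
    # Toy memory-related feature vectors (made-up values for illustration).
    features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
    _, labels = kmeans(features, k=2)
    print(labels)

    # Made-up predicted vs. measured cycle counts.
    print(fraction_within([105, 80, 100, 111], [100, 100, 100, 100]))
```

In a fuller pipeline, the cluster labels would serve as the classes a multi-class classifier learns to predict from memory-related features, and fraction_within would be evaluated on held-out test benchmarks.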

Similar Items