Performance Prediction Model on HSA-Compatible General-Purpose GPU System

Bibliographic Details
Main Authors: Kuan-Chieh Hsu, 許冠傑
Other Authors: Chung-Ho Chen
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/pja3jx
id ndltd-TW-104NCKU5652043
record_format oai_dc
spelling ndltd-TW-104NCKU56520432019-05-15T22:54:11Z http://ndltd.ncl.edu.tw/handle/pja3jx Performance Prediction Model on HSA-Compatible General-Purpose GPU System HSA繪圖處理器之效能預測模型 Kuan-Chieh Hsu 許冠傑 Master's thesis, National Cheng Kung University, Institute of Computer and Communication Engineering, academic year 104 (full abstract in the description field below) Chung-Ho Chen 陳中和 2016 學位論文 (thesis), 66 pages, en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Cheng Kung University === Institute of Computer and Communication Engineering === Academic year 104 === In this thesis, we present the memory subsystem of a customized general-purpose GPU (GPGPU) architecture. For fast development, the C++ simulation model is kept light-weight while remaining timing-accurate, since most of the benchmark simulation time comes from memory-subsystem latencies: a level-one cache miss triggers Network-on-Chip (NoC) traffic, and the cache-coherence and memory-controller scheduling policies also affect the latency observed by each streaming multiprocessor. We also discuss memory space partitioning, covering both coarse-grained and fine-grained partitioning methods. For the NoC module, we adopt a design from previous research and discuss the geometric features of the chosen topology, a mesh structure selected for its robustness. Another contribution of this work is the use of two machine learning models to predict architecture performance and to depict the performance trend across a large number of hardware configuration settings. We aim to estimate a reasonable summit (peak) value of the performance surface by the following procedure. First, the k-means algorithm clusters the training benchmarks into a chosen number of clusters. A multi-class Support Vector Machine (SVM) is then trained on memory-related features only. In the validation phase, the summit performance values of the testing benchmarks are predicted from the training results. With eight clusters, 46.48% of the predicted cycle counts across all tested benchmarks fall within 10% error of the measured values; by varying the number of clusters, up to 57.97% of the points fall within 10% error. We also show that the summit performance does not necessarily occur under the maximum hardware resources, and we discuss memory traffic issues that significantly slow down benchmarks with certain access patterns. Combining these contributions, we aim to provide a reliable and accurate early-stage simulation platform for future IC implementation in an efficient way.
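
To make the prediction flow above concrete, here is a minimal sketch of a k-means-plus-multi-class-SVM summit predictor. It is not the thesis's actual code: the scikit-learn pipeline, the feature names, the synthetic data, and the per-cluster mean used as the summit estimate are all assumptions made for illustration only.

    # Hypothetical sketch (Python / scikit-learn), not the thesis's implementation.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Assumed memory-related feature vectors for 40 training benchmarks
    # (e.g. L1 miss rate, NoC traffic, DRAM access intensity) -- synthetic data.
    X_train = rng.random((40, 3))
    # Measured summit (peak) cycle counts for each training benchmark -- synthetic.
    summit_cycles = rng.integers(1_000_000, 5_000_000, size=40).astype(float)

    # Scale features once and reuse the same scaling for clustering, training,
    # and prediction.
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)

    # Step 1: k-means clusters the training benchmarks into a chosen number of clusters.
    n_clusters = 8
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    cluster_ids = kmeans.fit_predict(X_train_s)

    # Step 2: a multi-class SVM learns to map memory-related features to cluster labels.
    svm = SVC(kernel="rbf", decision_function_shape="ovr")
    svm.fit(X_train_s, cluster_ids)

    # Representative summit value per cluster (assumption: mean over member benchmarks).
    cluster_summit = np.array(
        [summit_cycles[cluster_ids == c].mean() for c in range(n_clusters)]
    )

    # Step 3 (validation): classify an unseen benchmark by its memory features,
    # then use its cluster's summit value as the predicted peak performance.
    X_test = rng.random((1, 3))
    pred_cluster = int(svm.predict(scaler.transform(X_test))[0])
    pred_summit = cluster_summit[pred_cluster]

    # Relative error against a (hypothetical) measured summit; the thesis reports
    # the fraction of benchmarks whose prediction error is below 10%.
    measured_summit = 2_500_000.0
    rel_error = abs(pred_summit - measured_summit) / measured_summit
    print(f"predicted summit ~ {pred_summit:,.0f} cycles, error = {rel_error:.1%}")

In the thesis the features would come from the simulator's memory-subsystem statistics (for example, L1 miss and NoC traffic counters) and the summit values from sweeping hardware configurations; the random data above merely stands in for those measurements.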
author2 Chung-Ho Chen
author Kuan-Chieh Hsu
許冠傑
author_sort Kuan-Chieh Hsu
title Performance Prediction Model on HSA-Compatible General-Purpose GPU System
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/pja3jx