Design of Cycle-accurate SIMT Core and Implementation
Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 107 === Developing a GPU computing platform requires both software and hardware development. To manage this complexity, the transaction-level modeling (TLM) methodology lets the system be built incrementally, which makes verification and validation in earl...
Main Authors: | Jhi-HanJheng, 鄭基漢 |
---|---|
Other Authors: | Chung-Ho Chen |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/y96q43 |
id |
ndltd-TW-107NCKU5652001 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107NCKU5652001 ; 2019-10-25T05:24:17Z ; http://ndltd.ncl.edu.tw/handle/y96q43 ; Design of Cycle-accurate SIMT Core and Implementation ; 時序精確SIMT核心設計與實作 ; Jhi-HanJheng ; 鄭基漢 ; Master's ; National Cheng Kung University ; Institute of Computer and Communication Engineering ; 107 ; Chung-Ho Chen ; 陳中和 ; 2018 ; thesis ; 62 ; zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 107 === Developing a GPU computing platform requires both software and hardware development. To manage this complexity, the transaction-level modeling (TLM) methodology lets the system be built incrementally, which makes verification and validation possible at an early development stage. A cycle-accurate model, the most detailed functional model in TLM, describes the behavior of a module at each clock edge and is used to implement an "RTLable" hardware module. We develop a cycle-accurate SIMT core using a basic cycle-accurate modeling approach and evaluate its performance on the CASLAB-GPUSim co-simulation platform. A performance comparison between a low-end GPU and a 1.2 GHz embedded CPU shows that the low-end GPU achieves a 4.7 to 20.1 times speedup in test cases with good parallelism. When the low-end GPU is tuned to 1.2 GHz, it achieves a 52.6 times speedup in the GEMM test case; GEMM is the most time-consuming operation in deep learning applications. |
author2 |
Chung-Ho Chen |
author_facet |
Chung-Ho Chen Jhi-HanJheng 鄭基漢 |
author |
Jhi-HanJheng 鄭基漢 |
spellingShingle |
Jhi-HanJheng 鄭基漢 Design of Cycle-accurate SIMT Core and Implementation |
author_sort |
Jhi-HanJheng |
title |
Design of Cycle-accurate SIMT Core and Implementation |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/y96q43 |