Design of Cycle-accurate SIMT Core and Implementation

Master's thesis, National Cheng Kung University, Institute of Computer and Communication Engineering, ROC academic year 107 (2018-2019).

Bibliographic Details
Main Author: Jhi-Han Jheng (鄭基漢)
Advisor: Chung-Ho Chen (陳中和)
Format: Others
Language: zh-TW (Traditional Chinese)
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/y96q43
Record ID: ndltd-TW-107NCKU5652001
Chinese Title: 時序精確SIMT核心設計與實作
Extent: Master's thesis (學位論文), 62 pages

Abstract: Developing a GPU computing platform requires both software and hardware development. To cope with this complex process, the TLM (transaction-level modeling) methodology lets the system be built incrementally, which makes verification and validation possible at an early development stage. The cycle-accurate model, the most detailed functional model in TLM, describes the behavior of a module at each clock edge and is used to implement hardware modules that can be refined into RTL. We develop the cycle-accurate SIMT core using the basic cycle-accurate modeling approach and evaluate its performance on the CASLAB-GPUSim co-simulation platform. A performance comparison between a low-end GPU and a 1.2 GHz embedded CPU shows that the low-end GPU achieves speedups of 4.7 to 20.1 times on test cases with good parallelism. When the low-end GPU is tuned to 1.2 GHz, it achieves a 52.6 times speedup on the GEMM test case; GEMM is the most time-consuming operation in deep learning applications.
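The abstract's central idea, a cycle-accurate model that describes a module's behavior at each clock edge, can be illustrated with a small sketch. The Python below is hypothetical and is not taken from the thesis or from CASLAB-GPUSim: `TinySIMTCore`, its two-stage fetch/execute pipeline, the four-lane warp, and the `add`/`mul` operations are all assumptions chosen for brevity. It shows the common two-phase pattern: every next-state value is computed from the current state only, then all registers are committed together on the clock edge, while all lanes execute the same instruction in lockstep (the SIMT part).

```python
from typing import List, Optional, Tuple

# Hypothetical ALU operations; every SIMT lane applies the same one per cycle.
OPS = {
    "add": lambda value, imm: value + imm,
    "mul": lambda value, imm: value * imm,
}


class TinySIMTCore:
    """Toy cycle-accurate model: one warp, two pipeline stages (fetch -> execute).

    Each call to tick() models one rising clock edge. Next-state values are
    derived only from the current state and then committed together, the way
    registers in RTL latch simultaneously.
    """

    def __init__(self, program: List[Tuple[str, int]], warp_size: int = 4):
        self.program = program                    # list of (op, immediate) pairs
        self.pc = 0                               # index of the next instruction to fetch
        self.fetch_latch: Optional[int] = None    # instruction index held after fetch
        self.execute_latch: Optional[int] = None  # instruction index held for execute
        self.lanes = list(range(warp_size))       # one register per lane, initialized to lane id
        self.cycle = 0

    def tick(self) -> None:
        # Phase 1 (combinational): compute every next-state value from current state.
        lanes_next = list(self.lanes)
        if self.execute_latch is not None:
            op, imm = self.program[self.execute_latch]
            lanes_next = [OPS[op](v, imm) for v in self.lanes]  # all lanes in lockstep
        execute_next = self.fetch_latch
        if self.pc < len(self.program):
            fetch_next, pc_next = self.pc, self.pc + 1
        else:
            fetch_next, pc_next = None, self.pc

        # Phase 2 (clock edge): commit all registers at once.
        self.lanes = lanes_next
        self.execute_latch = execute_next
        self.fetch_latch = fetch_next
        self.pc = pc_next
        self.cycle += 1

    def done(self) -> bool:
        # Finished when nothing is left to fetch and both pipeline latches are empty.
        return (self.pc >= len(self.program)
                and self.fetch_latch is None
                and self.execute_latch is None)


if __name__ == "__main__":
    core = TinySIMTCore([("add", 1), ("mul", 2)])
    while not core.done():
        core.tick()
    print(core.cycle, core.lanes)  # 4 [2, 4, 6, 8]
```

Because execute reads the instruction latched on the previous edge, the result does not depend on the order in which the stages are evaluated inside `tick()`. A model like this reports cycle counts; dividing `core.cycle` by an assumed clock frequency gives modeled execution time, which is how cycle counts relate to speedup at a given clock such as 1.2 GHz, assuming the cycle count itself does not change with frequency.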