Design of Cycle-accurate SIMT Core and Implementation

Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 107 === Developing a GPU computing platform requires both software and hardware development. To manage this complexity, the transaction-level modeling (TLM) methodology lets the system be built incrementally, which makes verification and validation in earl...

Bibliographic Details
Main Authors:	Jhi-Han Jheng, 鄭基漢
Other Authors:	Chung-Ho Chen
Format:	Others
Language:	zh-TW
Published:	2018
Online Access:	http://ndltd.ncl.edu.tw/handle/y96q43

id	ndltd-TW-107NCKU5652001
record_format	oai_dc
spelling	ndltd-TW-107NCKU5652001 2019-10-25T05:24:17Z http://ndltd.ncl.edu.tw/handle/y96q43 Design of Cycle-accurate SIMT Core and Implementation 時序精確SIMT核心設計與實作 Jhi-Han Jheng 鄭基漢 Master's National Cheng Kung University Institute of Computer and Communication Engineering 107 Developing a GPU computing platform requires both software and hardware development. To manage this complexity, the transaction-level modeling (TLM) methodology lets the system be built incrementally, which makes verification and validation possible at an early development stage. The cycle-accurate model, the most detailed functional model in TLM, describes a module's behavior at each clock edge and is used to implement hardware modules that can be converted to RTL. We develop a cycle-accurate SIMT core using a basic cycle-accurate modeling approach and evaluate its performance on the CASLAB-GPUSim co-simulation platform. A performance comparison between a low-end GPU and a 1.2 GHz embedded CPU shows that the low-end GPU achieves a speedup of 4.7 to 20.1 times on test cases with good parallelism. When the low-end GPU is clocked at 1.2 GHz, it achieves a speedup of 52.6 times on the GEMM test case, which is the most time-consuming operation in deep learning applications. Chung-Ho Chen 陳中和 2018 學位論文 ; thesis 62 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 107 === Developing a GPU computing platform requires both software and hardware development. To manage this complexity, the transaction-level modeling (TLM) methodology lets the system be built incrementally, which makes verification and validation possible at an early development stage. The cycle-accurate model, the most detailed functional model in TLM, describes a module's behavior at each clock edge and is used to implement hardware modules that can be converted to RTL. We develop a cycle-accurate SIMT core using a basic cycle-accurate modeling approach and evaluate its performance on the CASLAB-GPUSim co-simulation platform. A performance comparison between a low-end GPU and a 1.2 GHz embedded CPU shows that the low-end GPU achieves a speedup of 4.7 to 20.1 times on test cases with good parallelism. When the low-end GPU is clocked at 1.2 GHz, it achieves a speedup of 52.6 times on the GEMM test case, which is the most time-consuming operation in deep learning applications.
author2	Chung-Ho Chen
author_facet	Chung-Ho Chen Jhi-Han Jheng 鄭基漢
author	Jhi-Han Jheng 鄭基漢
spellingShingle	Jhi-Han Jheng 鄭基漢 Design of Cycle-accurate SIMT Core and Implementation
author_sort	Jhi-Han Jheng
title	Design of Cycle-accurate SIMT Core and Implementation
title_short	Design of Cycle-accurate SIMT Core and Implementation
title_full	Design of Cycle-accurate SIMT Core and Implementation
title_fullStr	Design of Cycle-accurate SIMT Core and Implementation
title_full_unstemmed	Design of Cycle-accurate SIMT Core and Implementation
title_sort	design of cycle-accurate simt core and implementation
publishDate	2018
url	http://ndltd.ncl.edu.tw/handle/y96q43
work_keys_str_mv	AT jhihanjheng designofcycleaccuratesimtcoreandimplementation AT zhèngjīhàn designofcycleaccuratesimtcoreandimplementation AT jhihanjheng shíxùjīngquèsimthéxīnshèjìyǔshízuò AT zhèngjīhàn shíxùjīngquèsimthéxīnshèjìyǔshízuò
_version_	1719277969077174272
