Summary: | Master's === National Cheng Kung University === Institute of Computer and Communication Engineering === 107 === Developing a GPU computing platform requires both software and hardware development. To manage this complexity, we adopt the transaction-level modeling (TLM) methodology, which lets the system be built incrementally and makes verification and validation possible at an early development stage. The cycle-accurate model, the most detailed functional model in TLM, describes the behavior of a module at each clock edge and is used to implement hardware modules that can be translated to RTL. We develop a cycle-accurate SIMT core using a basic cycle-accurate modeling approach and evaluate its performance on the CASLAB-GPUSim co-simulation platform. A performance comparison between a low-end GPU and a 1.2 GHz embedded CPU shows that the low-end GPU achieves a speedup of 4.7 to 20.1 times on test cases with good parallelism. When the low-end GPU is tuned to 1.2 GHz, it achieves a speedup of 52.6 times on the GEMM test case. GEMM is the most time-consuming operation in deep learning applications.
|