Assisted Design Optimization using High-Level Synthesis Flow

Bibliographic Details
Main Authors: Wei-Chun Chang, 張瑋君
Other Authors: Huang, Chih Tsun
Format: Others
Language: en_US
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/tjtxd3
Description
Summary: Master's === National Tsing Hua University === Department of Computer Science === 105 === Current research on accelerator design relies mainly on register-transfer level (RTL) based flows to obtain accurate timing, power, and area estimates. Pre-RTL synthesis tools such as Aladdin [1] can also produce approximately accurate estimates without generating RTL code. However, design exploration of large or complex designs remains time-consuming even with RTL or pre-RTL tools. In this thesis, we propose an assisted design flow that efficiently reduces the number of design points to be searched when using a pre-synthesis tool, taking into account micro-architectural factors such as loop unrolling and memory partitioning. First, we use Aladdin [1] to quickly explore the unrolling factor without considering memory partitioning and to generate dynamic data dependence graphs (DDDGs). After an unrolling factor is chosen, the DDDG is analyzed to explore memory partitioning. Conventional memory partitioning schemes are mainly block, cyclic, or block-cyclic. Memory partitioning strongly affects performance and may become the performance bottleneck. In our flow, we propose a memory-remapping methodology that, based on the DDDG, improves the source code by placing data better across the memory partitions. Finally, we use a high-level synthesis (HLS) tool to generate RTL code and obtain accelerator designs together with their performance, area, and power. Existing HLS tools, such as Vivado HLS, can generate different architectures for an application from different user configurations in C/C++/SystemC. However, the coding style they accept is quite limited. Therefore, we provide three patch methods, which address memory partitioning, loop unrolling, and the input buffer of the high-level hardware description, respectively. Experimental results show that the flow dramatically reduces simulation time. Our memory-remapping methodology can also improve the performance of the design with an optimized number of BRAMs.
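
To illustrate why the choice of memory partition interacts with loop unrolling, the following is a minimal sketch in Python. It is not the thesis's flow or Aladdin's model: the function names, the single-port bank assumption, and the array sizes are all hypothetical, chosen only to show how block and cyclic partitioning map indices to banks and how many cycles a group of simultaneous accesses would need under that simple model.

```python
def block_bank(index, size, banks):
    """Block partitioning: each bank holds one contiguous chunk of the array."""
    chunk = -(-size // banks)  # ceiling division
    return index // chunk


def cyclic_bank(index, size, banks):
    """Cyclic partitioning: consecutive elements go to consecutive banks."""
    return index % banks


def unrolled_groups(n, unroll, stride=1):
    """Indices a[stride * i] that an unrolled loop over i would access together."""
    return [
        [stride * i for i in range(start, min(start + unroll, n))]
        for start in range(0, n, unroll)
    ]


def access_cycles(groups, bank_of, ports=1):
    """Cycles needed if each bank serves `ports` accesses per cycle (assumed model)."""
    total = 0
    for group in groups:
        counts = {}
        for idx in group:
            bank = bank_of(idx)
            counts[bank] = counts.get(bank, 0) + 1
        # The most heavily loaded bank determines how long the group takes.
        total += max(-(-c // ports) for c in counts.values())
    return total


SIZE, BANKS, UNROLL = 16, 4, 4

# Unit-stride accesses: a[0..3], a[4..7], ...
unit = unrolled_groups(SIZE, UNROLL)
unit_block = access_cycles(unit, lambda i: block_bank(i, SIZE, BANKS))
unit_cyclic = access_cycles(unit, lambda i: cyclic_bank(i, SIZE, BANKS))

# Stride-4 accesses: a[0], a[4], a[8], a[12] issued together.
strided = unrolled_groups(4, UNROLL, stride=4)
strided_block = access_cycles(strided, lambda i: block_bank(i, SIZE, BANKS))
strided_cyclic = access_cycles(strided, lambda i: cyclic_bank(i, SIZE, BANKS))

print("unit stride:  block =", unit_block, " cyclic =", unit_cyclic)
print("stride of 4:  block =", strided_block, " cyclic =", strided_cyclic)
```

Under these assumptions, cyclic partitioning avoids conflicts for the unit-stride pattern while block partitioning avoids them for the stride-4 pattern, so neither fixed scheme is best for every access pattern. That is the kind of situation in which a data placement derived from the observed accesses, as the summary describes, could help.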