Summary: | Master's === National Tsing Hua University === Department of Computer Science === 105 === Current research on accelerator design mainly relies on register-transfer level (RTL)-based
flows to obtain accurate timing, power, and area estimates. Pre-RTL tools such as
Aladdin [1] can also produce reasonably accurate estimates without generating RTL code.
However, design space exploration of large or complex designs remains time-consuming
even with RTL or pre-RTL tools.
In this thesis, we propose a design-assistance flow that reduces the number of search
points in design exploration with a pre-synthesis tool, taking into account
micro-architectural factors such as loop unrolling and memory partitioning. First, we use
Aladdin [1] to quickly explore the unrolling factor without considering memory partitioning
and to generate dynamic data dependence graphs (DDDGs). After an unrolling factor is chosen,
the DDDG is analyzed to explore the memory partitioning. Conventional memory partitioning
schemes are mainly block, cyclic, or block-cyclic. Because memory partitioning strongly
affects performance and can become its bottleneck, our flow includes a memory-remapping
methodology that, based on the DDDG, modifies the source code to place data better across
memory partitions. Finally, we use a high-level synthesis tool to generate RTL code and
obtain performance, area, and power figures for the accelerator designs.
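
As a rough illustration of why the partitioning scheme matters, the following C++ sketch maps array indices to banks under the three conventional schemes named above and counts how many simultaneous accesses fall into the same bank. It is not the thesis's remapping method; the names `Slot`, `block_map`, `cyclic_map`, `block_cyclic_map`, and `worst_bank_load` are hypothetical, and the access pattern is hard-coded rather than extracted from a DDDG.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Location of one array element after partitioning (hypothetical helper type).
struct Slot {
    std::size_t bank;    // which memory bank holds the element
    std::size_t offset;  // address inside that bank
};

// Block: consecutive chunks of ceil(n / banks) elements share a bank.
Slot block_map(std::size_t i, std::size_t n, std::size_t banks) {
    std::size_t chunk = (n + banks - 1) / banks;
    return {i / chunk, i % chunk};
}

// Cyclic: elements are dealt out round-robin across the banks.
Slot cyclic_map(std::size_t i, std::size_t banks) {
    return {i % banks, i / banks};
}

// Block-cyclic: blocks of `blk` consecutive elements are dealt out round-robin.
Slot block_cyclic_map(std::size_t i, std::size_t banks, std::size_t blk) {
    std::size_t block_id = i / blk;
    return {block_id % banks, (block_id / banks) * blk + i % blk};
}

// Largest number of accesses from `idx` that land in a single bank.
// A value above the bank's port count means the accesses must be serialized.
template <typename Map>
std::size_t worst_bank_load(const std::vector<std::size_t>& idx,
                            std::size_t banks, Map map) {
    std::vector<std::size_t> load(banks, 0);
    for (std::size_t i : idx) ++load[map(i).bank];
    return *std::max_element(load.begin(), load.end());
}
```

With 16 elements and 4 banks, a stride-4 access group such as {0, 4, 8, 12} hits a single bank under cyclic partitioning but four different banks under block partitioning, while a contiguous group {0, 1, 2, 3} behaves the opposite way. Neither conventional scheme suits every access pattern, which is the kind of mismatch a data-placement remapping is meant to address.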
Existing high-level synthesis (HLS) tools, such as Vivado HLS, can generate different
architectures for an application by applying different user configurations to
C/C++/SystemC code. However, the supported coding style is quite limited. We therefore
provide three patch methods, which improve memory partitioning, loop unrolling, and
input buffering in the high-level hardware description, respectively.
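
For readers unfamiliar with such configurations, the sketch below shows how loop unrolling and array partitioning are typically requested in Vivado HLS through pragmas in C++ source. The function `vec_add`, the length `N`, and the factor of 4 are assumptions chosen for illustration; this is not one of the thesis's patch methods.

```cpp
#include <cstddef>

constexpr std::size_t N = 64;  // hypothetical array length

// Element-wise sum. The pragmas are directives for Vivado HLS; an ordinary
// C++ compiler ignores them, so the function still runs as plain software.
void vec_add(const int a[N], const int b[N], int out[N]) {
// Split each array into 4 cyclic banks so 4 elements can be accessed per cycle.
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4 dim=1
#pragma HLS ARRAY_PARTITION variable=out cyclic factor=4 dim=1
    for (std::size_t i = 0; i < N; ++i) {
// Replicate the loop body 4 times to match the partition factor.
#pragma HLS UNROLL factor=4
        out[i] = a[i] + b[i];
    }
}
```

Matching the unroll factor to the partition factor is the simplest case; when the access pattern is not contiguous, the two settings interact in less obvious ways.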
Experimental results show that the proposed flow substantially reduces simulation time.
The memory-remapping methodology also improves design performance while using an
optimized number of BRAMs.
|