Altera Based HLS Implementation on Loop Intensive Algorithm Performance and Analysis



Bibliographic Details
Main Authors: LAI, YU-XUAN, 賴昱璇
Other Authors: Joseph Arul
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/3juqu4
Description
Summary: Master's thesis === Fu Jen Catholic University (輔仁大學) === Master's Program, Department of Computer Science and Information Engineering === 107 === This thesis explores the use of High-Level Synthesis (HLS) for loop-intensive algorithms on Altera FPGAs and analyzes the resulting performance. Using HLS tools on an FPGA can speed up such algorithms. Most HLS research is based on Xilinx devices. This study instead focuses on implementing HLS on an Altera-based FPGA with the HLS tools and compiler, concentrating on loop-intensive programs, on loop unrolling applied to those programs, and on the resulting performance. For the experiments, we took two algorithms provided and tested for Xilinx, ran them on Altera with various data sizes, and applied loop unrolling to them. The algorithms used in the experiments include the Fast Fourier Transform (FFT), matrix multiplication and addition, and several sorting programs, all of which are commonly used in image processing and other applications. The HLS implementations of these algorithms not only improve performance at times but also reduce memory usage at run time. When the benchmarks provided by Xilinx are run on the Altera DE2-115 with data sizes from 20 to 1280, the improvement in execution time ranges from about 8.9 times down to 1.06 times. For matrix multiplication and addition with two loops, the improvement across different data sizes is about 1.53 to 1.78 times. The benchmarks provided by Xilinx show not only a performance improvement but also a reduction in memory usage of 1.13 to 1.48 times for the 160x160 data size. However, when the sorting algorithms are run under HLS with loop unrolling, the improvement is not significant: only 1.00233 and 1.00097 times. Likewise, the FFT algorithm shows little improvement even with loop unrolling. When HLS is used, memory usage is reduced significantly. However, the DE2-115 board has comparatively little memory. If the HLS compiler is used with a high-memory board such as the Altera Arria 10, one can observe a significant improvement. For large amounts of data, an FPGA board with more memory can be used to analyze the algorithms and their impact. Notably, reducing memory usage also improves the execution time of the benchmarks.
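To illustrate the loop unrolling the summary refers to, the following is a minimal sketch in plain C, not the thesis's actual benchmark code. It compares a baseline matrix multiplication with a version whose inner loop is manually unrolled by a factor of four. The names `matmul_rolled` and `matmul_unrolled`, the dimension `N`, and the factor `UNROLL` are assumptions chosen for illustration. In an HLS flow the same effect is typically requested with a compiler directive (for example, an unroll pragma on the loop) rather than written out by hand.

```c
#include <stddef.h>

#define N 8        /* hypothetical matrix dimension; must be a multiple of UNROLL */
#define UNROLL 4   /* hypothetical unroll factor, matching the four statements below */

/* Baseline: one multiply-accumulate per inner-loop iteration.
 * Matrices are stored row-major in flat arrays of N*N elements. */
void matmul_rolled(const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            int sum = 0;
            for (size_t k = 0; k < N; k++) {
                sum += a[i * N + k] * b[k * N + j];
            }
            c[i * N + j] = sum;
        }
    }
}

/* Inner loop unrolled by UNROLL: four multiply-accumulates per iteration,
 * so the loop control runs N / UNROLL times. In hardware generated by an
 * HLS tool, the four products could be computed in parallel, at the cost
 * of more multipliers and memory ports. */
void matmul_unrolled(const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            int sum = 0;
            for (size_t k = 0; k < N; k += UNROLL) {
                sum += a[i * N + k]     * b[k * N + j];
                sum += a[i * N + k + 1] * b[(k + 1) * N + j];
                sum += a[i * N + k + 2] * b[(k + 2) * N + j];
                sum += a[i * N + k + 3] * b[(k + 3) * N + j];
            }
            c[i * N + j] = sum;
        }
    }
}
```

Both functions compute the same product; only the loop structure differs. Whether unrolling pays off on a given board depends on the resources available, which is consistent with the summary's observation that some algorithms gain very little.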