Summary: | Master's === 國立中山大學 === 資訊工程學系研究所 === 107 === Due to the explosive growth in applications of neural-network (NN) based machine learning models, the design of efficient accelerator circuits for NNs has become a very active research topic in recent years. This thesis first proposes a VLSI architecture for a convolutional NN (CNN) accelerator based on small processing elements (PEs), each containing a single multiply-accumulate (MAC) unit. This type of PE leads to high hardware utilization over a wide range of CNN kernel sizes. Another salient feature of the proposed CNN architecture is an efficient line-buffer design that also supports various filter kernel sizes. A traditional line buffer, consisting simply of shift registers, can significantly reduce accesses to the next level of the memory hierarchy during convolution operations. However, to accommodate modern CNN models, which typically contain layers with different filter sizes, this thesis extends the conventional line-buffer design with a data-skipping mechanism. The resulting architecture can seamlessly generate the required data stream for various numbers of PEs to achieve maximum hardware utilization, which makes the proposed line buffer well suited to the design of reconfigurable CNN accelerators. The overall CNN circuit has been realized on an SoC (system-on-a-chip) FPGA platform running an operating system. A device driver for the circuit has also been implemented, so that the thesis can demonstrate classification of the CIFAR-10 dataset on the accelerator and display the classification results on screen. Finally, based on the proposed CNN accelerator architecture, this thesis presents a generator that produces an accelerator instance according to user-specified parameters, including the number of processing elements (PEs) and the selected pooling function; users can also choose the size of the on-chip memory used in the accelerator circuit. The corresponding software environment for the Xilinx ZedBoard platform is also generated.
|
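The summary above describes the line-buffer and single-MAC PE data flow only at a high level. As a rough behavioral illustration, the Python sketch below models how a line buffer that keeps only k rows on chip lets each input pixel be fetched from the next memory level once while k x k windows are streamed to MAC-based PEs for any supported kernel size k. All names here (LineBuffer, mac_pe, the 6 x 8 toy feature map) are hypothetical and not taken from the thesis; the thesis's actual data-skipping extension and RTL implementation are not reproduced.

"""Behavioral sketch of a line buffer feeding single-MAC PEs.

Illustration only: names and sizes are assumptions made for this
summary, not details from the thesis. The model shows the data flow
in which each input row is read from external memory exactly once,
while k x k windows are produced for any supported kernel size k.
"""
from collections import deque

import numpy as np


class LineBuffer:
    """Keeps the most recent k rows of the input feature map on chip."""

    def __init__(self, k: int, width: int):
        self.k = k
        self.width = width
        self.rows = deque(maxlen=k)   # shift-register rows; oldest row dropped automatically

    def push_row(self, row):
        """Stream one input row in from external memory (read once)."""
        assert len(row) == self.width
        self.rows.append(list(row))

    def windows(self):
        """Emit every k x k window that the currently buffered rows can cover."""
        if len(self.rows) < self.k:
            return                      # not enough rows buffered yet
        buf = np.array(self.rows)
        for c in range(self.width - self.k + 1):
            yield buf[:, c:c + self.k]


def mac_pe(window, kernel, acc=0.0):
    """Single multiply-accumulate PE: one MAC operation per step over the window."""
    for x, w in zip(window.ravel(), kernel.ravel()):
        acc += x * w                    # one multiply-accumulate step
    return acc


if __name__ == "__main__":
    k = 3                               # kernel size; 5 or 7 work the same way
    fmap = np.arange(6 * 8, dtype=float).reshape(6, 8)
    kernel = np.ones((k, k)) / (k * k)  # toy averaging kernel

    lb = LineBuffer(k, width=fmap.shape[1])
    out_rows = []
    for row in fmap:                    # each row fetched from external memory only once
        lb.push_row(row)
        out_rows.append([mac_pe(w, kernel) for w in lb.windows()])

    out = np.array([r for r in out_rows if r])
    # cross-check against a direct convolution
    ref = np.array([[np.sum(fmap[i:i + k, j:j + k] * kernel)
                     for j in range(fmap.shape[1] - k + 1)]
                    for i in range(fmap.shape[0] - k + 1)])
    assert np.allclose(out, ref)
    print(out.shape)  # (4, 6) output for a 6 x 8 input and 3 x 3 kernel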