The Efficient VLSI Design and Implementation of Neural Networks Based on Depthwise Separable Convolution

Bibliographic Details
Main Authors: Hung-Ju Lin, 林泓儒
Other Authors: Chung-An Shen
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/ta436p
Description
Summary: Master's thesis, National Taiwan University of Science and Technology, Department of Electronic Engineering, academic year 106 (ROC calendar). This thesis presents an efficient VLSI architecture design and circuit implementation for a neural network based on depthwise separable convolution. To the best of the author's knowledge, the proposed design is the first hardware accelerator for inference of MobileNet, a neural network built on the depthwise separable convolution scheme. In particular, to achieve high throughput while maintaining low area complexity, a novel data-processing flow is proposed that significantly reduces the number of accesses to off-chip DRAM. Furthermore, the proposed architecture achieves a high degree of data reuse without requiring excessive storage buffers, so the area complexity incurred by storage elements is largely mitigated. Based on the proposed data-processing flow and data-reuse scheme, a highly pipelined architecture is designed to achieve high processing throughput. The circuit is synthesized with TSMC 90 nm technology, and performance and area complexity are evaluated from post-synthesis estimates. Experimental results show that the proposed architecture achieves a throughput of 33.514 Giga-MACs per second with a hardware complexity of 6,340 KGE (thousand gate equivalents), excluding the highly technology-dependent memory buffers. Compared with the state-of-the-art design, the proposed architecture is 5× faster and requires approximately 30% less area.
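The abstract relies on depthwise separable convolution being much cheaper than standard convolution, measured in multiply-accumulate (MAC) operations, the same unit as the throughput figure. The following is a minimal software sketch of that idea, not the thesis's hardware data flow. It assumes stride 1, no padding, and nested Python lists indexed [channel][row][col]; all function names (`standard_conv`, `depthwise_conv`, `pointwise_conv`, `macs_standard`, `macs_separable`, `flatten`) and the example layer shape are hypothetical illustrations. It shows the two steps (one K × K filter per input channel, then a 1 × 1 channel-mixing filter), checks that together they equal a standard convolution whose kernel is factored as p[n][c] · k[c], and counts MACs with the usual cost model (D_K × D_K kernel, M input channels, N output channels, D_F × D_F output map).

```python
import math
from typing import List

# Tensors are nested lists indexed [channel][row][col].
# Assumptions for this sketch: stride 1, no padding ("valid" convolution).
Tensor3 = List[List[List[float]]]


def standard_conv(x: Tensor3, w: List[Tensor3]) -> Tensor3:
    """Standard conv: w[n][c] is the K x K kernel from input channel c to output n."""
    ks = len(w[0][0])
    oh, ow = len(x[0]) - ks + 1, len(x[0][0]) - ks + 1
    return [[[sum(w[n][c][u][v] * x[c][i + u][j + v]
                  for c in range(len(x)) for u in range(ks) for v in range(ks))
              for j in range(ow)]
             for i in range(oh)]
            for n in range(len(w))]


def depthwise_conv(x: Tensor3, k: Tensor3) -> Tensor3:
    """Depthwise step: kernel k[c] filters only input channel c (no channel mixing)."""
    ks = len(k[0])
    oh, ow = len(x[0]) - ks + 1, len(x[0][0]) - ks + 1
    return [[[sum(k[c][u][v] * x[c][i + u][j + v]
                  for u in range(ks) for v in range(ks))
              for j in range(ow)]
             for i in range(oh)]
            for c in range(len(x))]


def pointwise_conv(x: Tensor3, p: List[List[float]]) -> Tensor3:
    """Pointwise (1 x 1) step: p[n][c] weights input channel c in output channel n."""
    return [[[sum(p[n][c] * x[c][i][j] for c in range(len(x)))
              for j in range(len(x[0][0]))]
             for i in range(len(x[0]))]
            for n in range(len(p))]


def macs_standard(dk: int, m: int, n: int, df: int) -> int:
    """MACs for a DK x DK standard conv, M -> N channels, DF x DF output map."""
    return dk * dk * m * n * df * df


def macs_separable(dk: int, m: int, n: int, df: int) -> int:
    """MACs for the depthwise part plus the pointwise part, same shapes."""
    return dk * dk * m * df * df + m * n * df * df


def flatten(t: Tensor3) -> List[float]:
    """Flatten a [channel][row][col] tensor into one list for comparison."""
    return [v for ch in t for row in ch for v in row]


# Usage: a 2-channel 4 x 4 input, 3 x 3 depthwise kernels, 3 output channels.
x = [[[float(c + 4 * i + j) for j in range(4)] for i in range(4)] for c in range(2)]
k = [[[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]],
     [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0], [-0.5, -0.5, -0.5]]]
p = [[1.0, 2.0], [-1.0, 0.5], [0.25, 0.25]]

# The separable layer equals a standard conv with factored kernel p[n][c] * k[c].
w = [[[[p[n][c] * k[c][u][v] for v in range(3)] for u in range(3)]
      for c in range(2)] for n in range(3)]
separable = pointwise_conv(depthwise_conv(x, k), p)
assert all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(flatten(separable), flatten(standard_conv(x, w))))

# Illustrative layer shape (assumed): 3 x 3 kernel, 32 -> 64 channels, 112 x 112 output.
print(macs_standard(3, 32, 64, 112))   # 231211008
print(macs_separable(3, 32, 64, 112))  # 29302784
```

For this illustrative shape the separable form needs 29,302,784 / 231,211,008 ≈ 12.7% of the MACs, which equals 1/N + 1/D_K² = 1/64 + 1/9. Nested lists are used only for readability; the sketch says nothing about how the thesis schedules these MACs, buffers data, or reuses it in hardware.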