Reconfigurable Low Arithmetic Precision Convolution Neural Network Accelerator VLSI Design and Implementation


Bibliographic Details
Main Authors:	En-Ho Shen, 沈恩禾
Other Authors:	Shao-Yi Chien
Format:	Others
Language:	zh-TW
Published:	2019
Online Access:	http://ndltd.ncl.edu.tw/handle/7678c2

Description
Summary:	Master's === National Taiwan University === Graduate Institute of Electronics Engineering === 107 === Deep neural networks (DNNs) show promising results on various AI application tasks. However, such networks are typically executed on general-purpose GPUs with a bulky form factor and hundreds of watts of power consumption, which makes them unsuitable for mobile applications. In this thesis, we present a VLSI architecture able to process quantized, low numeric-precision convolutional neural networks (CNNs), cutting the power consumed by memory access and speeding up the model within a limited area budget, which makes it particularly fit for mobile devices. We first propose a quantization re-training algorithm for training low-precision CNNs, then a dataflow with a high data reuse rate and a multiplication-accumulation strategy specially designed for such quantized models. To fully exploit the efficiency of computation on such low-precision data, we design a micro-architecture for low bit-length multiplication and accumulation, an on-chip memory hierarchy and data re-alignment flow that saves power and avoids buffer bank conflicts, and a PE array that takes broadcast data from the buffer and sends finished data sequentially back to the buffer, as this dataflow requires. The architecture is highly flexible across CNN shapes and reconfigurable for low bit-length quantized models. The design is synthesized with 180 KB of on-chip memory and a logic gate count of 1340k; the implementation results show state-of-the-art hardware efficiency.
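The abstract does not say which quantization scheme or bit-lengths the thesis uses, so the following is only a minimal sketch of the general idea behind low bit-length multiplication and accumulation, assuming symmetric uniform quantization with a per-tensor scale. The function names (`quantize`, `int_mac`, `dequantize`) and the 4-bit setting are hypothetical and are not taken from the thesis.

```python
def quantize(values, bits, scale):
    """Map real values to signed integers of `bits` width.

    Assumption: symmetric uniform quantization with one scale per tensor.
    Values outside the representable range are clamped (saturated).
    """
    qmin = -(1 << (bits - 1))
    qmax = (1 << (bits - 1)) - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def int_mac(weights_q, activations_q):
    """Multiply-accumulate on integer operands only.

    This is the operation a low-precision PE would perform; the
    accumulator is kept wider than the operands to avoid overflow.
    """
    acc = 0
    for w, a in zip(weights_q, activations_q):
        acc += w * a
    return acc


def dequantize(acc, w_scale, a_scale):
    """Convert an integer accumulator back to a real-valued result."""
    return acc * w_scale * a_scale


# Hypothetical 4-bit example.
w_q = quantize([0.5, -1.0, 0.26], bits=4, scale=0.25)
a_q = quantize([0.5, 1.0, 1.5], bits=4, scale=0.5)
acc = int_mac(w_q, a_q)
result = dequantize(acc, w_scale=0.25, a_scale=0.5)
```

Because the inner loop touches only small integers, both the multiplier width and the memory traffic per operand shrink with the bit-length, which is the kind of saving the abstract attributes to low-precision models.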

Similar Items