High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks
Master's === Feng Chia University === Department of Electronic Engineering === 106 === Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow. Because...
Main Authors: | ZENG, JIAN-LIN |
---|---|
Other Authors: | CHEN, KUAN-HUNG |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/dzn4pq |
id |
ndltd-TW-106FCU00428007 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106FCU004280072019-05-16T00:08:07Z http://ndltd.ncl.edu.tw/handle/dzn4pq High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks 適用於深度學習類神經網路之卷積層運算高速超大型積體電路設計 ZENG, JIAN-LIN 曾建霖 Master's, Feng Chia University, Department of Electronic Engineering, 106. Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow: because each weight is multiplied with many different image pixels, keeping the weights stationary reduces the cost of data movement in the hardware architecture. Design1 comes in two versions, Design1(16b) and Design1(8b). Design1(16b) is implemented by referring to the literature [4], while Design1(8b) is derived by analyzing the relation between data word length and classification accuracy on the AlexNet model. Design2 outperforms Design1 by parallelizing the multiplications and additions. In Design3, the memory requirement is reduced by memory sharing; Design3 sacrifices some performance for a large reduction in area cost. Design3 is the final version. Experimental results show that Design3 achieves the best overall scores when computational performance, area cost, and power cost are all considered. Its peak performance at 200 MHz is 163.38 GOPS. When running the AlexNet model (227x227) at 200 MHz, its average performance is 32.2 GOPS (48.56 fps). Synthesized in the TSMC 40 nm General process, Design3 has a NAND2 gate count of 2.48 M; running the AlexNet model at 200 MHz, its average power is 176.6 mW and its efficiency is 182.3 GOPS/W. When synthesized using only HVT and RVT cells, the NAND2 gate count is 2.47 M, the average power at 200 MHz is 127.3 mW, and the efficiency is 252.9 GOPS/W. CHEN, KUAN-HUNG 陳冠宏 2018 Academic thesis ; thesis 96 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's === Feng Chia University === Department of Electronic Engineering === 106 === Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow: because each weight is multiplied with many different image pixels, keeping the weights stationary reduces the cost of data movement in the hardware architecture. Design1 comes in two versions, Design1(16b) and Design1(8b). Design1(16b) is implemented by referring to the literature [4], while Design1(8b) is derived by analyzing the relation between data word length and classification accuracy on the AlexNet model. Design2 outperforms Design1 by parallelizing the multiplications and additions. In Design3, the memory requirement is reduced by memory sharing; Design3 sacrifices some performance for a large reduction in area cost.
Design3 is the final version. Experimental results show that Design3 achieves the best overall scores when computational performance, area cost, and power cost are all considered. Its peak performance at 200 MHz is 163.38 GOPS. When running the AlexNet model (227x227) at 200 MHz, its average performance is 32.2 GOPS (48.56 fps). Synthesized in the TSMC 40 nm General process, Design3 has a NAND2 gate count of 2.48 M; running the AlexNet model at 200 MHz, its average power is 176.6 mW and its efficiency is 182.3 GOPS/W. When synthesized using only HVT and RVT cells, the NAND2 gate count is 2.47 M, the average power at 200 MHz is 127.3 mW, and the efficiency is 252.9 GOPS/W.
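As an illustration of the weight-stationary dataflow described in the abstract, here is a minimal software sketch (an assumption for illustration, not the thesis's actual RTL): each PE holds one filter weight fixed while the image pixels stream past it, so each weight is fetched from memory once per row instead of once per multiply.

```python
# Sketch of a weight-stationary (WS) 1-D convolution PE row.
# The function name and array sizes are illustrative assumptions.

def ws_conv1d(weights, image_row):
    """Valid-mode 1-D convolution with a WS schedule: PE k keeps
    weights[k] local for the whole row while pixels stream by."""
    K = len(weights)
    out = [0] * (len(image_row) - K + 1)
    for k, w in enumerate(weights):   # one loop iteration per PE; w stays put
        for x in range(len(out)):     # pixel stream through that PE
            out[x] += w * image_row[x + k]
    return out

print(ws_conv1d([1, 0, -1], [3, 5, 2, 8, 1]))  # -> [1, -3, 1]
```

In hardware, the outer loop becomes K parallel PEs and the inner loop becomes the cycle-by-cycle pixel stream; only partial sums move between PEs, which is the data-movement saving the WS dataflow targets.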
|
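The word-length study behind Design1(8b) can be illustrated with a small fixed-point sketch (a hypothetical helper, not the thesis's actual analysis flow): shortening the word from 16 to 8 bits shrinks the multipliers and memories but increases quantization error, which is the accuracy-versus-word-length trade-off the thesis measured on AlexNet.

```python
# Hypothetical fixed-point quantizer used to illustrate word-length effects.

def quantize(x, total_bits, frac_bits):
    """Round x onto a signed fixed-point grid with the given word length,
    saturating at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

w = 0.30078125                          # an example weight value
print(abs(w - quantize(w, 16, 12)))     # 16-bit word: tiny (here zero) error
print(abs(w - quantize(w, 8, 4)))       # 8-bit word: visibly larger error
```

Sweeping the word length in this way over a network's weights and activations, then re-running classification, is one standard way to pick the shortest word that keeps accuracy acceptable.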
author2 |
CHEN, KUAN-HUNG |
author_facet |
CHEN, KUAN-HUNG ZENG, JIAN-LIN 曾建霖 |
author |
ZENG, JIAN-LIN 曾建霖 |
spellingShingle |
ZENG, JIAN-LIN 曾建霖 High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
author_sort |
ZENG, JIAN-LIN |
title |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_short |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_full |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_fullStr |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_full_unstemmed |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_sort |
high-performance vlsi design for convolution layer of deep learning neural networks |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/dzn4pq |
work_keys_str_mv |
AT zengjianlin highperformancevlsidesignforconvolutionlayerofdeeplearningneuralnetworks AT céngjiànlín highperformancevlsidesignforconvolutionlayerofdeeplearningneuralnetworks AT zengjianlin shìyòngyúshēndùxuéxílèishénjīngwǎnglùzhījuǎnjīcéngyùnsuàngāosùchāodàxíngjītǐdiànlùshèjì AT céngjiànlín shìyòngyúshēndùxuéxílèishénjīngwǎnglùzhījuǎnjīcéngyùnsuàngāosùchāodàxíngjītǐdiànlùshèjì |
_version_ |
1719160011362402304 |