High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks

Master's === Feng Chia University === Department of Electronic Engineering === 106 === Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNN computation, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is built around a Weight Stationary (WS) dataflow: because each weight is multiplied with many different image values, keeping weights stationary reduces the cost of data movement in the architecture. Design1 has two versions, Design1(16b) and Design1(8b). Design1(16b) follows the architecture of reference [4], while Design1(8b) was derived by analyzing the relation between data word length and classification accuracy on the AlexNet model. Design2 outperforms Design1 by parallelizing the multiplications and additions. Design3 lowers the memory requirement through memory sharing, sacrificing some performance for a large reduction in area cost. Design3 is the final version, and the experimental results show that it scores best when computational performance, area cost, and power cost are considered together. At 200 MHz, the peak performance of Design3 is 163.38 GOPS; running the AlexNet model (227x227 inputs) at 200 MHz, its average performance is 32.2 GOPS (48.56 fps). Synthesized in TSMC 40 nm General technology, Design3 has a NAND2-equivalent gate count of 2.48 M; running AlexNet at 200 MHz, it consumes 176.6 mW on average, for 182.3 GOPS/W. When synthesized using only HVT and RVT cells, the gate count is 2.47 M, the average power is 127.3 mW, and the efficiency improves to 252.9 GOPS/W.
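In the weight-stationary dataflow of Design1, each PE holds one filter weight fixed while image pixels stream past it, so each weight is fetched from memory only once. A minimal Python model of a 1-D WS PE array (function and variable names are illustrative, not taken from the thesis):

```python
def ws_conv1d(weights, pixels):
    """1-D convolution computed in a weight-stationary order.

    Each entry of `weights` models one PE's fixed register: the
    weight is read once, and the input pixels stream past it.
    """
    num_pe = len(weights)                  # one PE per filter tap
    out_len = len(pixels) - num_pe + 1
    partial = [0] * out_len                # output accumulators
    # PE k keeps weight w stationary and multiplies it with every
    # pixel aligned to output position j, accumulating partial sums.
    for k, w in enumerate(weights):        # weight stays in PE k
        for j in range(out_len):
            partial[j] += w * pixels[j + k]
    return partial

# 3-tap edge filter streamed over an 8-pixel row
print(ws_conv1d([1, 0, -1], [2, 4, 6, 8, 10, 12, 14, 16]))
# → [-4, -4, -4, -4, -4, -4]
```

The arithmetic is identical to an ordinary sliding-window convolution; only the loop order changes, which is what lets a hardware PE reuse one weight register across the whole input row.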

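Design1(8b) rests on an analysis of how data word length affects classification accuracy. A rough software analogue of that kind of study quantizes values to a signed fixed-point format and measures the resulting error; the 2-integer-bit format below is an assumption for illustration, not the thesis's actual number format:

```python
def quantize(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value with the given word length."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))          # most negative code
    hi = (1 << (total_bits - 1)) - 1       # most positive code
    q = max(lo, min(hi, round(x * scale)))  # saturate on overflow
    return q / scale

weights = [0.127, -0.503, 0.061, 0.244]    # made-up sample weights
for bits in (16, 8):
    frac = bits - 2                        # 2 integer bits (illustrative)
    err = max(abs(w - quantize(w, bits, frac)) for w in weights)
    print(f"{bits}-bit worst-case error: {err:.6f}")
```

Repeating such a sweep while re-running AlexNet inference is what lets one pick the shortest word length whose accuracy loss is acceptable, which is the trade the 8-bit design exploits.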

Bibliographic Details
Main Authors: ZENG, JIAN-LIN, 曾建霖
Other Authors: CHEN, KUAN-HUNG
Format: Others
Language: zh-TW
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/dzn4pq
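The throughput and efficiency figures quoted in this record are mutually consistent, as a short calculation confirms (values taken directly from the abstract):

```python
avg_gops = 32.2                            # average throughput on AlexNet conv layers

# Convolution workload implied per 227x227 AlexNet frame:
ops_per_frame = avg_gops / 48.56           # GOPS / fps
print(f"conv workload per frame: {ops_per_frame:.3f} GOP")   # → 0.663 GOP

# Energy efficiency (GOPS/W) for the two synthesis variants:
for power_w, label in ((0.1766, "all cells"), (0.1273, "HVT/RVT only")):
    print(f"{label}: {avg_gops / power_w:.1f} GOPS/W")       # → 182.3 and 252.9
```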
id ndltd-TW-106FCU00428007
record_format oai_dc
title (Chinese) 適用於深度學習類神經網路之卷積層運算高速超大型積體電路設計
type 學位論文 (degree thesis), 96 pages
collection NDLTD
sources NDLTD