High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks
Master's === Feng Chia University === Department of Electronic Engineering === 106 === Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow. Because...
Main Authors: | ZENG, JIAN-LIN |
---|---|
Other Authors: | CHEN, KUAN-HUNG |
Format: | Others |
Language: | zh-TW |
Published: | 2018 |
Online Access: | http://ndltd.ncl.edu.tw/handle/dzn4pq |
id |
ndltd-TW-106FCU00428007 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-106FCU004280072019-05-16T00:08:07Z http://ndltd.ncl.edu.tw/handle/dzn4pq High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks 適用於深度學習類神經網路之卷積層運算高速超大型積體電路設計 ZENG, JIAN-LIN 曾建霖 Master's, Feng Chia University, Department of Electronic Engineering, 106. Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow: because each weight is multiplied with many different image pixels, keeping the weights stationary reduces the cost of data movement in the hardware architecture. Design1 comes in two versions, Design1(16b) and Design1(8b). Design1(16b) is implemented by referring to the literature [4], while Design1(8b) is derived by analyzing the relation between data word length and classification accuracy on the AlexNet model. Design2 outperforms Design1 by parallelizing the multiplications and additions. In Design3, the memory requirement is reduced by memory sharing; Design3 sacrifices some performance for a large reduction in area cost. Design3 is the final version. Experimental results show that Design3 achieves the best overall scores when computational performance, area cost, and power cost are all considered. Its peak performance at 200 MHz is 163.38 GOPS. When running the AlexNet model (227x227) at 200 MHz, its average performance is 32.2 GOPS (48.56 fps). Synthesized in the TSMC 40 nm General process, Design3 has a NAND2 gate count of 2.48 M; running the AlexNet model at 200 MHz, its average power is 176.6 mW and its efficiency is 182.3 GOPS/W. When synthesized using only HVT and RVT cells, the NAND2 gate count is 2.47 M, the average power at 200 MHz is 127.3 mW, and the efficiency is 252.9 GOPS/W. CHEN, KUAN-HUNG 陳冠宏 2018 Academic thesis ; thesis 96 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others
|
sources |
NDLTD |
description |
Master's === Feng Chia University === Department of Electronic Engineering === 106 === Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow: because each weight is multiplied with many different image pixels, keeping the weights stationary reduces the cost of data movement in the hardware architecture. Design1 comes in two versions, Design1(16b) and Design1(8b). Design1(16b) is implemented by referring to the literature [4], while Design1(8b) is derived by analyzing the relation between data word length and classification accuracy on the AlexNet model. Design2 outperforms Design1 by parallelizing the multiplications and additions. In Design3, the memory requirement is reduced by memory sharing; Design3 sacrifices some performance for a large reduction in area cost.
Design3 is the final version. Experimental results show that Design3 achieves the best overall scores when computational performance, area cost, and power cost are all considered. Its peak performance at 200 MHz is 163.38 GOPS. When running the AlexNet model (227x227) at 200 MHz, its average performance is 32.2 GOPS (48.56 fps). Synthesized in the TSMC 40 nm General process, Design3 has a NAND2 gate count of 2.48 M; running the AlexNet model at 200 MHz, its average power is 176.6 mW and its efficiency is 182.3 GOPS/W. When synthesized using only HVT and RVT cells, the NAND2 gate count is 2.47 M, the average power at 200 MHz is 127.3 mW, and the efficiency is 252.9 GOPS/W.
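As an illustration of the weight-stationary dataflow described in the abstract, here is a minimal software sketch (an assumption for illustration, not the thesis's actual RTL): each PE holds one filter weight fixed while the image pixels stream past it, so each weight is fetched from memory once per row instead of once per multiply.

```python
# Sketch of a weight-stationary (WS) 1-D convolution PE row.
# The function name and array sizes are illustrative assumptions.

def ws_conv1d(weights, image_row):
    """Valid-mode 1-D convolution with a WS schedule: PE k keeps
    weights[k] local for the whole row while pixels stream by."""
    K = len(weights)
    out = [0] * (len(image_row) - K + 1)
    for k, w in enumerate(weights):   # one loop iteration per PE; w stays put
        for x in range(len(out)):     # pixel stream through that PE
            out[x] += w * image_row[x + k]
    return out

print(ws_conv1d([1, 0, -1], [3, 5, 2, 8, 1]))  # -> [1, -3, 1]
```

In hardware, the outer loop becomes K parallel PEs and the inner loop becomes the cycle-by-cycle pixel stream; only partial sums move between PEs, which is the data-movement saving the WS dataflow targets.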
|
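The word-length study behind Design1(8b) can be illustrated with a small fixed-point sketch (a hypothetical helper, not the thesis's actual analysis flow): shortening the word from 16 to 8 bits shrinks the multipliers and memories but increases quantization error, which is the accuracy-versus-word-length trade-off the thesis measured on AlexNet.

```python
# Hypothetical fixed-point quantizer used to illustrate word-length effects.

def quantize(x, total_bits, frac_bits):
    """Round x onto a signed fixed-point grid with the given word length,
    saturating at the representable range."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

w = 0.30078125                          # an example weight value
print(abs(w - quantize(w, 16, 12)))     # 16-bit word: tiny (here zero) error
print(abs(w - quantize(w, 8, 4)))       # 8-bit word: visibly larger error
```

Sweeping the word length in this way over a network's weights and activations, then re-running classification, is one standard way to pick the shortest word that keeps accuracy acceptable.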
author2 |
CHEN, KUAN-HUNG |
author_facet |
CHEN, KUAN-HUNG ZENG, JIAN-LIN 曾建霖 |
author |
ZENG, JIAN-LIN 曾建霖 |
spellingShingle |
ZENG, JIAN-LIN 曾建霖 High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
author_sort |
ZENG, JIAN-LIN |
title |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_short |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_full |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_fullStr |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_full_unstemmed |
High-Performance VLSI Design for Convolution Layer of Deep Learning Neural Networks |
title_sort |
high-performance vlsi design for convolution layer of deep learning neural networks |
publishDate |
2018 |
url |
http://ndltd.ncl.edu.tw/handle/dzn4pq |
work_keys_str_mv |
AT zengjianlin highperformancevlsidesignforconvolutionlayerofdeeplearningneuralnetworks AT céngjiànlín highperformancevlsidesignforconvolutionlayerofdeeplearningneuralnetworks AT zengjianlin shìyòngyúshēndùxuéxílèishénjīngwǎnglùzhījuǎnjīcéngyùnsuàngāosùchāodàxíngjītǐdiànlùshèjì AT céngjiànlín shìyòngyúshēndùxuéxílèishénjīngwǎnglùzhījuǎnjīcéngyùnsuàngāosùchāodàxíngjītǐdiànlùshèjì |
_version_ |
1719160011362402304 |