VLSI Design and Implementation of Quaternary Deep Residual Network for Deep Learning

Bibliographic Details
Main Authors: HSIEH, YEN-CHANG, 謝炎璋
Other Authors: LIEN, CHIH-YUAN
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/jf3crn
Description
Summary: Master's thesis === National Kaohsiung University of Science and Technology === Department of Electronic Engineering === 107 === In recent years, deep neural networks (DNNs) have achieved groundbreaking results in several application domains, ranging from computer vision to speech recognition. In computer vision, a particular type of DNN, the Deep Residual Network (DRN), has demonstrated reliable improvements in object recognition and detection. However, achieving high performance comes at a high computational cost: large memory usage, high power consumption, and long computation time. In many real-world applications, object recognition and detection run on end-user equipment such as cell phones and embedded electronics, so a lower-complexity DRN technique suitable for VLSI implementation is needed. We propose a VLSI architecture for a quaternary DRN and implement it in Verilog HDL. While maintaining accuracy, the design accelerates network layer operations and, through its low-cost, low-power design, is suitable for porting to embedded platforms. The proposed circuit can also be applied to DRN models of different depths by adjusting the execution order of its sub-circuits. According to synthesis results from Synopsys Design Compiler with the Artisan TSMC 0.13 μm standard cell library, the proposed circuit requires 165k logic gates, operates at a clock frequency of up to 50 MHz, and has an estimated power consumption of 18.5299 mW. Experimental results indicate that the circuit reduces computation time by more than 93% compared with a CPU implementation.
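
The figures quoted in the summary can be put in more familiar terms with a little arithmetic. The sketch below is illustrative and not part of the thesis: the function names are hypothetical, and the energy figure assumes the 18.5299 mW estimate applies at the 50 MHz clock, which the summary does not state explicitly.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by a fractional reduction in computation time."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def energy_per_cycle_nj(power_mw: float, clock_mhz: float) -> float:
    """Average energy per clock cycle in nanojoules (mW / MHz = nJ)."""
    if clock_mhz <= 0.0:
        raise ValueError("clock_mhz must be positive")
    return power_mw / clock_mhz


if __name__ == "__main__":
    # A reduction of "more than 93%" gives a lower bound on the speedup.
    print(f"speedup >= {speedup_from_reduction(0.93):.1f}x")
    # Assumption: the power estimate was obtained at the 50 MHz clock.
    print(f"energy per cycle ~ {energy_per_cycle_nj(18.5299, 50.0):.3f} nJ")
```

Under these assumptions, a reduction of more than 93% corresponds to a speedup of more than about 14x over the CPU baseline, and the circuit would consume roughly 0.37 nJ per clock cycle.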