Summary: | Master's === Feng Chia University === Department of Electronic Engineering === 106 === Convolutional Neural Networks (CNNs) are widely used in modern AI systems. This thesis proposes three hardware architectures for CNNs, named Design1 through Design3. In Design1, the 1-D Processing Element (PE) array is structured around a Weight Stationary (WS) dataflow: because each weight is multiplied with many different input pixels, keeping the weights stationary in the PEs reduces the cost of data movement in the hardware architecture. Design1 has two versions, Design1(16b) and Design1(8b). Design1(16b) is implemented by following literature [4], while Design1(8b) is derived by analyzing the relation between data word length and classification accuracy in the AlexNet model. Design2 outperforms Design1 by parallelizing the multiplications and additions. In Design3, the memory requirement is reduced by sharing memory; Design3 saves a large amount of area at the cost of some performance.
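The WS dataflow described above can be illustrated with a minimal software sketch. This is an assumption-laden model, not the thesis's RTL: each PE is represented by one fixed filter weight, input pixels stream past the PE array, and partial products are accumulated into each output position.

```python
# Minimal sketch of a weight-stationary (WS) 1-D PE array (illustrative
# model only; the function name and structure are hypothetical, not the
# thesis's actual hardware description).
def ws_conv1d(inputs, weights):
    """1-D convolution with a WS dataflow: weights stay fixed in the PEs
    while input pixels stream through."""
    num_pes = len(weights)             # one PE per filter weight (stationary)
    out_len = len(inputs) - num_pes + 1
    outputs = [0] * out_len
    for t in range(out_len):           # each step, the input window slides by one
        for pe in range(num_pes):      # each PE multiplies its fixed weight
            outputs[t] += weights[pe] * inputs[t + pe]
    return outputs

print(ws_conv1d([1, 2, 3, 4, 5], [1, 0, -1]))  # -> [-2, -2, -2]
```

Because each weight is reused across every input window, the weights never need to be re-fetched, which is the data-movement saving the WS dataflow exploits.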
Design3 is the final version. The experimental results show that Design3 achieves the best overall trade-off among computational performance, area cost, and power cost. At 200MHz, the peak performance of Design3 is 163.38 GOPS. When running the AlexNet model (227x227 input) at 200MHz, its average performance is 32.2 GOPS (48.56fps). Synthesized with the TSMC 40nm General technology, Design3 has a NAND2 gate count of 2.48M; running AlexNet at 200MHz, the average power is 176.6mW, for an efficiency of 182.3 GOPS/W. When synthesized using only HVT and RVT cells, the NAND2 gate count is 2.47M; running AlexNet at 200MHz, the average power is 127.3mW, for an efficiency of 252.9 GOPS/W.
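The two efficiency figures follow directly from the reported averages, since GOPS/W is average throughput divided by average power. A quick arithmetic check on the numbers in the abstract:

```python
# Sanity check of the reported efficiency figures (numbers from the
# abstract): GOPS/W = average GOPS / average power in watts.
avg_gops = 32.2

print(round(avg_gops / 0.1766, 1))  # 182.3 GOPS/W (default synthesis, 176.6mW)
print(round(avg_gops / 0.1273, 1))  # 252.9 GOPS/W (HVT/RVT-only synthesis, 127.3mW)
```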
|