An Energy-Efficient Accelerator SOC for Convolutional Neural Network Training
高效能卷積神經網路訓練系統加速晶片
Master's thesis === National Taiwan University === Graduate Institute of Electronics Engineering === Academic year 107 (2018-2019)
Main Authors: | Kung, Chu King 江子近 |
---|---|
Other Authors: | Tzi-Dar Chiueh 闕志達 |
Format: | Others |
Language: | en_US |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/y475rn |
id | ndltd-TW-107NTU05428058 |
collection | NDLTD |
description |
The recent resurgence of artificial intelligence is due to advances in deep learning. Deep neural networks (DNNs) have exceeded human capability in many computer vision applications, such as object detection, image classification, and playing games like Go. The idea of deep learning dates back to as early as the 1950s, with the key algorithmic breakthroughs occurring in the 1980s. Yet it is only in the past few years that powerful hardware accelerators have become available to train neural networks.
The demand for machine learning algorithms continues to grow, and it is affecting almost every industry. Designing a powerful and efficient hardware accelerator for deep learning algorithms is therefore of critical importance. Accelerators that run deep learning algorithms must be general enough to support deep neural networks with various computational structures. For instance, general-purpose graphics processing units (GP-GPUs) have been widely adopted for deep learning tasks because they allow users to execute arbitrary code on them.
Beyond graphics processing units, researchers have also paid considerable attention to hardware acceleration of deep neural networks (DNNs) in the last few years. Google developed its own chip, the Tensor Processing Unit (TPU), to power its machine learning services [8], while Intel unveiled Nervana, its first generation of ASIC processors for deep learning, a few years ago [9]. ASICs usually provide better performance than FPGA and software implementations. However, existing accelerators mostly focus on inference, whereas local DNN training is still required to meet the needs of new applications such as incremental learning and on-device personalization. Unlike inference, training requires high dynamic range in order to deliver high learning quality.
In this work, we introduce the floating-point signed digit (FloatSD) data representation format for reducing the computational complexity of both inference and training of convolutional neural networks (CNNs). By co-designing the data representation and the circuit, we demonstrate that we can achieve high raw performance and high energy and area efficiency without sacrificing training quality.
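To make the idea concrete, the following is a minimal sketch of a signed-digit weight representation in the same spirit as FloatSD. It assumes a simplified scheme in which a weight is approximated by a short sum of signed powers of two, so that multiplying an activation by a weight reduces to a few shifts and adds; the exact digit grouping, bit widths, and exponent handling used in the thesis are not reproduced here.

```python
import math

# Hedged sketch only: a generic signed-power-of-two approximation, not the exact
# FloatSD format from the thesis. Each weight is stored as a few (sign, exponent)
# pairs, so weight * activation becomes a handful of shift-and-add operations.

def to_signed_digits(w, num_terms=3):
    """Greedily approximate w as a sum of at most num_terms signed powers of two."""
    terms = []                                   # list of (sign, exponent) pairs
    residual = w
    for _ in range(num_terms):
        if residual == 0.0:
            break
        sign = 1 if residual > 0 else -1
        exp = round(math.log2(abs(residual)))    # nearest power of two on a log scale
        terms.append((sign, exp))
        residual -= sign * (2.0 ** exp)
    return terms

def sd_multiply(terms, activation):
    """Multiply activation by the signed-digit weight using shifts (ldexp) and adds."""
    acc = 0.0
    for sign, exp in terms:
        acc += sign * math.ldexp(activation, exp)   # activation * 2**exp
    return acc

digits = to_signed_digits(0.8125)                # -> [(1, 0), (-1, -2), (1, -4)]
print(digits, sd_multiply(digits, 3.0), 0.8125 * 3.0)
```

In hardware, each (sign, exponent) pair maps to a shifter and an adder in the multiply-accumulate datapath, which is where the reduction in computational complexity comes from.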
This work focuses on the design of a FloatSD-based system on chip (SOC) for AI training and inference. The SOC integrates an AI IP, a DDR3 controller, and an ARC HS34 CPU through standard AMBA AXI/AHB interfaces. The platform can be programmed by the CPU via the AHB slave port to support various neural network topologies.
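As a rough illustration of what such programming might look like, the sketch below writes a hypothetical set of memory-mapped configuration registers exposed through the AHB slave port. All register names, offsets, and fields are invented for illustration; the thesis does not specify the actual register map here.

```python
# Hypothetical register map: every name and offset below is assumed for illustration
# and does not come from the thesis. The CPU would issue these writes through the
# accelerator's AHB slave port as ordinary memory-mapped stores.

ACCEL_BASE = 0x4000_0000        # assumed base address of the AI IP's register window

REG_LAYER_TYPE   = 0x00         # e.g. 0 = convolution, 1 = fully connected
REG_IN_CHANNELS  = 0x04
REG_OUT_CHANNELS = 0x08
REG_KERNEL_SIZE  = 0x0C
REG_WEIGHT_ADDR  = 0x10         # DDR3 address of the layer's FloatSD weights
REG_START        = 0x14         # write 1 to launch the layer

def write_reg(offset, value):
    """Stand-in for a 32-bit store to ACCEL_BASE + offset over the AHB slave port."""
    print(f"store 0x{value:08x} -> 0x{ACCEL_BASE + offset:08x}")

def configure_conv_layer(in_ch, out_ch, kernel, weight_addr):
    """Program one convolutional layer, then kick off execution."""
    write_reg(REG_LAYER_TYPE, 0)
    write_reg(REG_IN_CHANNELS, in_ch)
    write_reg(REG_OUT_CHANNELS, out_ch)
    write_reg(REG_KERNEL_SIZE, kernel)
    write_reg(REG_WEIGHT_ADDR, weight_addr)
    write_reg(REG_START, 1)

configure_conv_layer(in_ch=64, out_ch=128, kernel=3, weight_addr=0x8010_0000)
```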
The completed SOC has been tested and validated on the HAPS-80 FPGA platform. After the correctness of the SOC was verified, a synthesis and automated place-and-route (APR) flow was used to tape out a 28 nm test chip. At its nominal operating condition (400 MHz), the accelerator achieves a peak performance of 1.38 TFLOPS and an energy efficiency of 2.34 TFLOPS/W.
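For context, the headline numbers can be related with a quick back-of-the-envelope calculation, assuming the peak-performance and efficiency figures refer to the same 400 MHz operating point (an assumption, since the abstract does not state this explicitly).

```python
# Derived figures are estimates under the stated assumption, not numbers from the thesis.
peak_flops = 1.38e12       # reported peak performance, FLOP/s
efficiency = 2.34e12       # reported energy efficiency, FLOP/s per watt
clock_hz   = 400e6         # reported operating frequency

flops_per_cycle = peak_flops / clock_hz      # ~3450 floating-point operations per cycle
implied_power_w = peak_flops / efficiency    # ~0.59 W at peak throughput

print(f"{flops_per_cycle:.0f} FLOPs per cycle, about {implied_power_w:.2f} W implied power")
```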