IKW: Inter-Kernel Weights for Power Efficient Edge Computing

Deep Convolutional Neural Networks (CNNs) have achieved state-of-the-art recognition accuracy in a wide range of computer vision applications such as image classification, object detection, and semantic segmentation. CNN-based applications require millions of multiply-accumulate (MAC) operations between input pixels and kernel weights during inference. This work investigates a technique that eliminates redundant multiplications for a subset of kernel weights in a CNN layer by exploiting identical and/or similar inter-kernel weights (IKW) across kernels. The IKW technique identifies identical and/or similar inter-kernel weights in trained, unpruned or pruned, quantized CNN kernels before the inference phase. After identification, one subset of kernel weights, termed non-pivot kernel weights, is set to zero, while the other subset, called pivot kernel weights, is left unchanged. The multiplications corresponding to non-pivot kernel weights are eliminated, reducing computation; the products they would have produced are supplied by the multiplications of the corresponding pivot kernel weights, so inference accuracy does not degrade. Experiments on state-of-the-art CNNs show that the IKW technique increases kernel sparsity by 9-37% for 8-bit kernel weights and by 18-43% for 4-bit kernel weights without degrading recognition accuracy. The enhanced kernel sparsity can be used to save power by clock-gating the compute unit, or to increase execution performance by skipping computations on zero-valued non-pivot kernel weights. Additional power savings come from eliminating redundant, power-expensive fixed-point multiplications. The practical utility of the IKW technique is demonstrated by mapping it onto well-known state-of-the-art CNN accelerator architectures, where it reduces power by at least 12% for 8-bit and 19% for 4-bit kernel weights, and improves execution performance by at least 2% for 8-bit and 13% for 4-bit kernel weights.
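The key observation behind the abstract is that, at a given channel and spatial position, every kernel in a layer multiplies the same input pixel, so kernels that carry an identical quantized weight at that position can share a single product. Below is a minimal sketch of that preprocessing step, assuming weights in a NumPy tensor of shape (num_kernels, channels, kernel_h, kernel_w); the function name find_ikw_pivots and the pivot_map structure are illustrative assumptions rather than the authors' implementation, and only the identical-weight case (not the similar-weight case) is handled.

    import numpy as np

    def find_ikw_pivots(weights):
        """Zero out non-pivot kernel weights that duplicate a pivot kernel's
        weight at the same (channel, y, x) position, and record which pivot
        kernel supplies the shared product."""
        num_k, num_c, kh, kw = weights.shape
        ikw_weights = weights.copy()
        pivot_map = {}  # (c, y, x, non_pivot_kernel) -> pivot kernel index
        for c in range(num_c):
            for y in range(kh):
                for x in range(kw):
                    seen = {}  # weight value -> pivot kernel index
                    for k in range(num_k):
                        w = weights[k, c, y, x]
                        if w == 0:
                            continue  # already zero, nothing to eliminate
                        if w in seen:
                            # Non-pivot weight: eliminate its multiplication and
                            # note which pivot kernel's product it will reuse.
                            ikw_weights[k, c, y, x] = 0
                            pivot_map[(c, y, x, k)] = seen[w]
                        else:
                            seen[w] = k  # first kernel with this value is the pivot
        return ikw_weights, pivot_map

    # Example: measure the kernel sparsity added by IKW on random 4-bit weights.
    w = np.random.randint(-8, 8, size=(64, 32, 3, 3))
    ikw_w, pmap = find_ikw_pivots(w)
    added_sparsity = (np.count_nonzero(w) - np.count_nonzero(ikw_w)) / w.size

At inference time, an accelerator exploiting this mapping would skip the MACs for the zeroed non-pivot weights and route the product already computed for the pivot kernel (same input pixel, same weight value) to the non-pivot kernel's accumulator, which is why the transformation leaves the output, and hence accuracy, unchanged.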


Bibliographic Details
Main Authors: Pramod Udupa, Gopinath Mahale, Kiran Kolar Chandrasekharan (Samsung Advanced Institute of Technology (SAIT), Samsung Research and Development Institute India-Bangalore Pvt. Ltd., Bengaluru, India); Sehwan Lee (Samsung Advanced Institute of Technology (SAIT), Samsung Electronics Co., Ltd., Suwon, South Korea)
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2993506
Subjects: Inter-kernel weights, quantization, multiply-accumulate unit, split accumulator, kernel zero skipping, convolutional neural network
Online Access: https://ieeexplore.ieee.org/document/9090142/