FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
With the growing demand for deploying Deep Learning models to the “edge”, it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) that are based on fully-connected layers.
| Main Authors: | Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2021-01-01 |
| Series: | IEEE Open Journal of Circuits and Systems |
| Subjects: | Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator |
| Online Access: | https://ieeexplore.ieee.org/document/9440253/ |
id: doaj-bbed58d04eec4fa585b028f1f2015059
record_format: Article
spelling:
- Record: doaj-bbed58d04eec4fa585b028f1f2015059 (indexed 2021-06-10T23:01:12Z)
- Journal: IEEE Open Journal of Circuits and Systems (IEEE), ISSN 2644-1225, vol. 2, pp. 407-419, published 2021-01-01
- DOI: 10.1109/OJCAS.2021.3083332 (IEEE document 9440253)
- Title: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
- Authors and affiliations:
  - Simon Wiedemann, Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Suhas Shivapakash (ORCID: 0000-0002-9173-213X), Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin, Germany
  - Daniel Becking (ORCID: 0000-0002-0459-9781), Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Pablo Wiedemann, Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Wojciech Samek (ORCID: 0000-0002-6283-3265), Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Friedel Gerfers (ORCID: 0000-0002-0520-1923), Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin, Germany
  - Thomas Wiegand (ORCID: 0000-0002-1121-2581), Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
- Abstract: see the description field below
- Subjects: Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
- Online access: https://ieeexplore.ieee.org/document/9440253/
collection: DOAJ
language: English
format: Article
sources: DOAJ
author: Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand
spellingShingle: Simon Wiedemann; Suhas Shivapakash; Daniel Becking; Pablo Wiedemann; Wojciech Samek; Friedel Gerfers; Thomas Wiegand; FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons; IEEE Open Journal of Circuits and Systems; Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
author_facet: Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand
author_sort: Simon Wiedemann
title: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_short: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_full: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_fullStr: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_full_unstemmed: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_sort: fantastic4: a hardware-software co-design approach for efficiently running 4bit-compact multilayer perceptrons
publisher: IEEE
series: IEEE Open Journal of Circuits and Systems
issn: 2644-1225
publishDate: 2021-01-01
description: With the growing demand for deploying Deep Learning models to the “edge”, it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) that are based on fully-connected layers. The approach is centred around compression as a means of reducing both the area and the power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performance. First, we design a novel hardware architecture named *FantastIC4*, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (hence the name). Moreover, in order to make the models amenable to efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them simultaneously robust to 4-bit quantization and highly compressible in size. The experimental results show that we achieve a throughput of 2.45 TOPS with a total power consumption of 3.6 W on a Virtex UltraScale FPGA XCVU440 implementation, and a total power efficiency of 20.17 TOPS/W on a 22 nm process ASIC version. Compared to other state-of-the-art accelerators designed for the Google Speech Commands (GSC) dataset, FantastIC4 is better by 51× in terms of throughput and 145× in terms of area efficiency (GOPS/mm²).
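The entropy-constrained training mentioned in the abstract can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch rendition, not the authors' published algorithm: it trains an MLP with 4-bit straight-through-quantized weights while penalizing a soft estimate of the entropy of the weight-code distribution, so the learned codes are both quantization-robust and cheap to entropy-code. The layer sizes, the penalty strength `beta`, and the toy data are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's exact method): 4-bit
# quantization-aware training with an entropy penalty on the weight codes.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LEVELS = 16  # a 4-bit quantization grid

class Quant4(torch.autograd.Function):
    """Uniform 4-bit quantizer with a straight-through gradient."""
    @staticmethod
    def forward(ctx, w):
        scale = w.abs().max() / (NUM_LEVELS // 2) + 1e-8
        return torch.clamp(torch.round(w / scale), -8, 7) * scale

    @staticmethod
    def backward(ctx, g):
        return g  # straight-through estimator: pass gradients unchanged

def code_entropy(w):
    """Differentiable estimate (in bits) of the entropy of the 4-bit codes,
    via a soft assignment of each weight to the 16 quantization levels."""
    scale = w.abs().max().detach() / (NUM_LEVELS // 2) + 1e-8
    levels = torch.arange(-8, 8, device=w.device, dtype=w.dtype) * scale
    d = -((w.reshape(-1, 1) - levels.reshape(1, -1)) ** 2) / scale**2
    p = F.softmax(d, dim=1).mean(dim=0) + 1e-12  # soft code histogram
    return -(p * p.log2()).sum()

class QuantMLP(nn.Module):
    def __init__(self, d_in=64, d_hidden=128, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        x = F.relu(F.linear(x, Quant4.apply(self.fc1.weight), self.fc1.bias))
        return F.linear(x, Quant4.apply(self.fc2.weight), self.fc2.bias)

model = QuantMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
beta = 0.05  # strength of the entropy constraint (assumed value)
x, y = torch.randn(256, 64), torch.randint(0, 10, (256,))  # toy data

for step in range(200):
    opt.zero_grad()
    task_loss = F.cross_entropy(model(x), y)
    penalty = code_entropy(model.fc1.weight) + code_entropy(model.fc2.weight)
    (task_loss + beta * penalty).backward()
    opt.step()
```

Lowering the code entropy means an entropy coder needs fewer bits per weight on average, which is the compressibility property exploited for compact on-chip representations. As a sanity check on the reported figures, 2.45 TOPS at 3.6 W corresponds to about 0.68 TOPS/W for the FPGA, so the 22 nm ASIC's 20.17 TOPS/W is roughly a 30× power-efficiency improvement over the FPGA implementation.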
topic: Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
url: https://ieeexplore.ieee.org/document/9440253/
_version_: 1721384305764073472