FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons

With the growing demand for deploying deep learning models to the "edge", it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) that are based on fully-connected layers. The approach is centred around compression as a means of reducing both the area and the power requirements of multilayer perceptrons (MLPs) with high predictive performance. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the number of multipliers required for inference to only four (hence the name). Moreover, in order to make the models amenable to efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them simultaneously robust to 4-bit quantization and highly compressible in size. The experimental results show a throughput of 2.45 TOPS with a total power consumption of 3.6 W on a Virtex UltraScale FPGA XCVU440 implementation, and a total power efficiency of 20.17 TOPS/W on a 22 nm ASIC version. Compared to other state-of-the-art accelerators designed for the Google Speech Commands (GSC) dataset, FantastIC4 is better by 51x in throughput and by 145x in area efficiency (GOPS/mm²).
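The abstract mentions an entropy-constrained training method that makes weights both robust to 4-bit quantization and highly compressible, but the record does not reproduce the paper's formulation. The following is a minimal sketch of the general idea under stated assumptions: weights are quantized to 16 levels (4 bits) in the forward pass with a straight-through estimator, and a differentiable entropy penalty on a softmax-based soft histogram of the weights pushes the quantized distribution toward low entropy. The codebook, the temperature, the penalty weight 0.01, and the helper names quantize_ste and soft_entropy are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of entropy-constrained 4-bit quantization-aware training.
# NOT the authors' exact algorithm; see the paper (DOI 10.1109/OJCAS.2021.3083332).
import torch
import torch.nn.functional as F

NUM_LEVELS = 16  # 4-bit quantization: 2**4 codebook entries

def quantize_ste(w, levels):
    # Hard nearest-level quantization with a straight-through estimator (STE).
    d = (w.unsqueeze(-1) - levels).abs()   # distance of each weight to each level
    q = levels[d.argmin(dim=-1)]           # snap to the nearest level
    return w + (q - w).detach()            # forward: quantized; backward: identity

def soft_entropy(w, levels, temperature=0.1):
    # Differentiable surrogate for the entropy of the quantized weight distribution:
    # a soft assignment of weights to levels, averaged into a "soft histogram".
    logits = -(w.reshape(-1, 1) - levels).abs() / temperature
    p = F.softmax(logits, dim=-1).mean(dim=0)       # probability mass per level
    return -(p * (p + 1e-12).log()).sum()           # Shannon entropy of the histogram

# Toy fully-connected layer trained with task loss + entropy penalty.
torch.manual_seed(0)
levels = torch.linspace(-1.0, 1.0, NUM_LEVELS)      # illustrative uniform codebook
W = torch.randn(64, 32, requires_grad=True)
x, y = torch.randn(8, 64), torch.randn(8, 32)
opt = torch.optim.SGD([W], lr=0.05)

for step in range(100):
    opt.zero_grad()
    Wq = quantize_ste(W, levels)                    # 4-bit weights in the forward pass
    loss = F.mse_loss(x @ Wq, y) + 0.01 * soft_entropy(W, levels)
    loss.backward()
    opt.step()
```

Lowering the entropy of the quantized weight distribution is what would make the stored model highly compressible with an entropy coder, matching the abstract's claim of simultaneous 4-bit robustness and compressibility.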

Bibliographic Details
Main Authors: Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Open Journal of Circuits and Systems
Subjects: Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
Online Access: https://ieeexplore.ieee.org/document/9440253/
Record ID: doaj-bbed58d04eec4fa585b028f1f2015059
ISSN: 2644-1225
Volume: 2, Pages: 407-419
DOI: 10.1109/OJCAS.2021.3083332
Affiliations: Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany; Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin, Germany