FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
With the growing demand for deploying Deep Learning models to the “edge”, it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) that are based on fully-connected layers.
| Main Authors: | Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2021-01-01 |
| Series: | IEEE Open Journal of Circuits and Systems |
| Subjects: | Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator |
| Online Access: | https://ieeexplore.ieee.org/document/9440253/ |
id: doaj-bbed58d04eec4fa585b028f1f2015059
record_format: Article
spelling:
- Record: doaj-bbed58d04eec4fa585b028f1f2015059 (indexed 2021-06-10T23:01:12Z)
- Journal: IEEE Open Journal of Circuits and Systems (IEEE), ISSN 2644-1225, vol. 2, pp. 407-419, published 2021-01-01
- DOI: 10.1109/OJCAS.2021.3083332 (IEEE document 9440253)
- Title: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
- Authors and affiliations:
  - Simon Wiedemann, Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Suhas Shivapakash (ORCID: 0000-0002-9173-213X), Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin, Germany
  - Daniel Becking (ORCID: 0000-0002-0459-9781), Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Pablo Wiedemann, Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Wojciech Samek (ORCID: 0000-0002-6283-3265), Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Friedel Gerfers (ORCID: 0000-0002-0520-1923), Department of Computer Engineering and Microelectronics, Chair of Mixed Signal Circuit Design, Technical University of Berlin, Berlin, Germany
  - Thomas Wiegand (ORCID: 0000-0002-1121-2581), Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
- Abstract: see the description field below
- Subjects: Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
- Online access: https://ieeexplore.ieee.org/document/9440253/
collection: DOAJ
language: English
format: Article
sources: DOAJ
author: Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand
spellingShingle: Simon Wiedemann; Suhas Shivapakash; Daniel Becking; Pablo Wiedemann; Wojciech Samek; Friedel Gerfers; Thomas Wiegand; FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons; IEEE Open Journal of Circuits and Systems; Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
author_facet: Simon Wiedemann, Suhas Shivapakash, Daniel Becking, Pablo Wiedemann, Wojciech Samek, Friedel Gerfers, Thomas Wiegand
author_sort: Simon Wiedemann
title: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_short: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_full: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_fullStr: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_full_unstemmed: FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4Bit-Compact Multilayer Perceptrons
title_sort: fantastic4: a hardware-software co-design approach for efficiently running 4bit-compact multilayer perceptrons
publisher: IEEE
series: IEEE Open Journal of Circuits and Systems
issn: 2644-1225
publishDate: 2021-01-01
description: With the growing demand for deploying Deep Learning models to the “edge”, it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) that are based on fully-connected layers. The approach is centred around compression as a means of reducing both the area and the power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performance. First, we design a novel hardware architecture named *FantastIC4*, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (hence the name). Moreover, in order to make the models amenable to efficient execution on FantastIC4, we introduce a novel entropy-constrained training method that renders them simultaneously robust to 4-bit quantization and highly compressible in size. The experimental results show that we achieve a throughput of 2.45 TOPS with a total power consumption of 3.6 W on a Virtex UltraScale FPGA XCVU440 implementation, and a total power efficiency of 20.17 TOPS/W on a 22 nm process ASIC version. Compared to other state-of-the-art accelerators designed for the Google Speech Commands (GSC) dataset, FantastIC4 is better by 51× in terms of throughput and 145× in terms of area efficiency (GOPS/mm²).
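The entropy-constrained training mentioned in the abstract can be made concrete with a small sketch. Below is a minimal, hypothetical PyTorch rendition, not the authors' published algorithm: it trains an MLP with 4-bit straight-through-quantized weights while penalizing a soft estimate of the entropy of the weight-code distribution, so the learned codes are both quantization-robust and cheap to entropy-code. The layer sizes, the penalty strength `beta`, and the toy data are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's exact method): 4-bit
# quantization-aware training with an entropy penalty on the weight codes.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LEVELS = 16  # a 4-bit quantization grid

class Quant4(torch.autograd.Function):
    """Uniform 4-bit quantizer with a straight-through gradient."""
    @staticmethod
    def forward(ctx, w):
        scale = w.abs().max() / (NUM_LEVELS // 2) + 1e-8
        return torch.clamp(torch.round(w / scale), -8, 7) * scale

    @staticmethod
    def backward(ctx, g):
        return g  # straight-through estimator: pass gradients unchanged

def code_entropy(w):
    """Differentiable estimate (in bits) of the entropy of the 4-bit codes,
    via a soft assignment of each weight to the 16 quantization levels."""
    scale = w.abs().max().detach() / (NUM_LEVELS // 2) + 1e-8
    levels = torch.arange(-8, 8, device=w.device, dtype=w.dtype) * scale
    d = -((w.reshape(-1, 1) - levels.reshape(1, -1)) ** 2) / scale**2
    p = F.softmax(d, dim=1).mean(dim=0) + 1e-12  # soft code histogram
    return -(p * p.log2()).sum()

class QuantMLP(nn.Module):
    def __init__(self, d_in=64, d_hidden=128, d_out=10):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        x = F.relu(F.linear(x, Quant4.apply(self.fc1.weight), self.fc1.bias))
        return F.linear(x, Quant4.apply(self.fc2.weight), self.fc2.bias)

model = QuantMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
beta = 0.05  # strength of the entropy constraint (assumed value)
x, y = torch.randn(256, 64), torch.randint(0, 10, (256,))  # toy data

for step in range(200):
    opt.zero_grad()
    task_loss = F.cross_entropy(model(x), y)
    penalty = code_entropy(model.fc1.weight) + code_entropy(model.fc2.weight)
    (task_loss + beta * penalty).backward()
    opt.step()
```

Lowering the code entropy means an entropy coder needs fewer bits per weight on average, which is the compressibility property exploited for compact on-chip representations. As a sanity check on the reported figures, 2.45 TOPS at 3.6 W corresponds to about 0.68 TOPS/W for the FPGA, so the 22 nm ASIC's 20.17 TOPS/W is roughly a 30× power-efficiency improvement over the FPGA implementation.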
topic: Deep learning; neural network compression; efficient representation; efficient processing of DNNs; DNN accelerator
url: https://ieeexplore.ieee.org/document/9440253/
_version_: 1721384305764073472