A Configurable Architecture for Running Hybrid Convolutional Neural Networks in Low-Density FPGAs

Convolutional neural networks have become the state of the art of machine learning for a vast set of applications, especially for image classification and object detection. There are several advantages to running inference on these models at the edge, including real-time performance and data privacy. The high computing and memory requirements of convolutional neural networks have been major obstacles to the broader deployment of CNNs on edge devices. Data quantization is an optimization method that reduces the number of bits used to represent weights and activations of a network model, minimizing storage requirements and computing complexity. Quantization can be applied at the layer level, by using different bit widths in different layers: this is called hybrid quantization. This article proposes a new efficient and configurable architecture for running CNNs with hybrid quantization in low-density Field-Programmable Gate Arrays (FPGAs) targeting edge devices. The architecture has been implemented on the Xilinx ZYNQ7020/45 devices and is running the AlexNet and VGG16 networks. Running AlexNet, the architecture has a throughput up to 508 images per second on the ZYNQ7020 device, and 1639 images per second on the ZYNQ7045 device. Considering VGG16, the architecture delivers up to 43 images per second on the ZYNQ7020 device, and 81 images per second on the ZYNQ7045 device. The proposed hybrid architecture achieves up to 13.7x improvement in performance compared to state-of-the-art solutions, with small accuracy degradation.

Bibliographic Details
Main Authors: Mario P. Vestias, Rui P. Duarte, Jose T. De Sousa, Horacio C. Neto
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
ISSN: 2169-3536
Volume: 8
Pages: 107229-107243
DOI: 10.1109/ACCESS.2020.3000444
Subjects: Convolutional neural network; deep learning; embedded computing; field-programmable gate array; hybrid quantization
Online Access: https://ieeexplore.ieee.org/document/9110581/
Source: DOAJ
Affiliations:
Mario P. Vestias (ORCID 0000-0001-8556-4507): INESC-ID, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Lisboa, Portugal
Rui P. Duarte (ORCID 0000-0002-7060-4745): INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Jose T. De Sousa (ORCID 0000-0001-7525-7546): INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
Horacio C. Neto (ORCID 0000-0002-3621-8322): INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
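The abstract defines hybrid quantization as applying quantization at the layer level, using a different bit width for each layer. As an illustration only, the Python sketch below quantizes the weights of each layer to signed integers with that layer's own bit width and totals the resulting weight storage. The quantization scheme (symmetric, uniform, one max-abs scale per layer, round-to-nearest), the layer names, the bit widths, and the weight values are all hypothetical assumptions for the example; they are not the scheme or the configuration used in the authors' FPGA architecture.

```python
from typing import Dict, List, Tuple


def quantize(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric uniform quantization of one layer to signed `bits`-bit codes.

    Assumed scheme (illustrative): one scale per layer, taken from the
    largest absolute weight, so that w is approximately code * scale.
    """
    if bits < 2:
        raise ValueError("symmetric signed quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the symmetric range [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


def hybrid_quantize(layers: Dict[str, List[float]],
                    bit_widths: Dict[str, int]) -> Dict[str, dict]:
    """Quantize every layer with its own bit width (hybrid quantization)."""
    result = {}
    for name, weights in layers.items():
        bits = bit_widths[name]
        codes, scale = quantize(weights, bits)
        approx = dequantize(codes, scale)
        max_err = max(abs(w - a) for w, a in zip(weights, approx))
        result[name] = {"bits": bits, "codes": codes,
                        "scale": scale, "max_err": max_err}
    return result


def storage_bits(layers: Dict[str, List[float]],
                 bit_widths: Dict[str, int]) -> int:
    """Total weight storage in bits when each layer uses its own bit width."""
    return sum(len(w) * bit_widths[name] for name, w in layers.items())


# Hypothetical toy network: three layers, four weights each.
layers = {
    "conv1": [0.50, -0.25, 0.125, -1.0],
    "conv2": [0.30, -0.60, 0.90, 0.05],
    "fc": [0.02, -0.04, 0.08, -0.01],
}
# Hypothetical per-layer bit widths.
bit_widths = {"conv1": 8, "conv2": 4, "fc": 2}

q = hybrid_quantize(layers, bit_widths)
float32_bits = 32 * sum(len(w) for w in layers.values())
print(storage_bits(layers, bit_widths), "bits vs", float32_bits, "bits as float32")
# prints: 56 bits vs 384 bits as float32
```

In this sketch, storage shrinks linearly with each layer's bit width, while the rounding step `scale` widens as the bit width drops, so the worst-case error per weight grows (it stays within `scale / 2`). Assigning bit widths per layer is what lets layers that tolerate coarse rounding use fewer bits while others keep more. The sketch covers weights only; activations would need their own scales and bit widths.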