Roofline-Model-Based Design Space Exploration for Dataflow Techniques of CNN Accelerators

To effectively compute convolutional layers, a complex design space must be explored (e.g., the dataflow techniques associated with the layer parameters, loop transformation techniques, and hardware parameters). For efficient design space exploration (DSE) of various dataflow techniques, namely, the weight-stationary (WS), output-stationary (OS), row-stationary (RS), and no local reuse (NLR) techniques, the processing element (PE) structure and computational pattern of each dataflow technique are analyzed. Various performance metrics, namely, the throughput (in giga-operations per second, GOPS), computation-to-communication ratio (CCR), on-chip memory usage, and off-chip memory bandwidth, are calculated as closed-form expressions of the layer and hardware parameters. In addition, loop interchange and loop unrolling techniques with a double-buffer architecture are assumed. Extensive roofline-model-based simulations are performed to explore the relevant dataflow techniques for a wide variety of convolutional layers of typical neural networks. Through simulation, this paper provides insights into the trends in accelerator performance as the layer parameters change. For convolutional layers with large input and output feature map (ifmap and ofmap) widths and heights, the GOPS of the NLR dataflow technique tends to be higher than that of the other techniques. For convolutional layers with small weight and ofmap widths and heights, the RS dataflow technique achieves optimal GOPS and on-chip memory usage. In the case of convolutional layers with small weight widths and heights, the GOPS of the WS dataflow technique tends to be high. In the case of convolutional layers with small ofmap widths and heights, the OS dataflow technique achieves optimal GOPS and on-chip memory usage.
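
For context: the roofline comparison described above bounds attainable throughput by the lower of the compute roof (peak GOPS) and the memory roof (off-chip bandwidth times CCR). The following is a minimal sketch, in Python, of that bound under hypothetical hardware and layer parameters; the authors derive dataflow-specific closed-form CCR expressions, which this generic ops-per-byte example does not reproduce.

    # Minimal roofline sketch. All numbers below are illustrative
    # assumptions, not values taken from the paper.

    def conv_ops(k: int, c: int, m: int, oh: int, ow: int) -> int:
        """Total MACs of a conv layer: k x k kernel, c input channels,
        m output channels, oh x ow output feature map (ofmap)."""
        return k * k * c * m * oh * ow

    def attainable_gops(peak_gops: float, bw_gbps: float, ccr: float) -> float:
        """Roofline bound: min(compute roof, bandwidth x CCR)."""
        return min(peak_gops, bw_gbps * ccr)

    peak, bw = 512.0, 25.6               # hypothetical: 512 GOPS peak, 25.6 GB/s DRAM
    ops = conv_ops(3, 64, 128, 56, 56)   # hypothetical 3x3 layer, 64->128 channels, 56x56 ofmap
    for ccr in (4.0, 16.0, 64.0):        # ops per off-chip byte; dataflow-dependent
        bound = attainable_gops(peak, bw, ccr)
        print(f"CCR={ccr:5.1f} ops/B -> bound {bound:6.1f} GOPS, "
              f"runtime >= {ops / (bound * 1e9) * 1e3:.2f} ms")

As the CCR grows (i.e., as a dataflow technique reuses more data on chip), the bound climbs the bandwidth-limited slope until it saturates at the compute roof; this is the mechanism behind the per-dataflow GOPS trends summarized above.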

Bibliographic Details
Main Authors:
    Chan Park (https://orcid.org/0000-0003-3856-5202), Department of Electronics Engineering, Pusan National University, Pusan, South Korea
    Sungkyung Park (https://orcid.org/0000-0003-1171-5020), Department of Electronics Engineering, Pusan National University, Pusan, South Korea
    Chester Sungchung Park (https://orcid.org/0000-0003-2009-2814), Department of Electrical Engineering, Konkuk University, Seoul, South Korea
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, Vol. 8, pp. 172509-172523
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3025550
Subjects: Accelerator; convolutional neural networks (CNNs); dataflow techniques; roofline; simulation; processing element (PE)
Online Access: https://ieeexplore.ieee.org/document/9201450/