Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data

With the advent of IoT and Cloud computing service technology, the size of user data to be managed and file data to be transmitted has increased significantly. To protect users’ personal information, it is necessary to encrypt it in a secure and efficient way. Since servers handling a number of c...

Bibliographic Details
Main Authors:	SangWoo An, Seog Chung SEO
Format:	Article
Language:	English
Published:	MDPI AG 2020-05-01
Series:	Applied Sciences
Subjects:	AES, CHAM, LEA, Graphic Processing Unit (GPU), CUDA, Counter (CTR) mode
Online Access:	https://www.mdpi.com/2076-3417/10/11/3711

id	doaj-db6a8b2a4bd0408882b9c4f5eb049b20
record_format	Article
spelling	doaj-db6a8b2a4bd0408882b9c4f5eb049b20 | 2020-11-25T03:21:55Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2020-05-01 | 10 | 3711 | 3711 | 10.3390/app10113711 | Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data | SangWoo An (0), Seog Chung SEO (1) | (0) Department of Financial Information Security, Kookmin University, Seoul 02707, Korea | (1) Department of Financial Information Security, Kookmin University, Seoul 02707, Korea | https://www.mdpi.com/2076-3417/10/11/3711 | AES, CHAM, LEA, Graphic Processing Unit (GPU), CUDA, Counter (CTR) mode
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	SangWoo An, Seog Chung SEO
author_facet	SangWoo An, Seog Chung SEO
author_sort	SangWoo An
title	Highly Efficient Implementation of Block Ciphers on Graphic Processing Units for Massively Large Data
publisher	MDPI AG
series	Applied Sciences
issn	2076-3417
publishDate	2020-05-01
description	With the advent of IoT and Cloud computing service technology, the size of user data to be managed and file data to be transmitted has increased significantly. To protect users’ personal information, it is necessary to encrypt it in a secure and efficient way. Since servers handling a number of clients or IoT devices have to encrypt a large amount of data in real time without compromising service capabilities, Graphic Processing Units (GPUs) have been considered a suitable candidate for a crypto accelerator that processes huge amounts of data in this situation. In this paper, we present highly efficient implementations of block ciphers on NVIDIA GPUs (especially the Maxwell, Pascal, and Turing architectures) for environments using massively large data in IoT and Cloud computing applications. As block cipher algorithms, we choose AES, a representative standard block cipher algorithm; LEA, which was recently added to the ISO/IEC 29192-2:2019 standard; and CHAM, a recently developed lightweight block cipher algorithm. To maximize parallelism in the encryption process, we use the Counter (CTR) mode of operation and customize it to exploit the GPU’s characteristics. We applied several optimization techniques suited to the characteristics of the GPU architecture, such as kernel parallelism, memory optimization, and CUDA streams. Furthermore, we optimized each target cipher according to its algorithmic characteristics by implementing its core part with handcrafted inline PTX (Parallel Thread eXecution) code, the virtual assembly language of the CUDA platform. With these optimization techniques, our implementations of AES and LEA on an RTX 2070 GPU achieve up to 310 Gbps and 2.47 Tbps of throughput, respectively, improvements of 10.7% and 67% over the previous best results of 279.86 Gbps and 1.47 Tbps. For CHAM, this is the first optimized implementation on GPUs, and it achieves 3.03 Tbps of throughput on an RTX 2070 GPU.
topic	AES, CHAM, LEA, Graphic Processing Unit (GPU), CUDA, Counter (CTR) mode
url	https://www.mdpi.com/2076-3417/10/11/3711
work_keys_str_mv	AT sangwooan highlyefficientimplementationofblockciphersongraphicprocessingunitsformassivelylargedata AT seogchungseo highlyefficientimplementationofblockciphersongraphicprocessingunitsformassivelylargedata
_version_	1724612403169067008
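
Note on Counter (CTR) mode: the abstract states that CTR mode is used to maximize parallelism. The sketch below is one way to illustrate why that works, not the authors' implementation. It assumes a 16-byte block and an 8-byte nonce, and it uses a hypothetical stand-in function, toy_block_encrypt (a truncated SHA-256, not AES, LEA, or CHAM), because CTR mode only ever runs the cipher in the forward direction. All function names here are illustrative assumptions.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_BYTES = 16  # 128-bit block, the block size of AES and LEA


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for a real block cipher (NOT AES, LEA, or CHAM)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def keystream_block(key: bytes, nonce: bytes, index: int) -> bytes:
    """Keystream for block `index`; it depends only on (key, nonce, index)."""
    counter_block = nonce + index.to_bytes(BLOCK_BYTES - len(nonce), "big")
    return toy_block_encrypt(key, counter_block)


def ctr_block(key: bytes, nonce: bytes, data: bytes, index: int) -> bytes:
    """Process one block independently of every other block."""
    chunk = data[index * BLOCK_BYTES:(index + 1) * BLOCK_BYTES]
    stream = keystream_block(key, nonce, index)
    # zip stops at the shorter input, so a final partial block needs no padding.
    return bytes(a ^ b for a, b in zip(chunk, stream))


def ctr_crypt(key: bytes, nonce: bytes, data: bytes, workers: int = 4) -> bytes:
    """Encrypt or decrypt (the same operation in CTR mode), one task per block."""
    n_blocks = (len(data) + BLOCK_BYTES - 1) // BLOCK_BYTES
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda i: ctr_block(key, nonce, data, i), range(n_blocks))
        return b"".join(parts)


if __name__ == "__main__":
    key, nonce = b"k" * 16, b"n" * 8
    message = bytes(range(100))
    ciphertext = ctr_crypt(key, nonce, message)
    print(ctr_crypt(key, nonce, ciphertext) == message)
```

Because no block depends on the output of another, the blocks can be computed in any order. On a GPU, each thread would plausibly take the role of one pool task, with the block index derived from its thread and block indices; the thread pool here only mimics that mapping on a CPU.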

Similar Items