Block Ciphers: Fast Implementations on x86-64 Architecture

Bibliographic Details
Main Author: Kivilinna, J. (Jussi)
Format: Dissertation
Language: English
Published: University of Oulu 2013
Subjects: Information Processing Science
Online Access: http://urn.fi/URN:NBN:fi:oulu-201305311409
http://nbn-resolving.de/urn:nbn:fi:oulu-201305311409
id ndltd-oulo.fi-oai-oulu.fi-nbnfioulu-201305311409
record_format oai_dc
spelling ndltd-oulo.fi-oai-oulu.fi-nbnfioulu-201305311409; 2018-06-21T04:47:38Z; Block Ciphers: Fast Implementations on x86-64 Architecture; Kivilinna, J. (Jussi); info:eu-repo/semantics/openAccess; © Jussi Kivilinna, 2013; Information Processing Science; University of Oulu; 2013-07-15; info:eu-repo/semantics/masterThesis; info:eu-repo/semantics/publishedVersion; application/pdf; http://urn.fi/URN:NBN:fi:oulu-201305311409; urn:nbn:fi:oulu-201305311409; eng
collection NDLTD
language English
format Dissertation
sources NDLTD
topic Information Processing Science
description Encryption is being used more than ever before. It is used to prevent eavesdropping on cell phone calls and Internet communications, to secure network connections, to make e-commerce and e-banking possible, and generally to hide information from unwanted eyes. The performance of encryption functions is therefore important, as a slow implementation increases costs: on the server side, a faster implementation can reduce the required capacity, and on the client side, it can lower power usage. Block ciphers are a class of encryption functions typically used to encrypt large amounts of bulk data, which makes them the subject of many studies aiming at greater performance. The x86-64 architecture is the dominant processor architecture in server and desktop computers, and its numerous instruction set extensions make it a constant target of new research on fast software implementations. The examined block ciphers (Blowfish, AES, Camellia, Serpent and Twofish) are widely used in various applications, and their different designs make them interesting objects of investigation. Previous research has reported several optimization techniques for speeding up implementations, such as table look-ups, bit-slicing, byte-slicing and the utilization of the processor's out-of-order scheduling capabilities. We examine these techniques and use them to construct new implementations of the selected block ciphers. The new implementations focus on modes of operation that allow multiple blocks to be processed in parallel, such as counter mode. The performance of the new implementations was measured with the System for Unified Performance Evaluation Related to Cryptographic Operations and Primitives (SUPERCOP) framework on four different processors: AMD K8, AMD K10, Intel Core2 and Intel Sandy Bridge.
The parallel processing permitted by certain modes of operation can improve the performance of a block cipher even with traditional table look-up optimization. Bit-slicing, byte-slicing and word-slicing can be used to parallelize block cipher processing in vector registers, and these ‘sliced’ techniques can significantly improve the throughput of block cipher implementations compared to table look-up based approaches. Our byte-sliced AES-NI/AVX implementation of Camellia reaches 5.32 cycles per byte on an Intel Sandy Bridge processor, which is 2.65 times faster than our two-way table look-up implementation and 3.96 times faster than the implementation in the OpenSSL library.
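The description contrasts table look-ups with sliced techniques. The Python sketch below illustrates bit-slicing on a deliberately tiny, hypothetical 3-bit S-box (not taken from the thesis or from any of the examined ciphers). The table look-up version handles one value per memory access; the bit-sliced version first transposes up to 64 values into three bit-plane words, then evaluates the S-box as a boolean circuit whose few bitwise operations each act on all 64 lanes at once. Python's unbounded integers stand in for 64-bit or SIMD registers, and all function names are illustrative assumptions. The transposition only pays off when many independent blocks are available together, which is why parallel modes such as counter mode matter.

```python
import random

LANES = 64  # lanes per bit-plane word, mimicking a 64-bit register (assumption)


def sbox_scalar(x: int) -> int:
    """Hypothetical 3-bit S-box: y_i = x_i XOR (NOT x_{i+1} AND x_{i+2}), indices mod 3."""
    b = [(x >> i) & 1 for i in range(3)]
    y = [b[i] ^ ((1 - b[(i + 1) % 3]) & b[(i + 2) % 3]) for i in range(3)]
    return y[0] | (y[1] << 1) | (y[2] << 2)


# Table look-up approach: precompute once, then one memory access per value.
SBOX = [sbox_scalar(x) for x in range(8)]


def to_bitplanes(values: list[int], nbits: int) -> list[int]:
    """Transpose: bit b of lane j becomes bit j of plane b."""
    planes = [0] * nbits
    for lane, v in enumerate(values):
        for b in range(nbits):
            planes[b] |= ((v >> b) & 1) << lane
    return planes


def from_bitplanes(planes: list[int], nlanes: int) -> list[int]:
    """Inverse transpose: bit j of plane b goes back to bit b of lane j."""
    values = [0] * nlanes
    for b, plane in enumerate(planes):
        for lane in range(nlanes):
            values[lane] |= ((plane >> lane) & 1) << b
    return values


def sbox_bitsliced(x0: int, x1: int, x2: int) -> tuple[int, int, int]:
    """The same S-box as a boolean circuit; each operation covers every lane.

    The right-hand operand of each AND is non-negative, so the AND also
    clears the unbounded sign bits produced by Python's ~ operator.
    """
    y0 = x0 ^ (~x1 & x2)
    y1 = x1 ^ (~x2 & x0)
    y2 = x2 ^ (~x0 & x1)
    return y0, y1, y2


def sbox_many(values: list[int]) -> list[int]:
    """Apply the S-box to up to LANES values with a single pass of the circuit."""
    if len(values) > LANES:
        raise ValueError("at most LANES values per call")
    planes = to_bitplanes(values, 3)
    return from_bitplanes(list(sbox_bitsliced(*planes)), len(values))


if __name__ == "__main__":
    values = [random.randrange(8) for _ in range(LANES)]
    # Bit-sliced evaluation must agree with per-value table look-ups.
    assert sbox_many(values) == [SBOX[v] for v in values]
    print("bit-sliced and table look-up results agree")
```

Byte-slicing and word-slicing follow the same pattern with bytes or words as the unit of transposition. Real implementations typically keep the data in sliced form across all rounds and perform the transposition with vector shuffle instructions, so its cost is amortized over the whole cipher; that part is beyond this sketch.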
author Kivilinna, J. (Jussi)
title Block Ciphers: Fast Implementations on x86-64 Architecture
publisher University of Oulu
publishDate 2013
url http://urn.fi/URN:NBN:fi:oulu-201305311409
http://nbn-resolving.de/urn:nbn:fi:oulu-201305311409