High-Performance Computation in Residue Number System Using Floating-Point Arithmetic
Residue number system (RNS) is known for its parallel arithmetic and has been used in recent decades in various important applications, from digital signal processing and deep neural networks to cryptography and high-precision computation. However, comparison, sign identification, overflow detection...
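To illustrate why addition and multiplication are easy in RNS while comparison is not, here is a minimal Python sketch. It is not the article's algorithm: the moduli set, the function names, and the `eps` guard are all assumptions chosen for illustration. It estimates the relative value X/M with ordinary floating-point operations, and falls back to an exact Chinese remainder theorem (CRT) reconstruction when two estimates are too close to distinguish.

```python
from math import prod

# Hypothetical small moduli set (pairwise coprime); real applications use many more.
MODULI = (7, 9, 11, 13)
M = prod(MODULI)  # dynamic range: values 0 .. M-1 are representable

# CRT weights w_i = (M / m_i)^(-1) mod m_i (requires Python 3.8+ for pow with -1).
WEIGHTS = tuple(pow(M // m, -1, m) for m in MODULI)


def to_rns(x):
    """Encode a non-negative integer x < M as a tuple of residues."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Component-wise addition: each residue channel is independent."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Component-wise multiplication: again no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(r):
    """Exact CRT reconstruction (slow path; needs big-integer arithmetic)."""
    return sum(x * w * (M // m) for x, w, m in zip(r, WEIGHTS, MODULI)) % M


def relative_value(r):
    """Floating-point estimate of X / M in [0, 1), using the CRT identity
    X / M = frac( sum_i ((x_i * w_i) mod m_i) / m_i )."""
    s = sum(((x * w) % m) / m for x, w, m in zip(r, WEIGHTS, MODULI))
    return s % 1.0


def rns_max(values, eps=1e-9):
    """Return the largest RNS number in a non-empty list.

    Compares floating-point estimates first; eps is an assumed guard band,
    below which the exact (slow) reconstruction decides instead.
    """
    best, best_f = values[0], relative_value(values[0])
    for v in values[1:]:
        f = relative_value(v)
        if abs(f - best_f) < eps:
            if from_rns(v) > from_rns(best):
                best, best_f = v, f
        elif f > best_f:
            best, best_f = v, f
    return best


if __name__ == "__main__":
    data = [to_rns(x) for x in (1234, 8999, 42, 8998)]
    print(from_rns(rns_max(data)))
```

With this tiny dynamic range, a fixed `eps` is enough to keep the floating-point comparison safe. For dynamic ranges of thousands of bits, a single rounded estimate cannot separate nearby values, so a rigorous method needs guaranteed error bounds on the estimate rather than a fixed tolerance.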
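Main Author: | Konstantin Isupov |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-01-01 |
Series: | Computation |
Subjects: | residue number system; digital arithmetic; high-performance computing; data-parallel primitives; graphics processing units |
Online Access: | https://www.mdpi.com/2079-3197/9/2/9 |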
id | doaj-825ddfd1dc0a41a380bf9ad025e85818 |
---|---|
record_format | Article |
spelling | doaj-825ddfd1dc0a41a380bf9ad025e85818; 2021-01-22T00:02:40Z; eng; MDPI AG; Computation; 2079-3197; 2021-01-01; 999; 10.3390/computation9020009; High-Performance Computation in Residue Number System Using Floating-Point Arithmetic; Konstantin Isupov (Department of Electronic Computing Machines, Vyatka State University, 610000 Kirov, Russia); https://www.mdpi.com/2079-3197/9/2/9; residue number system; digital arithmetic; high-performance computing; data-parallel primitives; graphics processing units |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Konstantin Isupov |
title | High-Performance Computation in Residue Number System Using Floating-Point Arithmetic |
publisher | MDPI AG |
series | Computation |
issn | 2079-3197 |
publishDate | 2021-01-01 |
description | Residue number system (RNS) is known for its parallel arithmetic and has been used in recent decades in various important applications, from digital signal processing and deep neural networks to cryptography and high-precision computation. However, comparison, sign identification, overflow detection, and division are still hard to implement in RNS. For such operations, most of the methods proposed in the literature only support small dynamic ranges (up to several tens of bits), so they are only suitable for low-precision applications. We recently proposed a method that supports arbitrary moduli sets with cryptographically sized dynamic ranges, up to several thousands of bits. The practical interest of our method compared to existing methods is that it relies only on very fast standard floating-point operations, so it is suitable for multiple-precision applications and can be efficiently implemented on many general-purpose platforms that support IEEE 754 arithmetic. In this paper, we make further improvements to this method and demonstrate that it can successfully be applied to implement efficient data-parallel primitives operating in the RNS domain, namely finding the maximum element of an array of RNS numbers on graphics processing units. Our experimental results on an NVIDIA RTX 2080 GPU show that for random residues and a 128-moduli set with 2048-bit dynamic range, the proposed implementation reduces the running time by a factor of 39 and the memory consumption by a factor of 13 compared to an implementation based on mixed-radix conversion. |
topic | residue number system; digital arithmetic; high-performance computing; data-parallel primitives; graphics processing units |
url | https://www.mdpi.com/2079-3197/9/2/9 |