Design Framework for High-Performance Elliptic Curve Cryptographic Processors


Bibliographic Details
Main Authors: Lai, Jyu-Yuan, 賴鉅元
Other Authors: Huang, Chih-Tsun
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/07621280303604761274
Description
Summary: Ph.D. === National Tsing Hua University === Department of Computer Science === 98 === We present a design framework for high-performance Elliptic Curve Cryptographic (ECC) processors and a systematic methodology for cost-effective design exploration. First, we propose a parallel and scalable ECC architecture that uses one to four Arithmetic Units (AUs) for ECC arithmetic over both the prime field GF(p) and the binary field GF(2^m). The dual-field ECC cipher core supports comprehensive cryptographic functions for realistic security applications, such as the Elliptic Curve Digital Signature Algorithm (ECDSA) and data encryption/decryption schemes, with arbitrary elliptic curves and arbitrary finite fields of different field sizes. Second, building on the scalable architecture, we propose an efficient two-phase (coarse-grained and fine-grained) operation scheduling methodology. Given various timing and resource constraints, the two-phase scheduling optimizes the parallel architecture rapidly and systematically. Third, with the optimized ECC cores as design templates, we propose a novel ECC architecture with multiple cipher cores, in which a large point scalar multiplication is replaced by several smaller ones that execute simultaneously, significantly reducing the operation time. Finally, we propose a scalar splitting technique for the multi-core ECC architecture. With this technique, ECC processors with homogeneous and heterogeneous configurations can be generated and analyzed automatically. With the entire design framework, different levels of parallelism across the design hierarchy are explored, and designs can be optimized rapidly and efficiently for a variety of applications with different area/throughput requirements. The design of high-performance and cost-effective cryptographic processors thus becomes systematic.
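The abstract does not say how the scalar is split, so the following is only a minimal sketch of one possible approach: it assumes the scalar k is cut into equal-width bit windows, so that k·P = Σ k_i·(2^(i·w)·P), and each smaller multiplication k_i·P_i is handed to a separate worker. The toy curve y² = x³ + 2x + 2 over GF(17) with base point (5, 1), the function names, and the use of Python threads as stand-ins for cipher cores are all illustrative assumptions, not the thesis's design.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy curve y^2 = x^3 + 2x + 2 over GF(17); textbook-sized, NOT secure.
P_MOD = 17
A = 2
G = (5, 1)  # a point on the toy curve; None represents the point at infinity


def point_add(p, q):
    """Affine point addition (also handles doubling and infinity)."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = infinity
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mul(k, p):
    """Plain double-and-add point scalar multiplication k*p."""
    result = None
    addend = p
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def split_scalar(k, cores, bits):
    """Cut a scalar of at most `bits` bits into `cores` windows.

    Returns (k_i, shift_i) pairs with k == sum(k_i << shift_i).
    """
    width = -(-bits // cores)  # ceiling division
    mask = (1 << width) - 1
    return [((k >> (i * width)) & mask, i * width) for i in range(cores)]


def split_scalar_mul(k, p, cores=4, bits=8):
    """Compute k*p as the sum of `cores` smaller scalar multiplications."""
    pieces = split_scalar(k, cores, bits)
    # Base points 2^shift * p; these could be precomputed for a fixed p.
    bases = [scalar_mul(1 << shift, p) for _, shift in pieces]
    # Each small multiplication is independent; threads stand in for cores.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        partials = list(pool.map(scalar_mul, [k_i for k_i, _ in pieces], bases))
    total = None
    for part in partials:
        total = point_add(total, part)
    return total


if __name__ == "__main__":
    k = 0b10110111
    print(split_scalar_mul(k, G) == scalar_mul(k, G))
```

In this sketch the split only shortens each core's scalar; the extra cost is the base-point preparation and the final point additions, which is the kind of trade-off a homogeneous versus heterogeneous configuration analysis would have to weigh.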
Using 130nm CMOS technology, we have implemented two 160-bit dual-field ECC processor chips that adopt the proposed two-phase operation scheduling. The test chips address realistic chip implementation, measurement, and characterization. Each contains four dual-field 32×32-bit AUs operating in parallel to speed up the ECC arithmetic. The first chip supports comprehensive cryptographic functions, including point coordinate conversion, point doubling, point addition, point scalar multiplication, Montgomery pre-/post-processing, modular exponentiation, common finite field arithmetic functions, and basic RSA operations. Prime fields with arbitrary primes and binary fields with arbitrary irreducible polynomials are supported, as are arbitrary elliptic curves. The second fabricated chip additionally integrates an advanced field inversion method and a scheduler-controlled datapath to provide high-throughput, energy-adaptive security computing with a power-performance trade-off. It measures 4.97mm² with a core area of only 1.35mm², and supports both parallel and serial operation modes with a unified architecture for prime field and binary field cryptosystems. The measurement results show that, in the parallel mode, a 160-bit point scalar multiplication with coordinate conversion takes 385μs at 141MHz with a core power of 80.4mW over GF(p), and 272μs at 158MHz with 79.6mW over GF(2^m). This is a significant improvement over the first chip, with speedups of 1.58 times over GF(p) and 1.37 times over GF(2^m) in terms of operation time. In addition, the second chip is up to 8.05 and 3.09 times faster than other ECC architectures over GF(p) and GF(2^m), respectively. The comparison of throughput, area, power, and energy consumption among different ECC designs shows that our high-throughput processor chips provide a power- and energy-efficient implementation with the flexibility of dual-field ECC.
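The measured figures above can be turned into per-operation cycle counts and energy with simple arithmetic. The sketch below assumes the stated clock frequency and core power hold constant for the whole operation, which the abstract does not state explicitly; the function names are illustrative.

```python
def cycles(time_us, freq_mhz):
    """Clock cycles per operation: microseconds x MHz = cycles."""
    return time_us * freq_mhz


def energy_uj(time_us, power_mw):
    """Energy per operation in microjoules: us x mW = nJ, then / 1000."""
    return time_us * power_mw / 1000.0


# Second chip, parallel mode, 160-bit point scalar multiplication.
measurements = {
    "GF(p)":   {"time_us": 385, "freq_mhz": 141, "power_mw": 80.4},
    "GF(2^m)": {"time_us": 272, "freq_mhz": 158, "power_mw": 79.6},
}

for field, m in measurements.items():
    c = cycles(m["time_us"], m["freq_mhz"])
    e = energy_uj(m["time_us"], m["power_mw"])
    print(f"{field}: {c} cycles, {e:.2f} uJ per operation")
# GF(p): 54285 cycles, 30.95 uJ per operation
# GF(2^m): 42976 cycles, 21.65 uJ per operation
```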
Furthermore, we propose an ultra-high-performance ECC processor with four ECC cipher cores by applying the multi-core ECC architecture with the proposed scalar splitting method. Each cipher core consists of three 256×16-bit AUs. According to pre-layout simulation in 90nm CMOS technology, the parallel ECC processor with 1383K gates achieves a throughput of more than 27K point scalar multiplications per second (i.e., 36.70μs per operation) for 256-bit ECC over GF(p), which is 1.1 to 122 times faster than other ECC designs. The comparison shows that our processor significantly outperforms the others in terms of both throughput and area efficiency. The results therefore support the proposed methodology as a means of exploring optimized high-performance ECC processors for a wide range of realistic applications.
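As a quick consistency check on the quoted figures, throughput is the reciprocal of the per-operation latency. The throughput-per-kilogate value below is just one simple way to express area efficiency and is an assumption of this sketch, not necessarily the metric used in the thesis.

```python
def throughput_ops_per_s(latency_us):
    """Operations per second for a given per-operation latency in microseconds."""
    return 1e6 / latency_us


def ops_per_s_per_kgate(latency_us, kgates):
    """Throughput normalized by gate count (illustrative area-efficiency metric)."""
    return throughput_ops_per_s(latency_us) / kgates


tp = throughput_ops_per_s(36.70)
print(f"{tp:.0f} point scalar multiplications per second")
# 27248 point scalar multiplications per second
print(f"{ops_per_s_per_kgate(36.70, 1383):.2f} ops/s per Kgate")
```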