Summary: | Ph.D. === National Tsing Hua University === Department of Computer Science === 98 === We present a design framework for high-performance Elliptic Curve Cryptographic
(ECC) processors and a systematic design methodology for cost-effective design
exploration. First, a parallel and scalable ECC architecture utilizing one to four Arithmetic
Units (AUs) is proposed for ECC arithmetic over both the prime field GF(p) and the binary
field GF(2^m). The dual-field ECC cipher core supports comprehensive cryptographic functions
to fulfill realistic security applications, such as the Elliptic Curve Digital Signature
Algorithm (ECDSA) and data encryption/decryption schemes, with arbitrary elliptic curves
and arbitrary finite fields of different field sizes. Second, with the scalable architecture,
we propose an efficient two-phase, i.e., coarse-grained and fine-grained, operation scheduling
methodology. Given various timing and resource constraints, our two-phase operation
scheduling optimizes the parallel architecture rapidly and systematically. Third, with the
optimized ECC cores as design templates, a novel ECC architecture with multiple cipher
cores is proposed. In this architecture, a large point scalar multiplication can be replaced by
several smaller ones that are executed simultaneously, which significantly reduces the operation time.
Finally, a scalar splitting technique is proposed for the multi-core ECC architecture.
With the proposed scalar splitting technique, ECC processors with homogeneous and heterogeneous
configurations can be generated and analyzed automatically. With the entire
design framework, different levels of parallelism across the design hierarchy are explored, and
designs can be optimized rapidly and efficiently for a variety of applications with different
area/throughput requirements. As a result, the design of high-performance and cost-effective
cryptographic processors becomes systematic.
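
The idea of replacing one large point scalar multiplication with several smaller, independent ones can be illustrated with a minimal sketch. The abstract does not specify how the scalar is split, so the code below uses a generic fixed-width split, k = k_0 + 2^w k_1 + 2^(2w) k_2 + ..., and assumes a fixed base point whose multiples 2^(iw) P can be precomputed. The toy curve, the function names, and the equal chunk widths are all assumptions for illustration, not the design described in this work.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17) with base point G = (5, 1).
# These parameters are illustrative only and are not taken from the thesis.
P_MOD, A = 17, 2
G = (5, 1)

def point_add(p, q):
    """Affine point addition; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = infinity (also covers doubling when y = 0)
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, p):
    """Plain right-to-left double-and-add computing k * p."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, p)
        p = point_add(p, p)
        k >>= 1
    return result

def split_scalar(k, parts, width):
    """Split k into `parts` chunks of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(k >> (i * width)) & mask for i in range(parts)]

def split_scalar_mult(k, p, parts, width):
    """Compute k * p as a sum of smaller scalar multiplications.

    Requires k < 2**(parts * width).
    """
    chunks = split_scalar(k, parts, width)
    # Base points 2^(i*width) * p; precomputable when the base point is fixed.
    bases = [scalar_mult(1 << (i * width), p) for i in range(parts)]
    # Each (chunk, base) pair is independent and could run on its own core.
    partials = [scalar_mult(c, b) for c, b in zip(chunks, bases)]
    result = None
    for part in partials:
        result = point_add(result, part)
    return result
```

With four chunks, each hypothetical core handles a scalar one quarter of the original length, followed by three point additions to combine the partial results. A heterogeneous configuration could be modeled in the same way by giving the chunks unequal widths.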
Using 130nm CMOS technology, we have implemented two 160-bit dual-field ECC processor
chips by adopting the proposed two-phase operation scheduling. The test chips addressed
realistic chip implementation, measurement, and characterization. Each of them contains
four dual-field 32×32-bit AUs in parallel to speed up the ECC arithmetic. The first one
supports comprehensive cryptographic functions, including the point coordinate conversion,
point double, point addition, point scalar multiplication, Montgomery pre-/post-processing,
modular exponentiation, common finite field arithmetic functions, and RSA basic operations.
Prime fields with arbitrary primes and binary fields with arbitrary irreducible polynomials are
supported, as are arbitrary elliptic curves. In addition, the second fabricated chip integrates
an advanced field inversion method and a scheduler-controlled datapath to provide
high-throughput and energy-adaptive security computing with a power-performance trade-off.
It measures 4.97mm² with a core area of only 1.35mm², and supports parallel and
serial operation modes with a unified architecture for both prime-field and binary-field cryptosystems.
The measurement results show that, in the parallel mode, a 160-bit point scalar multiplication with
coordinate conversion can be done in 385μs at 141MHz with a core power of 80.4mW over
GF(p) and in 272μs at 158MHz with 79.6mW over GF(2^m). This is
a significant improvement over the first chip, with a speedup of 1.58 times over GF(p)
and 1.37 times over GF(2^m) in terms of operation time. In addition, the second chip is
up to 8.05 and 3.09 times faster than other ECC architectures over GF(p) and GF(2^m),
respectively. The comparison of throughput, area, power, and energy consumption among
different ECC designs shows that our high-throughput processor chips provide a power- and
energy-efficient implementation with the flexibility of dual-field ECC.
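
As a rough illustration of how the reported measurements relate to energy and cycle counts, the following sketch derives the energy per operation and the clock cycles per operation from the figures above. It assumes that the reported core power is the average power over the whole operation; the function names are hypothetical and the derived values are back-of-the-envelope estimates, not results reported in this work.

```python
def energy_uj(time_us, power_mw):
    """Energy per operation in microjoules (us * mW = nJ, so divide by 1000)."""
    return time_us * power_mw / 1000.0

def cycles(time_us, freq_mhz):
    """Clock cycles per operation (us * MHz = cycles)."""
    return time_us * freq_mhz

# (operation time in us, clock in MHz, core power in mW) from the text above
measured = {"GF(p)": (385, 141, 80.4), "GF(2^m)": (272, 158, 79.6)}

for field, (t_us, f_mhz, p_mw) in measured.items():
    print(f"{field}: {energy_uj(t_us, p_mw):.2f} uJ, {cycles(t_us, f_mhz)} cycles")
```

Under this assumption, one 160-bit point scalar multiplication costs roughly 31 μJ over GF(p) and roughly 22 μJ over GF(2^m).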
Furthermore, an ultra high-performance ECC processor with four ECC cipher cores is
proposed by applying the multi-core ECC architecture with the proposed scalar splitting
method. Each cipher core consists of three 256×16-bit AUs. According to the pre-layout
simulation, the parallel ECC processor with 1383K gates achieves a throughput of more
than 27K point scalar multiplications per second (i.e., 36.70μs per operation) for 256-bit ECC
over GF(p) in 90nm CMOS technology, which is 1.1 to 122 times faster than
other ECC designs. The comparison shows that our processor significantly outperforms the others
in terms of both throughput and area efficiency. These results support the proposed methodology
as a means of exploring optimized high-performance ECC processors for widespread
realistic applications.
|