Area- and energy-efficient CORDIC accelerators in deep sub-micron CMOS technologies
Main Authors: | , |
---|---|
Format: | Article |
Language: | deu |
Published: | Copernicus Publications, 2012-09-01 |
Series: | Advances in Radio Science |
Online Access: | http://www.adv-radio-sci.net/10/207/2012/ars-10-207-2012.pdf |
Summary: | The COordinate Rotation DIgital Computer (CORDIC) algorithm is a well-known,
versatile approach that is widely used in today's SoCs, especially, but not
exclusively, for digital communications. Dedicated CORDIC blocks can be
implemented in deep sub-micron CMOS technologies at very low area and energy
cost, which makes them attractive as hardware accelerators for
Application-Specific Instruction-set Processors (ASIPs), thereby overcoming
the well-known conflict between energy efficiency and flexibility. Optimizing
Global Navigation Satellite System (GNSS) receivers to reduce hardware
complexity is currently an important research topic. In such receivers,
CORDIC accelerators can be used for digital baseband processing (fixed-point)
and for Position-Velocity-Time estimation (floating-point). A
microarchitecture well suited to such applications is presented. It is
parameterized by the wordlengths and the number of iterations and can easily
be extended to a floating-point data format. Moreover, area can be traded for
throughput by partially or even fully unrolling the iterations, with the
pipelining organized as one CORDIC iteration per cycle. From the
architectural description, the macro layout can be generated fully
automatically using an in-house datapath generator tool. Because the adders
and shifters are key to optimizing the CORDIC block, they must be carefully
designed for high area and energy efficiency in the underlying technology;
for this purpose, carry-select adders and logarithmic shifters were chosen.
Device dimensions were automatically optimized for dynamic and static power,
area, and performance using the in-house tool. The fully sequential CORDIC
block for fixed-point digital baseband processing has a wordlength of 16
bits, is implemented in a 40-nm CMOS technology, requires 5232 transistors,
and occupies a silicon area of only 1560 μm². Its maximum clock frequency,
obtained from circuit simulation of the extracted netlist, is 768 MHz under
typical and 463 MHz under worst-case technology and application corner
conditions. Simulated dynamic power dissipation is 0.24 μW/MHz at 0.9 V;
static power is 38 μW in the slow corner, 65 μW in the typical corner, and
518 μW in the fast corner. Static power can be reduced by 43% in the 40-nm
CMOS technology by applying a 0.5 V reverse back-bias. These results are
compared with those of different design styles and with an implementation in
a 28-nm CMOS technology. Interestingly, in the 28-nm case the area scales as
expected, but worst-case performance and energy no longer scale well. |
ISSN: | 1684-9965 1684-9973 |
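The summary refers to CORDIC by name and to a fully sequential block that performs one iteration per cycle. As a rough illustration of that idea, the following Python sketch models a sequential fixed-point CORDIC in rotation mode: each loop pass stands for one clock cycle and uses only shifts, additions, and a small arctangent table. The function name `cordic_rotate`, the 14 fractional bits, and the choice to pre-scale the inputs by the CORDIC gain are assumptions for illustration, not details taken from the paper. Python integers do not wrap, so 16-bit overflow is not modeled.

```python
import math


def cordic_rotate(x, y, z, n_iter=16, frac_bits=14):
    """Hypothetical sketch: rotate the vector (x, y) by z radians with CORDIC.

    Inputs are floats that are quantized to fixed point with `frac_bits`
    fractional bits. Each loop iteration corresponds to one clock cycle of a
    fully sequential datapath. Overflow/wrap-around is NOT modeled.
    """
    scale = 1 << frac_bits
    # Elementary angles atan(2^-i), quantized to the same fixed-point format.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(n_iter)]
    # CORDIC gain compensation K = prod 1/sqrt(1 + 2^-2i), applied up front
    # (an assumption; hardware may instead correct the gain afterwards).
    k = 1.0
    for i in range(n_iter):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    xi = round(x * k * scale)
    yi = round(y * k * scale)
    zi = round(z * scale)
    for i in range(n_iter):
        # Direction of the micro-rotation is the sign of the residual angle.
        # Python's >> on negative ints is an arithmetic (floor) shift,
        # matching a two's-complement arithmetic right shifter.
        if zi >= 0:
            xi, yi, zi = xi - (yi >> i), yi + (xi >> i), zi - atan_table[i]
        else:
            xi, yi, zi = xi + (yi >> i), yi - (xi >> i), zi + atan_table[i]
    return xi / scale, yi / scale


if __name__ == "__main__":
    c, s = cordic_rotate(1.0, 0.0, math.pi / 6)
    print(c, s)  # approximately cos(30 deg) and sin(30 deg)
```

With 16 iterations the angle left unrotated is on the order of 2^-15 rad, so the accuracy of this model is limited mainly by the fixed-point quantization of the shifted terms and the angle table.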
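The summary states that logarithmic shifters were chosen for the CORDIC datapath, where they implement the variable arithmetic right shift by the iteration index. A logarithmic shifter realizes a shift by s as log2(W) cascaded stages: stage k either shifts by 2^k or passes the value through, controlled by bit k of s. The Python model below is a hypothetical bit-level sketch of that structure for a 16-bit two's-complement word; the function names are illustrative and do not come from the paper or its in-house tool.

```python
def to_signed(v, width):
    """Interpret the low `width` bits of v as a two's-complement number."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def log_shifter_asr(value, shift, width=16):
    """Hypothetical model of a logarithmic arithmetic-right shifter.

    The shift amount is decoded bit by bit: stage k is a row of 2:1
    multiplexers that either shifts by 2**k (sign-filling the vacated MSBs)
    or passes its input through unchanged.
    """
    stages = (width - 1).bit_length()  # 4 stages for a 16-bit word
    assert 0 <= shift < width
    v = value & ((1 << width) - 1)  # raw width-bit pattern
    sign = (v >> (width - 1)) & 1
    for k in range(stages):
        amount = 1 << k
        if (shift >> k) & 1:
            # Replicate the sign bit into the `amount` vacated upper bits.
            fill = ((1 << amount) - 1) << (width - amount) if sign else 0
            v = (v >> amount) | fill
    return to_signed(v, width)


if __name__ == "__main__":
    print(log_shifter_asr(-1000, 3))  # same as -1000 >> 3
```

The appeal of this structure in a CORDIC block is that delay and multiplexer count grow with log2(W) stages rather than with the maximum shift distance.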
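The summary also names carry-select adders for the CORDIC add/subtract units. In a carry-select adder the word is split into blocks; each block beyond the first precomputes its sum for both possible carry-ins (0 and 1), and the real incoming carry only drives a multiplexer, which shortens the critical path compared with a plain ripple-carry adder. The Python sketch below models that selection behavior for a 16-bit word with an assumed block size of 4 bits; the names and block size are hypothetical, and the model computes both candidates for the first block too, which hardware would not need because its carry-in is known.

```python
def block_add(a, b, cin, bits):
    """Add two `bits`-wide unsigned blocks with a carry-in; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << bits) - 1), total >> bits


def carry_select_add(a, b, width=16, block=4):
    """Hypothetical behavioral model of a carry-select adder.

    Returns (sum modulo 2**width, carry_out). Two's-complement operands can be
    passed as-is; they are reduced to `width` bits first.
    """
    a &= (1 << width) - 1
    b &= (1 << width) - 1
    result, carry = 0, 0
    for pos in range(0, width, block):
        bits = min(block, width - pos)  # the last block may be narrower
        mask = (1 << bits) - 1
        a_blk, b_blk = (a >> pos) & mask, (b >> pos) & mask
        # In hardware both candidates are computed in parallel ...
        sum0, cout0 = block_add(a_blk, b_blk, 0, bits)
        sum1, cout1 = block_add(a_blk, b_blk, 1, bits)
        # ... and the carry from the previous block only selects one of them.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << pos
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(0xFFFF, 1))  # wraps to 0 with a carry out
```

The block size is the main design knob: smaller blocks shorten each block's ripple delay but lengthen the chain of carry-select multiplexers, at the cost of the duplicated block adders in area.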
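To make the dynamic-power figure from the summary more concrete, it can be converted into energy per cycle and per CORDIC operation. This is a back-of-the-envelope sketch under stated assumptions: dynamic power is taken to scale linearly with clock frequency, the fully sequential block is assumed to need one cycle per iteration, and N = 16 iterations per result is an assumption (the summary gives the 16-bit wordlength but not the iteration count). Static power is excluded.

$$
0.24\,\frac{\mu\mathrm{W}}{\mathrm{MHz}} = \frac{0.24\times10^{-6}\,\mathrm{J/s}}{10^{6}\,\mathrm{s}^{-1}} = 0.24\,\mathrm{pJ\ per\ cycle}
$$

$$
P_{\mathrm{dyn}}(768\,\mathrm{MHz}) \approx 0.24\,\frac{\mu\mathrm{W}}{\mathrm{MHz}} \times 768\,\mathrm{MHz} \approx 184\,\mu\mathrm{W}
$$

$$
E_{\mathrm{op}} \approx N \times 0.24\,\mathrm{pJ} = 16 \times 0.24\,\mathrm{pJ} = 3.84\,\mathrm{pJ} \qquad (N = 16\ \text{assumed})
$$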