ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor

Rapid advances in semiconductor technology have led to the steady shrinking of transistor sizes, in line with Moore's Law. Wireless communications is one field that has seen explosive growth as more transistors are packed onto a single chip. Design of these systems involves trade-of...

Bibliographic Details
Main Author: Kala, S
Other Authors: Nandy, S K
Language: en_US
Published: 2016
Subjects:
Online Access: http://etd.iisc.ernet.in/handle/2005/2557
http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf
id ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2557
record_format oai_dc
spelling ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2557
2018-01-10T03:36:49Z
ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor
Kala, S
Wireless Communication Systems
Fast Fourier Transformation Processor
Fast Fourier Transform Architecture
Fast Fourier Transform - Algorithms
Application Specific Integrated Circuit
FFT Processor
FFT Architecture
Orthogonal Frequency Division Multiplexing (OFDM)
Communication Engineering
Nandy, S K
Jamadagni, H S
2016-09-09T14:19:48Z
2016-09-09T14:19:48Z
2016-09-09
2012-12
Thesis
http://etd.iisc.ernet.in/handle/2005/2557
http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf
en_US
G25691
collection NDLTD
language en_US
sources NDLTD
topic Wireless Communication Systems
Fast Fourier Transformation Processor
Fast Fourier Transform Architecture
Fast Fourier Transform - Algorithms
Application Specific Integrated Circuit
FFT Processor
FFT Architecture
Orthogonal Frequency Division Multiplexing (OFDM)
Communication Engineering
description Rapid advances in semiconductor technology have led to the steady shrinking of transistor sizes, in line with Moore's Law. Wireless communications is one field that has seen explosive growth as more transistors are packed onto a single chip. The design of these systems involves trade-offs between performance, area, and power. The Fast Fourier Transform (FFT) is an important component of most wireless communication systems and is widely used in applications such as OFDM transceivers, spectrum sensing in cognitive radio, image processing, and radar signal processing. The FFT is the most compute-intensive and time-consuming operation in most of these applications. It is always a challenge to develop an architecture that delivers high throughput and low latency without much area overhead. Next-generation wireless systems demand high transmission efficiency, so the FFT processor must be capable of much faster computation. Architectures based on smaller radices are inefficient for computing longer FFTs. This thesis proposes a fully parallel, unrolled FFT architecture, based on a novel radix-4 engine, that caters to a wide range of applications. The radix-4 butterfly unit takes all four inputs in parallel and can selectively produce one of the four outputs. The proposed architecture uses Radix-4^3 and Radix-4^4 algorithms to compute FFTs of various sizes. The Radix-4^4 block takes all 256 inputs in parallel and uses select control signals to generate one of the 256 outputs. In existing Cooley-Tukey architectures, the output of each stage has to be reordered before the next stage can start computation, which requires intermediate storage after each stage. In the proposed architecture, each stage directly generates the reordered outputs, which reduces these buffers. A solution to the output reordering problem in Radix-4^3 and Radix-4^4 FFT architectures is also discussed.
Although the hardware complexity in terms of adders and multipliers increases in the proposed architecture, a significant reduction in the intermediate memory requirement is achieved. FFTs of sizes from 64 points to 64K points have been implemented as ASICs in UMC 130 nm CMOS technology. Data are represented in fixed-point format with a word length of 16 bits, selected to maximize the Signal to Quantization Noise Ratio (SQNR). The architecture is found to be better suited to computing large FFTs. For 4096-point and 64K-point FFTs, the design gives comparable throughput with a considerable reduction in area and latency compared with state-of-the-art implementations. The 64K-point FFT architecture achieves a throughput of 1332 megasamples per second with an area of 171.78 mm^2 and a total power of 10.7 W at 333 MHz.
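
The abstract's radix-4 butterfly, which takes all four inputs in parallel and selectively produces one of the four outputs, can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the function names `radix4_butterfly_select` and `reference_dft` are hypothetical, floating-point arithmetic stands in for the 16-bit fixed-point datapath, and the select signal is modelled as a plain integer `k`.

```python
import cmath


def radix4_butterfly_select(x, k):
    """Return output k (0..3) of a 4-point DFT of x, computing only that output.

    Hypothetical software model: `k` plays the role of the select control
    signal described in the abstract.
    """
    if len(x) != 4:
        raise ValueError("a radix-4 butterfly needs exactly four inputs")
    if k not in (0, 1, 2, 3):
        raise ValueError("select must be 0, 1, 2 or 3")
    # 4-point twiddle factors are powers of -j, so each "multiplication"
    # is only a sign change and/or a swap of real and imaginary parts.
    rotations = (1, -1j, -1, 1j)
    return sum(x[n] * rotations[(n * k) % 4] for n in range(4))


def reference_dft(x):
    """Direct O(N^2) DFT, used only to check the butterfly model."""
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size))
        for k in range(size)
    ]


if __name__ == "__main__":
    samples = [1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j]
    expected = reference_dft(samples)
    for select in range(4):
        got = radix4_butterfly_select(samples, select)
        assert abs(got - expected[select]) < 1e-9
```

Sweeping `k` from 0 to 3 reproduces the full 4-point DFT, so the selectable form loses nothing relative to a butterfly that emits all four outputs at once; it only changes which output is produced for a given select value.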
author2 Nandy, S K
author_facet Nandy, S K
Kala, S
author Kala, S
author_sort Kala, S
title ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor
publishDate 2016
url http://etd.iisc.ernet.in/handle/2005/2557
http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf