ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor

Rapid advances in semiconductor technology have led to a steady shrinking of transistor sizes, in line with Moore's Law. Wireless communication is one field that has seen explosive growth as more transistors are packed onto a single chip. Design of these systems involves trade-of...


Bibliographic Details
Main Author:	Kala, S
Other Authors:	Nandy, S K
Language:	en_US
Published:	2016
Subjects:	Wireless Communication Systems; Fast Fourier Transformation Processor; Fast Fourier Transform Architecture; Fast Fourier Transform - Algorithms; Application Specific Integrated Circuit; FFT Processor; FFT Architecture; Orthogonal Frequency Division Multiplexing (OFDM); Communication Engineering
Online Access:	http://etd.iisc.ernet.in/handle/2005/2557 http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf

id	ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2557
record_format	oai_dc
spelling	ndltd-IISc-oai-etd.ncsi.iisc.ernet.in-2005-2557; 2018-01-10T03:36:49Z; ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor; Kala, S; Wireless Communication Systems; Fast Fourier Transformation Processor; Fast Fourier Transform Architecture; Fast Fourier Transform - Algorithms; Application Specific Integrated Circuit; FFT Processor; FFT Architecture; Orthogonal Frequency Division Multiplexing (OFDM); Communication Engineering; Nandy, S K; Jamadagni, H S; 2016-09-09T14:19:48Z; 2016-09-09T14:19:48Z; 2016-09-09; 2012-12; Thesis; http://etd.iisc.ernet.in/handle/2005/2557; http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf; en_US; G25691
collection	NDLTD
language	en_US
sources	NDLTD
topic	Wireless Communication Systems; Fast Fourier Transformation Processor; Fast Fourier Transform Architecture; Fast Fourier Transform - Algorithms; Application Specific Integrated Circuit; FFT Processor; FFT Architecture; Orthogonal Frequency Division Multiplexing (OFDM); Communication Engineering
description	Rapid advances in semiconductor technology have led to a steady shrinking of transistor sizes, in line with Moore's Law. Wireless communication is one field that has seen explosive growth as more transistors are packed onto a single chip. Design of these systems involves trade-offs between performance, area and power. The Fast Fourier Transform (FFT) is an important component in most wireless communication systems. FFTs are widely used in applications such as OFDM transceivers, spectrum sensing in cognitive radio, image processing and radar signal processing. The FFT is the most compute-intensive and time-consuming operation in most of these applications. It is always a challenge to develop an architecture that gives high throughput while reducing latency without much area overhead. Next-generation wireless systems demand high transmission efficiency, so the FFT processor must be capable of much faster computation. Architectures based on smaller radices are inefficient for computing longer FFTs. In this thesis, a fully parallel unrolled FFT architecture based on a novel radix-4 engine is proposed, which caters to a wide range of applications. The radix-4 butterfly unit takes all four inputs in parallel and can selectively produce one of the four outputs. The proposed architecture uses Radix-4^3 and Radix-4^4 algorithms to compute FFTs of various sizes. The Radix-4^4 block can take all 256 inputs in parallel and uses the select control signals to generate one of the 256 outputs. In existing Cooley-Tukey architectures, the output of each stage has to be reordered before the next stage can start computation. This requires intermediate storage after each stage. In our architecture, each stage can directly generate the reordered outputs and hence reduce these buffers. A solution to the output reordering problem in Radix-4^3 and Radix-4^4 FFT architectures is also discussed in this work.
Although the hardware complexity in terms of adders and multipliers is increased in our architecture, a significant reduction in the intermediate memory requirement is achieved. FFTs of sizes from 64 points to 64K points have been implemented in ASIC using UMC 130 nm CMOS technology. The data representation used in this work is fixed-point format, and the selected word length is 16 bits to get maximum Signal to Quantization Noise Ratio (SQNR). The architecture has been found to be more suitable for computing FFTs of large sizes. For 4096-point and 64K-point FFTs, this design gives comparable throughput with a considerable reduction in area and latency when compared to state-of-the-art implementations. The 64K-point FFT architecture achieved a throughput of 1332 mega samples per second with an area of 171.78 mm^2 and a total power of 10.7 W at 333 MHz.
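The selectable-output radix-4 butterfly described in the abstract can be illustrated with a short Python sketch. This is not the thesis implementation: the function names `radix4_butterfly_select`, `dft4` and `samples_per_cycle` are hypothetical, and the sketch uses floating-point complex arithmetic rather than the 16-bit fixed-point format of the ASIC. It only shows the idea that all four inputs are available at once and a select value picks which single output is computed.

```python
import cmath

# Twiddle factors of a 4-point DFT are powers of -j: W4^m = (-j)**m.
_W4 = [1, -1j, -1, 1j]


def radix4_butterfly_select(x, k):
    """Return only output k (0..3) of a radix-4 butterfly over four inputs.

    Hypothetical helper: 'k' plays the role of the select control signal.
    """
    if len(x) != 4 or k not in range(4):
        raise ValueError("expected four inputs and a select value in 0..3")
    return sum(x[n] * _W4[(n * k) % 4] for n in range(4))


def dft4(x):
    """Reference 4-point DFT, used only to check the butterfly."""
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / 4) for n in range(4))
        for k in range(4)
    ]


def samples_per_cycle(throughput_msps, clock_mhz):
    """Samples produced per clock cycle, from the figures in the abstract."""
    return throughput_msps / clock_mhz


if __name__ == "__main__":
    inputs = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.25 - 4j]
    for k in range(4):
        print(k, radix4_butterfly_select(inputs, k))
    print(samples_per_cycle(1332, 333))
```

With the figures reported for the 64K-point design, 1332 mega samples per second at 333 MHz corresponds to four samples per clock cycle.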
author2	Nandy, S K
author_facet	Nandy, S K Kala, S
author	Kala, S
author_sort	Kala, S
title	ASIC Implementation of A High Throughput, Low Latency, Memory Optimized FFT Processor
publishDate	2016
url	http://etd.iisc.ernet.in/handle/2005/2557 http://etd.ncsi.iisc.ernet.in/abstracts/3324/G25691-Abs.pdf
work_keys_str_mv	AT kalas asicimplementationofahighthroughputlowlatencymemoryoptimizedfftprocessor
_version_	1718603800564989952


Similar Items