Low-Complexity Discrete Transforms and Their Architecture Designs for Video and Communication Signal Processings

Ph.D. === National Chung Hsing University === Department of Electrical Engineering === 97 === This thesis addresses two topics: algorithms and implementations of integer transforms for H.264/AVC, VC-1, and AVS video applications, and algorithms and implementations of the pruning FFT, the radix-2^4 FFT, and t...

Full description

Bibliographic Details
Main Authors: Guo-An Su, 蘇國安
Other Authors: 范志鵬
Format: Others
Language: en_US
Online Access: http://ndltd.ncl.edu.tw/handle/46351082493317984115
id ndltd-TW-097NCHU5441008
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Chung Hsing University === Department of Electrical Engineering === 97 === This thesis addresses two topics: algorithms and implementations of integer transforms for H.264/AVC, VC-1, and AVS video applications, and algorithms and implementations of the pruning FFT, the radix-2^4 FFT, and the recursive DFT for communication applications.
For the integer transforms, fast algorithms and hardware-sharing architectures across the integer transforms of the different video standards are proposed to reduce hardware cost. For the H.264/AVC video standard, fast algorithms and hardware-sharing architectures for the 1-D forward and inverse integer transforms are proposed to obtain low-cost circuits when both the forward and inverse transforms must be implemented on the same chip. Using a cell-based design flow, the gate count of the hardware-sharing design for the 1-D 4×4 and 8×8 forward/inverse integer transforms is smaller than that of the individual 1-D 4×4 and 8×8 forward/inverse integer transforms implemented without sharing. For the VC-1 video standard, the proposed fast algorithms for the 1-D 8×8 inverse integer transform are derived from the symmetry of the transform matrix and from matrix decompositions. They require fewer additions and shift operations than the previous fast method. In the hardware-sharing design of the fast 1-D 4×4 and 8×8 forward/inverse integer transforms, common hardware modules are shared to reduce total hardware cost, so the proposed 1-D and 2-D hardware-sharing designs cost less than the individual, separate designs without sharing. For H.264/AVC and AVS, a low-cost hardware-sharing design is proposed for the 1-D 2×2, 4×4, and 8×8 inverse transforms of H.264/AVC and the 1-D 8×8 inverse transform of AVS, aimed especially at multi-standard decoding applications in China. The proposed 1-D hardware-sharing architecture is realized by adding offset computations to the shared hardware and is implemented as a pipelined architecture. Its hardware cost is therefore lower than that of the individual, separate designs.
For the discrete Fourier transform (DFT), algorithms and architectures for the pruning FFT, the radix-2^4 FFT, and the recursive DFT are proposed. The proposed pruning FFT algorithm, developed with a grouping scheme, computes a DFT whose partial transform length is a power of two. The group-based pruning FFT algorithm groups the frequency indices to accelerate the DFT computation. It requires fewer complex multiplications than other pruning FFT algorithms when the number of partial transform outputs is at least 1/16 of the total FFT length. Next, an efficient, low-cost 256-point FFT architecture and its implementation are proposed. Based on the radix-16 FFT algorithm, the proposed 256-point FFT processor uses a simplified cascaded radix-2^4 single-path delay feedback (SDF) structure, so its control circuit is simpler than that of a direct radix-16 FFT SDF structure. In hardware verification, the FFT design achieves a throughput of up to 35.5M samples/sec on a Xilinx Virtex2 1500 FPGA chip and up to 51.5M samples/sec with the UMC 0.18μm standard cell library. Finally, an efficient recursive algorithm for DFT computation with power-of-two transform lengths is proposed. The benefit of the proposed recursive structure is a reduction in the number of recursive loops, and the signal-to-quantization-noise ratio (SQNR) of the proposed recursive DFT is higher than that of the conventional Goertzel algorithm. The proposed recursive DFT is further modified with selected coefficients to reduce round-off error and increase the SQNR.
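For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the conventional Goertzel recursion for a single DFT bin, checked against a direct DFT sum. It is not the recursive DFT proposed in the thesis, whose details are not given in this record; the function names goertzel_bin and direct_dft_bin and the sample data are hypothetical and chosen only for illustration.

```python
import cmath
import math


def goertzel_bin(samples, k):
    """Return DFT bin k of `samples` using the conventional Goertzel recursion.

    Illustrative baseline only; not the thesis's proposed recursive DFT.
    """
    n = len(samples)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # state s[m-1]
    s2 = 0.0  # state s[m-2]
    for x in samples:
        # Second-order recursion: s[m] = x[m] + 2*cos(w)*s[m-1] - s[m-2]
        s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    # After N samples, X[k] = exp(j*w) * s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s1 - s2


def direct_dft_bin(samples, k):
    """Return DFT bin k by evaluating the defining sum directly."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * m / n)
               for m, x in enumerate(samples))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]  # hypothetical input
    for k in range(len(data)):
        print(k, goertzel_bin(data, k), direct_dft_bin(data, k))
```

Each bin needs one pass of N iterations through the recursion. In fixed-point hardware the feedback state accumulates round-off error, which is the kind of SQNR concern the abstract refers to; this floating-point sketch does not model that.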
spelling ndltd-TW-097NCHU5441008 2016-04-29T04:19:42Z http://ndltd.ncl.edu.tw/handle/46351082493317984115 Low-Complexity Discrete Transforms and Their Architecture Designs for Video and Communication Signal Processings 應用於視訊與通訊信號處理的低複雜度離散轉換及其架構設計 Guo-An Su 蘇國安 Ph.D. National Chung Hsing University Department of Electrical Engineering 97 范志鵬 Degree thesis ; thesis 167 en_US