Low-Complexity Discrete Transforms and Their Architecture Designs for Video and Communication Signal Processings
Ph.D. === National Chung Hsing University === Department of Electrical Engineering === 97 === In this thesis, we present two topics, where one is the algorithms and the implementations of the integer transforms for the H.264/AVC, VC-1, and AVS applications, and the other is the algorithms and the implementations of the pruning FFT, the radix-2^4 FFT, and t...
Main Authors: | Guo-An Su, 蘇國安 |
---|---|
Other Authors: | 范志鵬 |
Format: | Others |
Language: | en_US |
Online Access: | http://ndltd.ncl.edu.tw/handle/46351082493317984115 |
id |
ndltd-TW-097NCHU5441008 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Ph.D. === National Chung Hsing University === Department of Electrical Engineering === 97 === This thesis covers two topics. The first is the algorithms and implementations of integer transforms for H.264/AVC, VC-1, and AVS applications. The second is the algorithms and implementations of the pruning FFT, the radix-2^4 FFT, and the recursive DFT for communication applications.
For the integer transforms, fast algorithms and hardware-sharing architectures across the integer transforms of different video standards are proposed to reduce hardware cost. For the H.264/AVC video standard, fast algorithms and hardware-sharing architectures for the 1-D forward/inverse integer transforms are proposed to obtain low-cost circuits when the 1-D forward and inverse transforms of H.264/AVC must be implemented on the same chip. Using a cell-based design flow, the gate count of the hardware-sharing design for the 1-D 4×4 and 8×8 forward/inverse integer transforms is smaller than that of the individual 1-D 4×4 and 8×8 forward/inverse integer transforms implemented without hardware sharing.
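To illustrate what a multiplier-free fast integer transform looks like, the sketch below applies the well-known 4-point forward core transform of H.264/AVC to one row, once as a matrix product and once as an add-and-shift butterfly. It is the textbook butterfly, not the hardware-sharing architecture proposed in the thesis; the function names are illustrative assumptions, and the scaling that the standard folds into quantization is left out.

```python
# Core 4-point forward transform matrix of H.264/AVC.
# The post-scaling that the standard merges into quantization is omitted.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def forward4_matrix(x):
    """Reference 1-D transform: plain matrix-vector product."""
    return [sum(c * v for c, v in zip(row, x)) for row in CF]


def forward4_butterfly(x):
    """Same transform using 8 additions/subtractions and 2 shifts."""
    x0, x1, x2, x3 = x
    s0, s1 = x0 + x3, x1 + x2   # sums feed the even outputs
    d0, d1 = x0 - x3, x1 - x2   # differences feed the odd outputs
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


sample = [5, 11, 8, 10]
print(forward4_butterfly(sample))  # -> [34, -7, -4, -11]
```

Because the butterfly needs only adders and wired shifts, structures of this kind are natural candidates for sharing between transform sizes and directions.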
For the VC-1 video standard, fast algorithms for the 1-D 8×8 inverse integer transform are developed from the symmetry of the transform matrix and from matrix decompositions. The proposed fast 1-D inverse integer transforms require fewer additions and shift operations than the previous fast method. In the hardware-sharing design of the fast 1-D 4×4 and 8×8 forward/inverse integer transforms, common hardware modules are shared to reduce the total hardware cost. As a result, the proposed 1-D and 2-D hardware-sharing designs need less hardware than individual, separate designs without sharing.
For H.264/AVC and AVS, a low-cost hardware-sharing design is proposed for the 1-D 2×2, 4×4, and 8×8 inverse transforms of H.264/AVC and the 1-D 8×8 inverse transform of AVS, aimed in particular at multi-standard decoding applications in China. The proposed 1-D hardware-sharing architecture is realized by adding offset computations and is implemented as a pipelined architecture. As a result, its hardware cost is lower than that of individual, separate designs.
For the Discrete Fourier Transform (DFT), algorithms and architectures for the pruning FFT, the radix-2^4 FFT, and the recursive DFT are proposed. The proposed pruning FFT algorithm, built on a grouping scheme, computes a DFT in which only a power-of-two number of outputs is needed. This group-based algorithm groups the frequency indices to accelerate the DFT computation. It requires fewer complex multiplications than other pruning FFT algorithms when the number of computed outputs is at least 1/16 of the full FFT length.
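The idea of output pruning can be sketched with a generic decomposition: split the input into interleaved sub-sequences, run one short FFT per sub-sequence, and recombine only the wanted outputs with twiddle factors. This is a standard transform-decomposition sketch, not the group-based algorithm of the thesis; `fft` and `pruned_dft` are illustrative names, and the sketch assumes the number of wanted outputs is a power of two that divides the input length.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def pruned_dft(x, num_out):
    """Return only the first num_out DFT outputs of x.

    Assumes num_out is a power of two that divides len(x).
    """
    n = len(x)
    p = n // num_out                         # number of interleaved sub-sequences
    sub = [fft(x[r::p]) for r in range(p)]   # each one is a num_out-point FFT
    # X[k] = sum_r W_N^(k*r) * F_r[k], with W_N = exp(-2j*pi/N)
    return [
        sum(cmath.exp(-2j * cmath.pi * k * r / n) * sub[r][k] for r in range(p))
        for k in range(num_out)
    ]


signal = [float(i % 5) for i in range(16)]
first_four = pruned_dft(signal, 4)
print(len(first_four))
```

Only short FFTs of length `num_out` are evaluated, so the butterfly stages of a full-length FFT that would feed unused outputs are never computed.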
Next, an efficient, low-cost 256-point FFT architecture and its implementation are proposed. Based on the radix-16 FFT algorithm, the proposed 256-point FFT processor uses a simplified cascaded radix-2^4 single-path delay feedback (SDF) structure. As a result, the control circuit of the simplified radix-2^4 SDF architecture is simpler than that of a direct radix-16 SDF structure. In hardware verification, the FFT design sustains a throughput of up to 35.5M samples/sec on a Xilinx Virtex2 1500 FPGA and up to 51.5M samples/sec with the UMC 0.18μm standard cell library.
Finally, an efficient recursive algorithm is proposed for DFT computation with power-of-two transform lengths. The proposed recursive structure reduces the loop count, and its signal-to-quantization-noise ratio (SQNR) is higher than that of the conventional Goertzel algorithm. The recursive DFT is further modified with selected coefficients to reduce round-off error and increase the SQNR.
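For reference, the conventional Goertzel algorithm that serves as the baseline here can be sketched as follows. This is the standard second-order recursion for a single DFT bin in floating point, not the recursive DFT proposed in the thesis; the function name and the sample data are illustrative assumptions.

```python
import cmath
import math


def goertzel(x, k):
    """Compute the single DFT bin X[k] of sequence x with the Goertzel recursion."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)      # the only multiplier inside the loop
    s1 = s2 = 0.0                  # s[n-1] and s[n-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2
        s2, s1 = s1, s0
    # Final complex step: X[k] = exp(j*w) * s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s1 - s2


data = [0.5, -1.25, 3.0, 2.0, -0.75, 1.5, 0.0, 4.0]
print(abs(goertzel(data, 0) - sum(data)) < 1e-9)  # bin 0 equals the sum of the samples
```

The loop runs once per input sample for every requested bin, and the recursion feeds its own rounded results back in, which is why loop count and round-off error are the natural figures of merit for alternatives to it.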
|
author2 |
范志鵬 |
author_facet |
范志鵬 Guo-An Su 蘇國安 |
author |
Guo-An Su 蘇國安 |
author_sort |
Guo-An Su |
title |
Low-Complexity Discrete Transforms and Their Architecture Designs for Video and Communication Signal Processings |
url |
http://ndltd.ncl.edu.tw/handle/46351082493317984115 |
spelling |
ndltd-TW-097NCHU5441008 2016-04-29T04:19:42Z http://ndltd.ncl.edu.tw/handle/46351082493317984115 Low-Complexity Discrete Transforms and Their Architecture Designs for Video and Communication Signal Processings 應用於視訊與通訊信號處理的低複雜度離散轉換及其架構設計 Guo-An Su 蘇國安 Ph.D. National Chung Hsing University Department of Electrical Engineering 97 范志鵬 thesis 167 en_US |