Architecture Design of CAVLC Decoder with Low Power and High Throughput Considerations

Bibliographic Details
Main Authors: Te-Lung Fang, 方得龍
Other Authors: Tsung-Han Tsai
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/22156681524343322753
Description
Summary: Master's === National Central University === Graduate Institute of Electrical Engineering === 96 === The entropy decoder in the MPEG-4 AVC/H.264 baseline profile is a Context-Adaptive Variable Length Decoder (CAVLD). Because of symbol-to-symbol dependencies, a traditional CAVLC decoder consumes many clock cycles, which lowers performance. By profiling the computation of the sub-modules and analyzing the encoding rules, we find that decoding two parameters, the non-zero coefficients (Level) and run_before, accounts for almost eighty percent of the computing time. This work therefore proposes a parallel architecture for the level decoder and a fast algorithm for the run_before decoder to improve decoding performance. We name the two methods after their features: MLD (Multiple Level Decoding) and NZS (Non-Zero Skipping for run_before decoding). By operating in parallel, MLD can decode two levels per cycle in most situations, and NZS can produce several run_before values in the same cycle. Both methods have low complexity and a regular design. According to our evaluation, the design requires the fewest cycles, 137 on average, to decode one macroblock. Moreover, the proposed CAVLC decoder can run at 33.5 MHz and still meet the real-time requirement for H.264 video decoding at 1920×1088 resolution. Compared with previous designs, it lowers the required operating frequency by roughly 29.1% to 71.5% for the same requirement, with no increase in gate count. The lower operating frequency makes it suitable for many low-power applications. The design has been implemented and synthesized with the TSMC 0.18 µm standard cell library. The synthesis results show a gate count of 13189 gates under a 125 MHz clock constraint and a maximum frequency of up to 160 MHz.
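
The 33.5 MHz figure in the summary can be roughly cross-checked from the 137-cycle average. The sketch below is only an illustration and is not taken from the thesis: it assumes a frame rate of 30 frames per second, which the summary does not state, and the function names are hypothetical. It counts 16×16 macroblocks in a 1920×1088 frame and multiplies by the frame rate and the average cycles per macroblock.

```python
import math

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblocks_per_frame(width: int, height: int) -> int:
    """Number of 16x16 macroblocks needed to cover one frame."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def required_clock_hz(width: int, height: int, fps: int, cycles_per_mb: int) -> int:
    """Minimum clock rate if every macroblock takes cycles_per_mb cycles."""
    return macroblocks_per_frame(width, height) * fps * cycles_per_mb


# ASSUMPTION: 30 frames per second (not stated in the summary).
# 137 cycles per macroblock is the average reported in the summary.
freq_hz = required_clock_hz(1920, 1088, fps=30, cycles_per_mb=137)
print(f"{macroblocks_per_frame(1920, 1088)} macroblocks per frame")
print(f"{freq_hz / 1e6:.1f} MHz required")
```

Under that assumption the result is 8160 macroblocks per frame and about 33.5 MHz, which is consistent with the frequency quoted in the summary. Because 137 is an average, a real decoder would also need buffering or headroom for macroblocks that take longer.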