Design of H.264/MPEG-4 AVC Video Encoder for High Definition Video
Main Authors: | Yu-Kun Lin 林佑昆 |
---|---|
Other Authors: | Tian-Sheuan Chang |
Format: | Others |
Language: | en_US |
Published: | 2008 |
Online Access: | http://ndltd.ncl.edu.tw/handle/21645773592067912988 |
id |
ndltd-TW-096NCTU5428115 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
en_US |
format |
Others
|
sources |
NDLTD |
description |
Ph.D. === National Chiao Tung University === Department of Electronics Engineering === 96 === The H.264 video standard has been widely adopted in high-definition video applications because of its high compression efficiency and video quality. However, the major bottlenecks of an H.264 implementation are its heavy computational load and large memory bandwidth, especially when encoding 1920x1080 (1080p) high-definition video in real time. This dissertation therefore proposes the first chip in academia that both supports the H.264 High Profile and encodes 1080p video in real time.
This dissertation contains three parts. First, we analyze the inter prediction modules, which account for most of the memory bandwidth and hardware cost in an H.264 encoder. To address these problems, we present a low-complexity, hardware-efficient motion estimation design built on several techniques. The first low-complexity technique, mode filtering, selects the best two candidates among all possible block-size combinations for refinement, reducing the computation of fractional refinement by 73.2%. To further reduce complexity and hardware cost, we propose a multi-level parallel processing technique for the integer motion estimation stage, which reduces complexity by 91.7% and gate count by 30%. The Level C data reuse technique further reduces local memory size by 88% and external memory bandwidth by 46%. Finally, the proposed single-iteration technique removes 68% of the gate count and doubles the throughput of the fractional motion estimation stage, a bottleneck among the inter prediction modules. In summary, the proposed H.264 inter prediction engine not only supports 1080p resolution with a ±128 search range but also uses 60% less hardware and 68.9% less internal SRAM than previous work.
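The mode-filtering idea described above (keep only the two best block-size candidates from integer motion estimation, then refine only those) can be illustrated with a minimal sketch. It assumes that per-partition integer-ME costs are already available and that quadrant costs are independent and additive; the function names, cost values, and the "P8x8" label are hypothetical, and this is not the dissertation's actual hardware algorithm.

```python
from typing import Dict, List, Tuple

# Sub-macroblock partitions available inside each 8x8 quadrant
SUB_MODES = ("8x8", "8x4", "4x8", "4x4")


def best_sub_cost(quadrant: Dict[str, int]) -> int:
    """Cheapest sub-partition cost for one 8x8 quadrant."""
    return min(quadrant[m] for m in SUB_MODES)


def mode_filter(mb_costs: Dict[str, int],
                quadrants: List[Dict[str, int]],
                keep: int = 2) -> List[Tuple[str, int]]:
    """Return the `keep` cheapest macroblock-level partition candidates.

    mb_costs  -- hypothetical integer-ME costs for "16x16", "16x8", "8x16"
    quadrants -- four dicts (one per 8x8 quadrant) of sub-partition costs
    """
    candidates = dict(mb_costs)
    # Assuming per-quadrant costs are independent and additive, letting each
    # quadrant pick its own best sub-partition covers every sub-partition
    # combination without enumerating them one by one.
    candidates["P8x8"] = sum(best_sub_cost(q) for q in quadrants)
    # Only these survivors would be passed on to fractional refinement.
    return sorted(candidates.items(), key=lambda kv: kv[1])[:keep]


mb = {"16x16": 900, "16x8": 870, "8x16": 880}
quads = [{"8x8": 220, "8x4": 210, "4x8": 215, "4x4": 230}] * 4
print(mode_filter(mb, quads))  # [('P8x8', 840), ('16x8', 870)]
```

Fractional refinement then runs on two candidates instead of every partition, which is where a saving of the kind reported above would come from.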
The second part of the dissertation is the architecture design of an H.264 intra encoder. The intra encoder of the H.264 standard provides coding efficiency comparable to that of JPEG 2000. To achieve high throughput at low area cost, we apply a modified three-step fast intra prediction algorithm that reduces the cycle count while keeping quality close to that of full search. We then adopt variable pixel parallelism to speed up the critical intra prediction part while keeping the area cost of the other parts low. The resulting design supports 1080p video encoding and reduces gate count by 23.5% compared with the previous design. In addition, the design achieves low power consumption through a 48% reduction in operating frequency and several low-power techniques.
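To make concrete what a fast intra algorithm is trying to shortcut, the sketch below evaluates three of the nine H.264 Intra_4x4 modes (vertical, horizontal, DC) by exhaustive SAD comparison. It assumes both top and left neighbours are available and uses SAD as the cost; the helper names are hypothetical, and this is a plain full-search baseline, not the dissertation's modified three-step algorithm.

```python
from typing import List, Tuple

Block = List[List[int]]  # 4x4 samples, row-major


def predict_4x4(mode: str, top: List[int], left: List[int]) -> Block:
    """Intra_4x4 vertical, horizontal and DC predictors.

    Assumes both the top and left neighbours are available.
    """
    if mode == "V":   # each column repeats the sample above it
        return [list(top) for _ in range(4)]
    if mode == "H":   # each row repeats the sample to its left
        return [[left[y]] * 4 for y in range(4)]
    if mode == "DC":  # rounded mean of the 4 top and 4 left neighbours
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unsupported mode: {mode}")


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def choose_mode(cur: Block, top: List[int], left: List[int],
                modes=("V", "H", "DC")) -> Tuple[str, int]:
    """Exhaustively evaluate the given modes and return (best_mode, SAD)."""
    costs = [(m, sad(cur, predict_4x4(m, top, left))) for m in modes]
    return min(costs, key=lambda mc: mc[1])


cur = [[10, 20, 30, 40] for _ in range(4)]
print(choose_mode(cur, top=[10, 20, 30, 40], left=[50, 50, 50, 50]))  # ('V', 0)
```

Every mode costs one prediction plus one SAD per block, so pruning the set of modes tried is what lowers the cycle count.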
The final part of this dissertation presents a complete H.264 High Profile encoder. Because several high-definition applications use the H.264 High Profile, we integrate our motion estimation engine, our intra encoder, and the new coding tools of the High Profile into a complete encoder supporting 1080p video. These 1080p High Profile applications pose a series of new design challenges in throughput, cost, and power. Moreover, at the system level, the three-stage pipeline architecture causes a timing conflict in the reconstruction stage of inter and intra prediction. We therefore first propose a crossing-stage hardware sharing technique that removes both the conflict and the duplicated hardware. To meet the high throughput demand and resolve structural hazards, the design adopts full eight-pixel parallelism. In the motion estimation part, the bidirectional motion estimation modules share hardware, and the integer and fractional motion estimation modules share local SRAM, reducing internal memory size and bandwidth. In summary, we propose the first H.264 High Profile encoder in academia, which supports 1080p resolution at only 145 MHz. The core area is 3.17 × 3.17 mm² in a 0.13 μm process, only 54% of that of previous work. Power consumption is 242 mW at 1080p, and at 720p it is only 46.3% of that of previous work. This small-area, low-power, high-throughput design is therefore well suited to high-definition video applications.
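A back-of-the-envelope check shows what "1080p at only 145 MHz" implies per macroblock. The abstract does not state the frame rate, so 30 fps is an assumption here; the sketch also assumes 1080 lines are coded as 1088 (68 macroblock rows) and counts luma samples only for the eight-pixel-parallel pass. The function names are hypothetical.

```python
# Hypothetical cycle-budget check; the 30 fps frame rate is an assumption.
MB_SIZE = 16


def mb_count(width: int, height: int) -> int:
    """Macroblocks per frame, padding each dimension up to a multiple of 16."""
    return ((width + MB_SIZE - 1) // MB_SIZE) * ((height + MB_SIZE - 1) // MB_SIZE)


def cycles_per_mb(clock_hz: float, width: int, height: int, fps: float) -> float:
    """Clock cycles available per macroblock for real-time encoding."""
    return clock_hz / (mb_count(width, height) * fps)


def luma_cycles(pixels_per_cycle: int) -> int:
    """Cycles to stream one 16x16 luma macroblock through a datapath."""
    return (MB_SIZE * MB_SIZE) // pixels_per_cycle


mbs = mb_count(1920, 1080)  # 120 x 68 macroblocks
budget = cycles_per_mb(145e6, 1920, 1080, 30)
print(f"{mbs} MBs/frame, {budget:.1f} cycles/MB, "
      f"{luma_cycles(8)} cycles per 8-pixel-parallel luma pass")
# prints: 8160 MBs/frame, 592.3 cycles/MB, 32 cycles per 8-pixel-parallel luma pass
```

Under these assumptions, each pipeline stage has roughly 592 cycles per macroblock, so a single 32-cycle luma pass leaves room for several passes per stage.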
|
author2 |
Tian-Sheuan Chang |
author |
Yu-Kun Lin 林佑昆 |
title |
Design of H.264/MPEG-4 AVC Video Encoder for High Definition Video |
publishDate |
2008 |
url |
http://ndltd.ncl.edu.tw/handle/21645773592067912988 |
spelling |
ndltd-TW-096NCTU5428115 2015-10-13T13:51:49Z http://ndltd.ncl.edu.tw/handle/21645773592067912988 Design of H.264/MPEG-4 AVC Video Encoder for High Definition Video 針對高畫質視訊之H.264/MPEG-4AVC視訊編碼器設計 Yu-Kun Lin 林佑昆 Ph.D. National Chiao Tung University Department of Electronics Engineering 96 Tian-Sheuan Chang Chein-Wei Jen 張添烜 任建葳 2008 thesis 155 en_US |