Analysis and Architecture Design of Efficient Motion Estimations

Bibliographic Details
Main Authors: Sheng-Yu Huang, 黃聖瑜
Other Authors: Yeong-Kang Lai
Format: Others
Language: zh-TW
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/09084446345417673603
id ndltd-TW-096NCHU5441089
record_format oai_dc
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Chung Hsing University === Department of Electrical Engineering === ROC academic year 96 === Motion estimation is the most critical part of a video coding system: it demands the most computation and memory access in a video encoder. H.264/AVC is the latest international video coding standard, and it saves 37%, 48%, and 64% of the bit rate compared with MPEG-4, H.263, and MPEG-2, respectively. The first part of this thesis reviews the main motion estimation algorithms and architectures developed over the last two decades (1981-2006). Second, we propose a high-performance motion estimator for high-resolution, high-quality video: a scalable two-dimensional pipelined motion estimation processor for the full search block matching algorithm (FSBMA). Because the design is scalable and pipelined, the architecture can be scaled up or down to meet the performance requirements. The proposed 2-D motion estimator processes the block-matching operations of consecutive frames smoothly, without any processing element (PE) idle time at frame boundaries. Furthermore, it reduces the external memory bandwidth through Level C+ data reuse. The architecture has been implemented using a standard cell methodology in TSMC 0.18 µm 1P6M technology. The chip implementation results show that it delivers high FSBMA performance: it operates at 100 MHz, consumes about 364.06 mW, and occupies 3.24 × 3.24 mm². Third, we propose a fast algorithm based on pixel subsampling that avoids being trapped in local minima. While preserving the same quality as FSBMA, its computational complexity is about 7.5% of that of FSBMA, so the corresponding motion estimator has low power consumption. It is composed of a 4 × 4 PE array, a parallel sum of absolute differences (SAD) tree, and a parallel comparator tree. The hardware cost is low because the datapath is reused across the two steps of the algorithm.
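As a point of reference for the FSBMA and pixel-subsampling discussion above, the following Python sketch shows exhaustive block matching with the SAD criterion. It is not the thesis's hardware design or its specific subsampling pattern: the block size `n=16`, the search range `p=8`, and the uniform `step` subsampling (keeping every `step`-th row and column) are illustrative assumptions, and the helper names `sad` and `full_search` are hypothetical.

```python
import random

def sad(cur, ref, bx, by, rx, ry, n, step=1):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the n x n block at (rx, ry) in `ref`. With step > 1 only every step-th
    row and column is compared (uniform pixel subsampling, an assumption here)."""
    total = 0
    for j in range(0, n, step):
        for i in range(0, n, step):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total

def full_search(cur, ref, bx, by, n=16, p=8, step=1):
    """Full search block matching: try every displacement (dx, dy) in
    [-p, p] x [-p, p] and return ((dx, dy), min_sad)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate lies partly outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n, step)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Synthetic data: the current frame is the reference frame displaced so that
# the block at (16, 16) is found at offset (+3, -2) in the reference.
random.seed(1)
W = H = 48
ref = [[random.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[y - 2][x + 3] if 0 <= y - 2 < H and 0 <= x + 3 < W else 0
        for x in range(W)] for y in range(H)]

print(full_search(cur, ref, 16, 16))          # ((3, -2), 0): 289 candidates x 256 pixels
print(full_search(cur, ref, 16, 16, step=2))  # ((3, -2), 0): 289 candidates x 64 pixels
```

With `p=8` the search visits (2·8+1)² = 289 candidates per block, and `step=2` cuts the per-candidate work to a quarter. The thesis's own subsampling algorithm reaches about 7.5% of FSBMA complexity with its own scheme, which this sketch does not reproduce.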
To reduce the system memory bandwidth, we propose a memory-interleaving organization and a local memory configuration that simplify the arrangement of the current and reference pixels and achieve the Level C data reuse scheme. The architecture has been implemented using a standard cell methodology in TSMC 0.18 µm 1P6M technology and can process SDTV (720 × 480) pictures at 30 frames per second at 52 MHz. The chip implementation results show that it is 15 times more area-speed efficient than full search architectures. It operates at 52.4 MHz, consumes about 43.38 mW, and occupies 2.3 × 1.7 mm². Finally, we propose a fast motion estimation algorithm based on a coarse-to-fine technique and apply it to the integer motion estimator of an H.264 encoder. In the first stage, global elimination and downsampling reduce the computational complexity. In the second stage, a local full search on the pixels around the selected candidates yields the 41 motion vectors (MVs) required by the seven H.264 block sizes. While preserving the same quality as full search (FS), the algorithm's complexity is about 5% of that of variable block size (VBS) full search. To meet the H.264 encoder requirements, we adopt parallel processing techniques. The corresponding coarse-to-fine architecture consists of a pixel sum array that extracts coarse features, a parallel SAD tree that performs the matching operations, and a four-bank parallel comparator tree that finds each potential candidate. To reduce the system memory bandwidth, a memory-interleaving organization and local memory configuration are again used to arrange the current and reference pixels and achieve Level C data reuse. The architecture has been implemented using a standard cell methodology in TSMC 0.18 µm 1P6M technology and can process HD720p (1280 × 720) pictures at 30 frames per second at 59.6 MHz.
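The 41 MVs mentioned above come from the seven H.264 partition sizes of a 16 × 16 macroblock: one 16×16, two 16×8, two 8×16, four 8×8, eight 8×4, eight 4×8, and sixteen 4×4 blocks. A common way to feed them, sketched below as an assumption rather than the thesis's exact SAD tree, is to compute the sixteen 4×4 SADs of a candidate once and merge them level by level, so every larger partition reuses the smaller sums. The function name `sad_tree` is hypothetical.

```python
def sad_tree(s):
    """Merge the sixteen 4x4-block SADs of ONE candidate MV into the SADs of all
    41 H.264 partitions. s[r][c] is the SAD of the 4x4 block in block-row r,
    block-column c (0..3). Keys are width x height, as in H.264."""
    out = {"4x4": [s[r][c] for r in range(4) for c in range(4)]}
    # Pair horizontal neighbours -> 8x4 (4 rows x 2 cols); vertical -> 4x8 (2 rows x 4 cols)
    s8x4 = [[s[r][2 * c] + s[r][2 * c + 1] for c in range(2)] for r in range(4)]
    s4x8 = [[s[2 * r][c] + s[2 * r + 1][c] for c in range(4)] for r in range(2)]
    out["8x4"] = [v for row in s8x4 for v in row]
    out["4x8"] = [v for row in s4x8 for v in row]
    # Two vertically adjacent 8x4 sums form an 8x8 (2 x 2 grid)
    s8x8 = [[s8x4[2 * r][c] + s8x4[2 * r + 1][c] for c in range(2)] for r in range(2)]
    out["8x8"] = [v for row in s8x8 for v in row]
    out["16x8"] = [s8x8[r][0] + s8x8[r][1] for r in range(2)]  # top, bottom halves
    out["8x16"] = [s8x8[0][c] + s8x8[1][c] for c in range(2)]  # left, right halves
    out["16x16"] = [out["16x8"][0] + out["16x8"][1]]
    return out

s = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 SADs 0..15
parts = sad_tree(s)
print(sum(len(v) for v in parts.values()))           # 41
print(parts["16x16"], parts["16x8"], parts["8x16"])  # [120] [28, 92] [52, 68]
```

In hardware, each of these 41 sums would feed its own comparator so that all partition MVs can be tracked in parallel over the candidate set.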
The chip implementation results show that the architecture is 20 times more efficient than VBS full search architectures in terms of area-speed product. It operates at 62.5 MHz, consumes about 183.0 mW, and occupies 3.20 × 3.58 mm². In conclusion, the contributions of this thesis lie in three directions. First, the scalable two-dimensional pipelined motion estimator achieves high performance with low external memory bandwidth. Second, the motion estimator based on the fast pixel subsampling algorithm greatly reduces computational complexity and hardware cost with minimal loss of video quality. Third, the motion estimator based on the coarse-to-fine fast algorithm achieves the highest performance among the integer motion estimation architectures compared and supports the HD720p video format in H.264/AVC applications. This architecture has low power consumption and low hardware cost, and it delivers almost the same video quality as full search. We hope these results contribute to the advancement of video technology.
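The clock frequencies quoted for the SDTV and HD720p designs can be related to a per-macroblock cycle budget. The sketch below assumes 16 × 16 macroblocks processed one after another at a steady rate with no pipeline fill or control overhead, which is a simplification; the thesis does not state its per-macroblock cycle count, and `cycles_per_mb` is a hypothetical helper.

```python
def cycles_per_mb(width, height, fps, clock_hz, mb=16):
    """Macroblocks per frame and the average clock cycles available per macroblock,
    assuming a steady macroblock rate with no pipeline or control overhead."""
    mbs_per_frame = (width // mb) * (height // mb)
    return mbs_per_frame, clock_hz / (mbs_per_frame * fps)

for name, w, h, clk in [("SDTV", 720, 480, 52_000_000), ("HD720p", 1280, 720, 59_600_000)]:
    n_mb, budget = cycles_per_mb(w, h, 30, clk)
    print(f"{name}: {n_mb} macroblocks/frame, {budget:.1f} cycles per macroblock")
# SDTV: 1350 macroblocks/frame, 1284.0 cycles per macroblock
# HD720p: 3600 macroblocks/frame, 551.9 cycles per macroblock
```

Under these assumptions the HD720p design must finish all 41 integer MVs of a macroblock in roughly 552 cycles, which helps explain why the coarse-to-fine architecture relies on parallel SAD and comparator trees.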
spelling ndltd-TW-096NCHU5441089 2016-05-09T04:13:48Z http://ndltd.ncl.edu.tw/handle/09084446345417673603 Analysis and Architecture Design of Efficient Motion Estimations 移動估算演算法研究與及其電路架構設計與實現 (Research on Motion Estimation Algorithms and the Design and Implementation of Their Circuit Architectures) Sheng-Yu Huang 黃聖瑜 Master's thesis, National Chung Hsing University, Department of Electrical Engineering, ROC academic year 96 Yeong-Kang Lai 賴永康 2008 Degree thesis; thesis 106 zh-TW