Summary: | Master's === National Chung Hsing University === Department of Electrical Engineering === 96 === Motion estimation is the most important part of a video coding system, and it demands the most computing power and memory access in a video encoder. Among video coding standards, H.264/AVC is the latest international standard. It reduces bitrate by 37%, 48%, and 64% compared with MPEG-4, H.263, and MPEG-2, respectively. In the first part of this thesis, we review the main motion estimation algorithms and architectures developed from 1981 to 2006.
Second, we propose a motion estimator for high-performance, high-resolution video applications. The architecture is a scalable two-dimensional pipelined motion estimation processor for the full search block matching algorithm (FSBMA). Because it is scalable and pipelined, it can be scaled up or down to meet performance requirements. The proposed 2-D motion estimator performs block-matching operations on consecutive frames without any processing element (PE) idle time at frame boundaries. Furthermore, it reduces external memory bandwidth through Level C+ data reuse. The architecture has been implemented using a standard cell methodology in TSMC 0.18 µm 1P6M technology. The chip implementation results show that the architecture delivers high performance for FSBMA: it operates at 100 MHz, consumes about 364.06 mW, and occupies a chip area of 3.24 × 3.24 mm².
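For readers unfamiliar with FSBMA, the following is a minimal software sketch of full search block matching with the sum of absolute differences (SAD) as the matching cost. It is a generic reference model, not the pipelined hardware described above; the function names, the list-of-rows frame layout, and the search range parameter `p` are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the reference
    block displaced by (dx, dy). Frames are lists of rows of integers."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Test every displacement in [-p, p] x [-p, p] whose candidate block
    lies inside the reference frame. Returns (best_sad, (dx, dy))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue  # skip candidates that fall outside the frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

Every candidate in the search window costs n × n absolute differences, which is why full search dominates encoder computation and why a hardware design benefits from parallel PEs and reuse of reference data.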
Third, we propose a fast algorithm based on pixel subsampling that avoids being trapped in local minima. While preserving the same quality as FSBMA, its computational complexity is about 7.5% of that of FSBMA, so the power consumption of the proposed motion estimator is low. The estimator is composed of a 4 × 4 PE array, a parallel sum of absolute differences (SAD) tree, and a parallel comparator tree. The hardware cost is low because the datapath is reused across the two steps of the algorithm. To reduce system memory bandwidth, we propose a memory interleaving organization and a local memory configuration that simplify the arrangement of current and reference pixels and achieve the Level C data reuse scheme. The architecture has been implemented using a standard cell methodology in TSMC 0.18 µm 1P6M technology. It can process SDTV (720 × 480) pictures at 30 frames per second at 52 MHz. The chip implementation results show that the architecture is 15 times more area-speed efficient than full search architectures. It operates at 52.4 MHz, consumes about 43.38 mW, and occupies a chip area of 2.3 × 1.7 mm².
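The abstract does not spell out the subsampling pattern, so the sketch below shows only the general idea of a pixel-subsampled SAD: the matching cost is accumulated over a regular subset of the block's pixels. The 2:1 subsampling in each dimension, the function names, and the default block size are assumptions for illustration; this does not reproduce the thesis algorithm or its 7.5% complexity figure.

```python
def subsampled_sad(cur, ref, bx, by, dx, dy, n=16, step=2):
    """SAD over every `step`-th pixel in both dimensions of an n x n block.
    Frames are lists of rows of integers."""
    total = 0
    for y in range(0, n, step):
        for x in range(0, n, step):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def sampled_pixel_count(n=16, step=2):
    """Number of pixels visited per candidate by subsampled_sad."""
    return len(range(0, n, step)) ** 2
```

With a 16 × 16 block and a step of 2, each candidate visits 64 of 256 pixels, a quarter of the work of a full SAD. Plain subsampling can mislead the search on fine textures, which is the kind of local-minimum problem a subsampling-based algorithm has to guard against.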
Finally, we propose a fast motion estimation algorithm based on a coarse-to-fine technique and apply it to the integer motion estimator of an H.264 encoder. In the first stage, we adopt global elimination and downsampling to reduce computational complexity. In the second stage, we perform a local full search on the pixels around the selected candidates to obtain the 41 motion vectors (MVs). While preserving the same quality as full search (FS), the algorithm's complexity is about 5% of that of variable block size (VBS) full search. To meet the H.264 encoder specification, we adopt parallel processing techniques. The corresponding coarse-to-fine architecture is composed of a pixel sum array that extracts coarse features, a parallel SAD tree that performs the matching operations, and a four-bank parallel comparator tree that finds each potential candidate. To reduce system memory bandwidth, a memory interleaving organization and a local memory configuration are again used to arrange the current and reference pixels and achieve the Level C data reuse scheme. The architecture has been implemented using a standard cell methodology in TSMC 0.18 µm 1P6M technology. It can process HD720p (1280 × 720) pictures at 30 frames per second at 59.6 MHz. The chip implementation results show that the architecture is 20 times more efficient than VBS full search architectures in terms of area-speed product. It operates at 62.5 MHz, consumes about 183.0 mW, and occupies a chip area of 3.20 × 3.58 mm².
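The figure of 41 motion vectors follows from the H.264 macroblock partitions: a 16 × 16 macroblock contains 16 blocks of 4 × 4, 8 of 8 × 4, 8 of 4 × 8, 4 of 8 × 8, 2 of 16 × 8, 2 of 8 × 16, and 1 of 16 × 16. A common way to obtain all 41 SADs for one candidate position, and the usual role of a SAD tree, is to compute the sixteen 4 × 4 SADs once and sum them into the larger partitions. The sketch below illustrates that merging step only; the function name and data layout are hypothetical and are not taken from the thesis.

```python
def merge_sads(s4):
    """s4[r][c] is the SAD of the 4x4 sub-block at row r, column c of a
    16x16 macroblock. Returns a dict mapping each partition size
    (written width x height) to its list of SADs in raster order."""
    def block_sum(r0, c0, rows, cols):
        return sum(s4[r][c]
                   for r in range(r0, r0 + rows)
                   for c in range(c0, c0 + cols))

    # (rows, cols) of each partition, measured in 4x4 sub-blocks
    shapes = {
        "4x4": (1, 1), "8x4": (1, 2), "4x8": (2, 1), "8x8": (2, 2),
        "16x8": (2, 4), "8x16": (4, 2), "16x16": (4, 4),
    }
    out = {}
    for name, (rows, cols) in shapes.items():
        out[name] = [block_sum(r, c, rows, cols)
                     for r in range(0, 4, rows)
                     for c in range(0, 4, cols)]
    return out
```

Keeping the minimum of each of the 41 merged SADs across all candidate positions yields one motion vector per partition, which is why a single pass over the candidates can serve every block size.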
In conclusion, the contributions of this thesis lie in three areas. First, the scalable two-dimensional pipelined motion estimator offers high performance and low external memory bandwidth. Second, the motion estimator based on the fast pixel subsampling algorithm greatly reduces computational complexity and hardware cost with minimal loss of video quality. Third, the motion estimator based on the coarse-to-fine fast algorithm achieves the highest performance among all integer motion estimation architectures and supports the HD720p video format in H.264/AVC applications. This architecture has low power consumption and low hardware cost, and it delivers nearly the same video quality as full search. We hope that these results contribute to the advancement of video technology.
|