Design of Feature Extraction for Text-Video Processing and Video Stabilization System with Computer Vision-Based Techniques
Title (Chinese): 電腦視覺特徵值萃取於字幕視訊處理及視訊防手震系統設計之研究
Degree: PhD (博士), National Central University (國立中央大學), Graduate Institute of Electrical Engineering (電機工程研究所), Academic Year 99
Main Authors: Chih-Lun Fang (方志倫)
Other Authors: Tsung-Han Tsai (蔡宗漢)
Format: Others (學位論文 / thesis, 145 pages)
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/53944089846155362666
id: ndltd-TW-099NCU05442049
record_format: oai_dc
collection: NDLTD
language: en_US
format: Others
sources: NDLTD
description:
Computer vision has become an important research field in recent years. Many computer vision-based algorithms have been proposed for designing novel systems, and these systems can be further implemented and realized on embedded platforms. Against this background, this thesis designs novel computer vision-based systems. Feature extraction is a fundamental technique in computer vision on which many such systems are built. Among the different features, text carries a high level of semantics, and global motion is particularly important for video stabilization and video coding. This thesis therefore addresses the extraction of text and global-motion features and their applications to text-video inpainting and video stabilization.
For text feature extraction: superimposed text is increasingly embedded within videos, and some of this text is unwanted, so an approach is needed to remove the text and inpaint the video. However, few conventional approaches inpaint such videos well, owing to large text regions, structured regions, and the variety of video content. In response, this study designed a text-video inpainting algorithm that poses text-video inpainting as structure repair and texture propagation. To repair the structure regions, structure interpolation uses a new rotated block-matching model to estimate the initial locations of the inpainted regions and then refines their coordinates; information from neighboring frames then fills the structure regions. To inpaint the structure regions without tedious manual interaction, structure extension utilizes spline-curve estimation. Afterwards, derivative propagation inpaints the texture regions. Experiments on several real text-videos show that all of the text regions were inpainted with spatio-temporal consistency. Comparisons also show that the proposed algorithm outperforms conventional approaches. Its advantages include reduced design complexity, since only the structure information is integrated across multiple frames, and demonstrated structure consistency on realistic videos.
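The structure-repair step described above can be pictured as a masked block search over a small set of candidate rotations. The following Python/NumPy sketch is only an illustration under assumed parameters (search range, rotation angles, and the `rotated_block_match` and `sad` helper names); it is not the thesis implementation.

```python
# Minimal sketch (not the author's implementation): rotated block matching
# for structure repair. A block bordering the text mask is compared, under a
# few candidate rotations, against blocks in a neighboring frame; the best
# match supplies pixels to fill the masked structure region.
import numpy as np
from scipy.ndimage import rotate  # assumed choice of rotation routine

def sad(a, b, mask):
    """Sum of absolute differences over unmasked pixels only."""
    valid = ~mask
    return np.abs(a[valid].astype(np.int32) - b[valid].astype(np.int32)).sum()

def rotated_block_match(block, block_mask, ref_frame, corner, search=8,
                        angles=(-10, -5, 0, 5, 10)):
    """Return (dy, dx, angle) of the best-matching block in ref_frame.

    block:      HxW grayscale patch from the current frame, partly covered by text
    block_mask: boolean HxW mask, True where pixels are text (to be inpainted)
    ref_frame:  a neighboring grayscale frame assumed to expose the occluded structure
    corner:     (y, x) of the block's top-left corner in the current frame
    """
    h, w = block.shape
    best, best_cost = (0, 0, 0), np.inf
    for angle in angles:
        # Assumption: small rotations approximate the scene motion between frames.
        ref_rot = rotate(ref_frame, angle, reshape=False, order=1)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = corner[0] + dy, corner[1] + dx
                if y < 0 or x < 0 or y + h > ref_rot.shape[0] or x + w > ref_rot.shape[1]:
                    continue
                cost = sad(block, ref_rot[y:y + h, x:x + w], block_mask)
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx, angle)
    return best
```

The best-matching region of the neighboring frame would then be copied into the masked structure pixels; coordinate refinement and the spline-based structure extension are omitted here.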
Additionally, some embedded text carries important information. This research therefore utilizes embedded text to build an intelligent multimedia display. We design a text-in-picture (TiP) display system that extracts the text in a subchannel and then combines this text with the main channel. The system was constructed on a dual-core platform to achieve real-time text extraction and display. A schedulable design framework was proposed to partition the TiP display and text extraction for pipelined execution. A data-aware transfer scheme was designed so that some data can be reused. Single-instruction multiple-data (SIMD) based mechanisms were created to improve the computational efficiency of the numerous convolutions and accumulations in text extraction. Quadruple buffering was employed so that input/output and text extraction can proceed simultaneously. To optimize the labeling and filling tasks, multi-banking and multi-tasking schemes were developed. The evaluation results indicate that the proposed techniques speed up TiP display with text extraction, and a comparison under equivalent conditions shows that they realize text extraction more efficiently.
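As a rough illustration of the SIMD idea behind the convolution and accumulation stages, the vectorized NumPy sketch below processes whole rows and blocks at once instead of looping over pixels. The edge-energy measure, window size, and threshold are assumptions and do not reproduce the thesis design.

```python
# Illustrative sketch only: the SIMD principle of processing many pixels per
# instruction is mimicked here with NumPy's vectorized operations.
import numpy as np

def text_energy_map(gray, win=8):
    """Accumulate horizontal/vertical gradient magnitudes over win x win
    blocks; superimposed text tends to produce dense, strong edges."""
    gray = gray.astype(np.int32)
    # Gradient magnitudes computed for entire rows/columns at once
    gx = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    gy = np.abs(np.diff(gray, axis=0, prepend=gray[:1, :]))
    energy = gx + gy
    h, w = energy.shape
    h, w = h - h % win, w - w % win
    # Block-wise accumulation: one reshape + sum instead of nested pixel loops
    blocks = energy[:h, :w].reshape(h // win, win, w // win, win)
    return blocks.sum(axis=(1, 3))

# Candidate text blocks are those whose accumulated energy exceeds a
# threshold (the value 4000 is a placeholder, not from the thesis):
# text_block_mask = text_energy_map(subchannel_frame) > 4000
```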
For global motion feature extraction, a robust video stabilization algorithm based on global motion feature extraction is proposed to remove unwanted vibration from video. To achieve real-time video stabilization, the algorithm is realized on a dual-core embedded platform. In our approach, the global motion is computed from local motion, which is derived by feature-centered block matching at low computational cost. Based on the assumption that the motion of the static background represents the global motion, a background motion model is proposed. A histogram-based computation over the local motion vectors yields the initial global motion estimate, which is then refined by an updating procedure driven by the background motion model. Finally, the video is smoothed and stabilized according to the computed global motion. In addition, several optimization approaches are proposed to enhance the performance of video stabilization on the embedded platform. The video stabilization tasks are partitioned and scheduled across the dual cores. A function-simplification approach optimizes the response function in the feature-point selection task. The speed of feature-centered block matching is further enhanced by region-based memory access and sum-of-absolute-differences (SAD) optimization, and the global motion estimation is optimized as well. Experimental results show that the proposed approach accurately estimates the global motion and produces well-stabilized videos, and comparisons demonstrate superior stabilization performance. The evaluation results also show that the proposed optimizations significantly increase the performance of video stabilization for real-time processing.
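The histogram-based global motion estimation and the subsequent smoothing can be sketched as follows. This is a minimal illustration under assumed parameters (bin range, smoothing radius, and function names such as `global_motion_from_histogram`); the thesis additionally refines the estimate with its background motion model and updating procedure, which are not reproduced here.

```python
# Minimal sketch of the histogram idea: if the static background dominates,
# the most frequent local motion vector approximates the global (camera)
# motion, and the stabilized path is a smoothed version of the accumulated
# global motion.
import numpy as np

def global_motion_from_histogram(local_mvs, max_disp=16):
    """local_mvs: (N, 2) array of per-feature (dy, dx) motion vectors."""
    bins = np.arange(-max_disp, max_disp + 2) - 0.5  # integer-centered bins
    hist, dy_edges, dx_edges = np.histogram2d(
        local_mvs[:, 0], local_mvs[:, 1], bins=(bins, bins))
    iy, ix = np.unravel_index(np.argmax(hist), hist.shape)
    # Peak of the 2-D motion-vector histogram gives the initial global motion
    return dy_edges[iy] + 0.5, dx_edges[ix] + 0.5

def smoothing_corrections(global_motions, radius=15):
    """Moving-average smoothing of the accumulated global motion path;
    the per-frame stabilization correction is (smoothed - accumulated)."""
    path = np.cumsum(np.asarray(global_motions, dtype=np.float64), axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.stack(
        [np.convolve(path[:, k], kernel, mode='same') for k in range(2)], axis=1)
    return smoothed - path
```

Each frame would then be shifted by its correction offset to remove the high-frequency jitter while preserving intentional camera motion.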
author2: Tsung-Han Tsai
author: Chih-Lun Fang (方志倫)
title: Design of Feature Extraction for Text-Video Processing and Video Stabilization System with Computer Vision-Based Techniques
publishDate: 2011
url: http://ndltd.ncl.edu.tw/handle/53944089846155362666