Summary: | Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 === Touch-based displays enable rich interaction between users and videos. Objects that appear in videos often interest users and prompt them to look for related knowledge. We propose a video playback system that lets users interactively query objects of interest in videos. Because the text accompanying a video may not be strongly related to the object of interest, we use visual appearance as the feature for retrieving similar objects from large image collections. The tags associated with the retrieved images then reveal information about the object of interest, which users can explore further. A query that relies on a single viewpoint of the object is not robust to pose variation and occlusion. We therefore present a novel video object segmentation framework, based on graph cut segmentation, that supplies multi-frame object regions for the query and thereby improves retrieval precision. To ensure prompt response and effectiveness, we augment the graph cut algorithm with compressed-domain motion vectors; compared with the prior method, the processing speed of our segmentation framework is significantly improved. Experiments on community-contributed videos demonstrate the effectiveness of our segmentation framework based on multi-frame object region queries and the resulting improvement in retrieval precision.
|