Summary: | Master's === National Tsing Hua University === Department of Electrical Engineering === 97 === Most learning-based video semantic analysis methods require a large training set to obtain a good semantic model and achieve good performance. However, annotating a large amount of video is labor-intensive, and collecting the training data is not easy either. Most training set selection schemes pick samples at random or take only part of the video data, neglecting how similar the selected samples are to one another and how broadly they cover the data.
In this thesis, we propose several methods for constructing the training set while reducing user involvement: clustering-based, spatial dispersiveness, temporal dispersiveness, and sample-based selection. With these selection schemes, we aim to construct a small but effective training set by using the spatial and temporal distribution and the clustering information of the whole video data. If the selected training data represent the characteristics of the whole video data, classification performance will be better even though the training set is smaller than the whole video data.
We choose the best samples for training a semantic model and use a support vector machine (SVM) to classify each sample. This thesis classifies video shots into five semantic categories: person, landscape, cityscape, map, and others. Experimental results show that these methods are effective for training set selection in video annotation and outperform random selection.
|
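As an illustration of the clustering-based selection idea mentioned in the summary, the following is a minimal sketch and not the thesis's actual algorithm. It assumes each shot is already represented by a numeric feature tuple, runs a small k-means written from scratch, and picks the shot nearest each cluster centre as a training sample to annotate. The function names (`select_training_indices`, `kmeans`, `init_centroids`) and the farthest-point initialisation are hypothetical choices made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(samples, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centroids = [samples[0]]
    while len(centroids) < k:
        # Add the sample whose nearest chosen centroid is farthest away.
        farthest = max(
            samples,
            key=lambda s: min(squared_distance(s, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids


def kmeans(samples, k, iterations=20):
    """Plain k-means; returns the final list of centroids."""
    centroids = init_centroids(samples, k)
    for _ in range(iterations):
        # Assign every sample to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for s in samples:
            nearest = min(range(k), key=lambda j: squared_distance(s, centroids[j]))
            clusters[nearest].append(s)
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids


def select_training_indices(samples, k):
    """Return sorted indices of the samples closest to each of k cluster centres."""
    selected = []
    for c in kmeans(samples, k):
        idx = min(range(len(samples)), key=lambda i: squared_distance(samples[i], c))
        if idx not in selected:
            selected.append(idx)
    return sorted(selected)
```

Only the selected shots would then need manual labels (person, landscape, cityscape, map, others) before training a classifier such as an SVM. The sketch assumes `k` does not exceed the number of shots.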