Summary: Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 103.

Scene recognition is an important problem in image and video processing, with applications that include object recognition and detection, content-based image indexing and retrieval, and intelligent vehicle and robot navigation. However, natural scene images tend to be complex and difficult to analyze because of changes in illumination and geometric transformations. In this thesis, we investigate building a novel model to learn and recognize natural scenes.
This study proposes a new approach that combines locality-constrained sparse coding (LCSP), Spatial Pyramid Pooling, and a linear SVM in an end-to-end model. First, dense SIFT descriptors, which capture local spatial information, are extracted from each image in the training set. These features are referred to as codewords, and each codeword is represented as part of a topic. We then apply the LCSP algorithm to learn the codeword distribution of these local features from the training set. Next, a modified Spatial Pyramid Pooling model encodes the spatial distribution of the local features; Spatial Pyramid Pooling has been remarkably successful in both scene and object recognition. In the testing stage, a linear SVM classifies the image representations obtained by pooling the encoded local features with Spatial Pyramid Pooling. The proposed system achieves very competitive results, reaching state-of-the-art performance on several benchmarks.
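The abstract does not spell out the LCSP coding step. As a hedged illustration, the sketch below uses the approximated locality-constrained linear coding (LLC) solution of Wang et al. (CVPR 2010) as a stand-in: each descriptor is reconstructed from only its k nearest codewords, with weights constrained to sum to one. The function name `llc_encode`, the values of `k` and `lam`, and the random codebook are assumptions for illustration; the thesis's LCSP objective may differ.

```python
import numpy as np

def llc_encode(X, B, k=5, lam=1e-4):
    """Approximated locality-constrained linear coding (stand-in for LCSP).

    X: (N, D) local descriptors such as dense SIFT; B: (M, D) codebook.
    Returns an (N, M) code matrix whose rows have at most k non-zeros summing to 1.
    """
    N, M = X.shape[0], B.shape[0]
    codes = np.zeros((N, M))
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((X[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    # Locality constraint: only the k nearest codewords may be used.
    knn = np.argsort(d2, axis=1)[:, :k]
    for i in range(N):
        z = B[knn[i]] - X[i]                # local bases shifted to the descriptor
        C = z @ z.T                         # k x k local covariance
        C += lam * np.trace(C) * np.eye(k)  # regularize so C is invertible
        w = np.linalg.solve(C, np.ones(k))
        codes[i, knn[i]] = w / w.sum()      # enforce the sum-to-one constraint
    return codes

# Toy usage with random descriptors and a random (hypothetical) codebook;
# in practice B would be learned from training descriptors, e.g. by k-means.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 128))
B = rng.normal(size=(64, 128))
codes = llc_encode(X, B, k=5)
```

The pairwise-distance step builds an N × M × D array, which is fine for a demo but would need chunking for real images that yield thousands of dense SIFT descriptors.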
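The pooling stage can be sketched as standard spatial pyramid max pooling: the image is divided into 1 × 1, 2 × 2, and 4 × 4 grids, the codes of the features falling in each cell are max-pooled, and the cell vectors are concatenated and L2-normalized. The thesis uses a modified Spatial Pyramid Pooling model whose changes are not described in the abstract, so this shows only the baseline variant; the function name, the pyramid levels, and the assumption that feature positions are normalized to [0, 1) are illustrative.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool codes over a spatial pyramid (baseline, not the thesis's modified model).

    codes: (N, M) codes of N local features; xy: (N, 2) feature positions (x, y)
    normalized to [0, 1). Returns an L2-normalized vector of length
    M * sum(l * l for l in levels).
    """
    M = codes.shape[1]
    pooled = []
    for l in levels:
        # Grid cell index (column, row) of each feature at this pyramid level.
        cell = np.minimum((xy * l).astype(int), l - 1)
        for cy in range(l):
            for cx in range(l):
                mask = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                # Max pooling within the cell; empty cells contribute zeros.
                pooled.append(codes[mask].max(axis=0) if mask.any() else np.zeros(M))
    v = np.concatenate(pooled)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Toy usage: three features with one-hot codes at three corners of the image.
codes = np.eye(3)
xy = np.array([[0.1, 0.1], [0.9, 0.1], [0.9, 0.9]])
v = spatial_pyramid_max_pool(codes, xy)
```

With an M-word codebook and levels (1, 2, 4), every image maps to a fixed-length vector of 21M entries regardless of how many descriptors it has, which is what lets a linear classifier operate on it.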
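For the classification stage, a minimal one-vs-rest linear SVM can be sketched with full-batch sub-gradient descent on the regularized hinge loss over the pooled image vectors. The solver, the function names, and the hyperparameters (`lam`, `epochs`, `lr`) are hypothetical choices for illustration; the abstract does not say which SVM solver the thesis used, and a dedicated library such as LIBLINEAR would normally be preferred in practice.

```python
import numpy as np

def train_linear_svm_ovr(X, y, n_classes, lam=1e-3, epochs=500, lr=0.1):
    """Hypothetical one-vs-rest linear SVM trained by full-batch sub-gradient
    descent on lam/2 * ||w||^2 + mean(max(0, 1 - t * (w.x + b)))."""
    N, D = X.shape
    W = np.zeros((n_classes, D))
    b = np.zeros(n_classes)
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)  # +1 for class c, -1 for all others
        for _ in range(epochs):
            active = t * (X @ W[c] + b[c]) < 1  # samples violating the margin
            grad_w = lam * W[c] - (t[active][:, None] * X[active]).sum(axis=0) / N
            grad_b = -t[active].sum() / N
            W[c] -= lr * grad_w
            b[c] -= lr * grad_b
    return W, b

def predict_linear_svm(X, W, b):
    # Assign each sample to the class whose classifier gives the highest score.
    return np.argmax(X @ W.T + b, axis=1)

# Toy usage: three well-separated clusters standing in for pooled image vectors.
X = np.array([[2.0, 0.0], [3.0, 0.0], [-2.0, 0.0], [-3.0, 0.0], [0.0, 2.0], [0.0, 3.0]])
y = np.array([0, 0, 1, 1, 2, 2])
W, b = train_linear_svm_ovr(X, y, n_classes=3)
pred = predict_linear_svm(X, W, b)
```

A linear kernel is the usual pairing for pooled sparse codes, since the pyramid representation is already high-dimensional and a linear model keeps training and testing cost linear in the number of images.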