Unsupervised statistical models for general object recognition

We approach the object recognition problem as the process of attaching meaningful labels to specific regions of an image. Given a set of images and their captions, we segment the images, in either a crude or sophisticated fashion, then learn the proper associations between words and regions. Previous models are limited by the scope of the representation, and performance is constrained by noise from poor initial clusterings of the image features. We propose three improvements that address these issues. First, we describe a model that incorporates clustering into the learning step using a basic mixture model. Second, we propose Bayesian priors on the mixture model to stabilise learning and automatically weight features. Third, we develop a more expressive model that learns spatial relations between regions of a scene.
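
To make the second improvement concrete: a shrinkage prior on the per-feature variances of a diagonal-covariance Gaussian mixture acts as an automatic feature weight. The abstract does not fix the form of the prior, so the inverse-Gamma choice and the MAP update below are an illustrative assumption, not the thesis's exact model.

    % Diagonal Gaussian mixture over region feature vectors x in R^D,
    % with a conjugate inverse-Gamma prior on each per-feature variance.
    p(x \mid c) = \prod_{d=1}^{D} \mathcal{N}\!\left(x_d \mid \mu_{c,d}, \sigma_{c,d}^2\right),
    \qquad \sigma_{c,d}^2 \sim \mathrm{Inv\text{-}Gamma}(\alpha, \beta)

    % MAP M-step update with responsibilities r_{nc}: the prior acts as
    % pseudo-observations, shrinking noisy variance estimates so that
    % features with large within-cluster variance are down-weighted.
    \hat{\sigma}_{c,d}^2 =
    \frac{2\beta + \sum_n r_{nc}\,(x_{nd} - \hat{\mu}_{c,d})^2}
         {2(\alpha + 1) + \sum_n r_{nc}}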

Using the analogy of building a lexicon via an aligned bitext, we formulate a probabilistic mapping between the image feature vectors and the supplied word tokens. To find the best hypothesis, we hill-climb the log-posterior using the EM algorithm.
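
A minimal sketch of this lexicon-building analogy follows, assuming each image is reduced to a bag of region cluster labels paired with its caption words. The function and variable names are hypothetical, and the Model 1-style independence assumptions are ours rather than the thesis's exact formulation.

    from collections import defaultdict

    def em_translation_table(corpus, n_iters=20):
        """corpus: list of (region_clusters, caption_words) pairs.

        Learns t[w][b] = p(word w | region cluster b) by EM -- the same
        update used to build a lexicon from an aligned bitext.
        """
        # Uniform initialisation; any constant works since the E step normalises.
        t = defaultdict(lambda: defaultdict(lambda: 1.0))
        for _ in range(n_iters):
            counts = defaultdict(lambda: defaultdict(float))
            totals = defaultdict(float)
            for regions, words in corpus:
                for w in words:
                    # E step: posterior over which region emitted word w.
                    z = sum(t[w][b] for b in regions)
                    for b in regions:
                        p = t[w][b] / z
                        counts[w][b] += p
                        totals[b] += p
            # M step: renormalise the expected counts into p(w | b).
            for w in counts:
                for b in counts[w]:
                    t[w][b] = counts[w][b] / totals[b]
        return t

    # Toy usage: "sky" co-occurs with the blue region in both images,
    # so it should end up most strongly associated with "blue_blob".
    corpus = [(["blue_blob", "green_blob"], ["sky", "grass"]),
              (["blue_blob", "grey_blob"], ["sky", "rock"])]
    table = em_translation_table(corpus)
    print(max(table["sky"], key=table["sky"].get))  # -> blue_blob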

Spatial context introduces cycles into our probabilistic graphical model, so we use loopy belief propagation to compute the expectation of the complete log-posterior, and iterative scaling and iterative proportional fitting on the pseudo-likelihood approximation to render parameter estimation tractable. The EM algorithm is no longer guaranteed to converge with an intractable posterior, but experiments show that the approximate E and M steps consistently converge to a local solution.
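
For readers unfamiliar with the pseudo-likelihood trick: the joint distribution over region labels, whose partition function becomes intractable once spatial cycles appear, is replaced by a product of per-site conditionals given the neighbouring labels. The notation below is generic rather than the thesis's own.

    p(\mathbf{b} \mid \theta) \;\approx\;
    \prod_{i} p\!\left(b_i \mid \mathbf{b}_{\mathcal{N}(i)}, \theta\right)
    = \prod_{i}
    \frac{\exp\!\big(\sum_{j \in \mathcal{N}(i)} \psi(b_i, b_j; \theta)\big)}
         {\sum_{b'} \exp\!\big(\sum_{j \in \mathcal{N}(i)} \psi(b', b_j; \theta)\big)}

    % Each local normaliser sums over the states of a single site only,
    % so the gradients needed by iterative scaling / IPF stay cheap.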

Empirical results on a diverse array of images show that learning image feature clusters using a standard mixture model, feature weighting using Bayesian shrinkage priors, and spatial context potentials considerably improve the accuracy of general object recognition. Moreover, the results suggest that good performance can be obtained without expensive image segmentations.

Bibliographic Details
Main Author: Carbonetto, Peter
Degree: Graduate thesis, Department of Computer Science, Faculty of Science
Format: Thesis/Dissertation (PDF, 4034738 bytes)
Language: English
Date Issued: 2003
Published Online: 2009
Online Access: http://hdl.handle.net/2429/14543
Rights: For non-commercial purposes only, such as research, private study and education; additional conditions apply, see https://open.library.ubc.ca/terms_of_use