Unsupervised statistical models for general object recognition
Main Author: | Carbonetto, Peter |
---|---|
Format: | Electronic Thesis or Dissertation |
Language: | English |
Published: | 2009 |
Thesis Date: | 2003-11 |
Online Access: | http://hdl.handle.net/2429/14543 |
Source: | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-14543 |
---|---|
record_format | oai_dc |
collection | NDLTD |
language | English |
sources | NDLTD |
description | We approach the object recognition problem as the process of attaching meaningful labels to specific regions of an image. Given a set of images and their captions, we segment the images, in either a crude or sophisticated fashion, then learn the proper associations between words and regions. Previous models are limited by the scope of the representation, and performance is constrained by noise from poor initial clusterings of the image features. We propose three improvements that address these issues. First, we describe a model that incorporates clustering into the learning step using a basic mixture model. Second, we propose Bayesian priors on the mixture model to stabilise learning and automatically weight features. Third, we develop a more expressive model that learns spatial relations between regions of a scene. Using the analogy of building a lexicon via an aligned bitext, we formulate a probabilistic mapping between the image feature vectors and the supplied word tokens. To find the best hypothesis, we hill-climb the log-posterior using the EM algorithm. Spatial context introduces cycles into our probabilistic graphical model, so we use loopy belief propagation to compute the expectation of the complete log-posterior, and iterative scaling and iterative proportional fitting on the pseudo-likelihood approximation to render parameter estimation tractable. The EM algorithm is no longer guaranteed to converge with an intractable posterior, but experiments show that the approximate E and M steps consistently converge to a local solution. Empirical results on a diverse array of images show that learning image feature clusters with a standard mixture model, weighting features with Bayesian shrinkage priors, and adding spatial context potentials considerably improve the accuracy of general object recognition. Moreover, results suggest that good performance can be obtained without expensive image segmentations. (A toy sketch of the EM alignment loop appears after the record fields below.) |
author | Carbonetto, Peter |
title | Unsupervised statistical models for general object recognition |
publishDate | 2009 |
url | http://hdl.handle.net/2429/14543 |
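
The description above compresses several algorithmic choices: image regions and caption words are treated as an aligned bitext, and word-region "translation" probabilities are learned jointly with a mixture model over region features by hill-climbing the log-posterior with EM. The sketch below is a minimal toy illustration of that core loop, not code from the thesis: the diagonal-Gaussian clusters, the per-cluster word table `t`, and the data shapes are all invented for illustration, and the Bayesian shrinkage priors and spatial context potentials (which would require loopy belief propagation and pseudo-likelihood fitting) are omitted.

```python
# Toy EM for word-region association (the "aligned bitext" view).
# A minimal sketch only -- NOT the thesis implementation. Assumptions:
# regions are feature vectors drawn from K diagonal-Gaussian clusters,
# and each cluster carries a multinomial "translation table" over words.
import numpy as np

rng = np.random.default_rng(0)
K, D, V = 4, 6, 10                 # clusters, feature dims, vocabulary size

# Invented toy corpus: (region feature matrix, caption word indices) pairs.
corpus = [(rng.normal(size=(3, D)), [0, 3, 7]),
          (rng.normal(size=(2, D)), [1, 3]),
          (rng.normal(size=(4, D)), [2, 5, 7, 9])]

mu = rng.normal(size=(K, D))       # cluster means
var = np.ones((K, D))              # cluster variances (diagonal)
pi = np.full(K, 1.0 / K)           # mixing weights
t = np.full((K, V), 1.0 / V)       # t[c, w] = p(word w | cluster c)

def log_gauss(x, mu, var):
    """Diagonal-Gaussian log density of x under every cluster."""
    return -0.5 * (((x - mu) ** 2 / var) + np.log(2 * np.pi * var)).sum(-1)

for _ in range(20):                # EM: hill-climb the log-likelihood
    Nk = np.zeros(K)               # expected cluster counts
    sx = np.zeros((K, D)); sxx = np.zeros((K, D))
    tw = np.zeros((K, V))          # expected word-cluster co-occurrences
    for regions, caption in corpus:
        # E-step: responsibilities of each cluster for each region,
        # combining feature likelihood with caption-word evidence.
        ll = log_gauss(regions[:, None, :], mu, var)   # (R, K)
        lw = np.log(t[:, caption]).sum(axis=1)         # (K,)
        logr = np.log(pi) + ll + lw
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)              # (R, K)
        Nk += r.sum(axis=0)
        sx += r.T @ regions
        sxx += r.T @ regions ** 2
        for w in caption:
            tw[:, w] += r.sum(axis=0)
    # M-step: closed-form updates of Gaussians, weights, and word table.
    mu = sx / Nk[:, None]
    var = sxx / Nk[:, None] - mu ** 2 + 1e-6           # guard against collapse
    pi = Nk / Nk.sum()
    t = (tw + 1e-9) / (tw + 1e-9).sum(axis=1, keepdims=True)
```

After a few iterations `t` concentrates probability on the caption words that co-occur with each region cluster; reading out the highest-probability word for a region's most responsible cluster is the simplest form of the region labeling the abstract describes. Real captioned-image data and the thesis's richer alignment structure would replace the toy corpus used here.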