Unsupervised statistical models for general object recognition
Main Author: | Carbonetto, Peter |
---|---|
Format: | Electronic Thesis or Dissertation |
Language: | English |
Published: | 2009 |
Thesis Date: | 2003-11 |
Online Access: | http://hdl.handle.net/2429/14543 |
Source: | UBC Retrospective Theses Digitization Project [http://www.library.ubc.ca/archives/retro_theses/] |
id | ndltd-LACETR-oai-collectionscanada.gc.ca-BVAU.2429-14543 |
---|---|
record_format | oai_dc |
collection | NDLTD |
language | English |
sources | NDLTD |
description | We approach the object recognition problem as the process of attaching meaningful labels to specific regions of an image. Given a set of images and their captions, we segment the images, in either a crude or sophisticated fashion, then learn the proper associations between words and regions. Previous models are limited by the scope of the representation, and performance is constrained by noise from poor initial clusterings of the image features. We propose three improvements that address these issues. First, we describe a model that incorporates clustering into the learning step using a basic mixture model. Second, we propose Bayesian priors on the mixture model to stabilise learning and automatically weight features. Third, we develop a more expressive model that learns spatial relations between regions of a scene. Using the analogy of building a lexicon via an aligned bitext, we formulate a probabilistic mapping between the image feature vectors and the supplied word tokens. To find the best hypothesis, we hill-climb the log-posterior using the EM algorithm. Spatial context introduces cycles into our probabilistic graphical model, so we use loopy belief propagation to compute the expectation of the complete log-posterior, and iterative scaling and iterative proportional fitting on the pseudo-likelihood approximation to render parameter estimation tractable. The EM algorithm is no longer guaranteed to converge with an intractable posterior, but experiments show that the approximate E and M steps consistently converge to a local solution. Empirical results on a diverse array of images show that learning image feature clusters with a standard mixture model, weighting features with Bayesian shrinkage priors, and adding spatial context potentials considerably improve the accuracy of general object recognition. Moreover, results suggest that good performance can be obtained without expensive image segmentations. (A toy sketch of the EM alignment loop appears after the record fields below.) |
author | Carbonetto, Peter |
title | Unsupervised statistical models for general object recognition |
publishDate | 2009 |
url | http://hdl.handle.net/2429/14543 |
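
The description above compresses several algorithmic choices: image regions and caption words are treated as an aligned bitext, and word-region "translation" probabilities are learned jointly with a mixture model over region features by hill-climbing the log-posterior with EM. The sketch below is a minimal toy illustration of that core loop, not code from the thesis: the diagonal-Gaussian clusters, the per-cluster word table `t`, and the data shapes are all invented for illustration, and the Bayesian shrinkage priors and spatial context potentials (which would require loopy belief propagation and pseudo-likelihood fitting) are omitted.

```python
# Toy EM for word-region association (the "aligned bitext" view).
# A minimal sketch only -- NOT the thesis implementation. Assumptions:
# regions are feature vectors drawn from K diagonal-Gaussian clusters,
# and each cluster carries a multinomial "translation table" over words.
import numpy as np

rng = np.random.default_rng(0)
K, D, V = 4, 6, 10                 # clusters, feature dims, vocabulary size

# Invented toy corpus: (region feature matrix, caption word indices) pairs.
corpus = [(rng.normal(size=(3, D)), [0, 3, 7]),
          (rng.normal(size=(2, D)), [1, 3]),
          (rng.normal(size=(4, D)), [2, 5, 7, 9])]

mu = rng.normal(size=(K, D))       # cluster means
var = np.ones((K, D))              # cluster variances (diagonal)
pi = np.full(K, 1.0 / K)           # mixing weights
t = np.full((K, V), 1.0 / V)       # t[c, w] = p(word w | cluster c)

def log_gauss(x, mu, var):
    """Diagonal-Gaussian log density of x under every cluster."""
    return -0.5 * (((x - mu) ** 2 / var) + np.log(2 * np.pi * var)).sum(-1)

for _ in range(20):                # EM: hill-climb the log-likelihood
    Nk = np.zeros(K)               # expected cluster counts
    sx = np.zeros((K, D)); sxx = np.zeros((K, D))
    tw = np.zeros((K, V))          # expected word-cluster co-occurrences
    for regions, caption in corpus:
        # E-step: responsibilities of each cluster for each region,
        # combining feature likelihood with caption-word evidence.
        ll = log_gauss(regions[:, None, :], mu, var)   # (R, K)
        lw = np.log(t[:, caption]).sum(axis=1)         # (K,)
        logr = np.log(pi) + ll + lw
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)              # (R, K)
        Nk += r.sum(axis=0)
        sx += r.T @ regions
        sxx += r.T @ regions ** 2
        for w in caption:
            tw[:, w] += r.sum(axis=0)
    # M-step: closed-form updates of Gaussians, weights, and word table.
    mu = sx / Nk[:, None]
    var = sxx / Nk[:, None] - mu ** 2 + 1e-6           # guard against collapse
    pi = Nk / Nk.sum()
    t = (tw + 1e-9) / (tw + 1e-9).sum(axis=1, keepdims=True)
```

After a few iterations `t` concentrates probability on the caption words that co-occur with each region cluster; reading out the highest-probability word for a region's most responsible cluster is the simplest form of the region labeling the abstract describes. Real captioned-image data and the thesis's richer alignment structure would replace the toy corpus used here.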