Summary: | Recognizing objects in images is an effortless task for most people. Automating this task with computers, however, presents a difficult challenge attributable to large variations in object appearance, shape, and pose. The problem is further compounded by ambiguity from projecting 3-D objects into a 2-D image. In this thesis we present an approach to resolve these issues by modeling object structure with a collection of connected 3-D geometric primitives and a separate model for the camera. From sets of images we simultaneously learn a generative, statistical model for the object representation and parameters of the imaging system. By learning 3-D structure models we are going beyond recognition towards quantifying object shape and understanding its variation. We explore our approach in the context of microscopic images of biological structure and single view images of man-made objects composed of block-like parts, such as furniture. We express detected features from both domains as statistically generated by an image likelihood conditioned on models for the object structure and imaging system. Our representation of biological structure focuses on Alternaria, a genus of fungus comprising ellipsoid and cylinder shaped substructures. In the case of man-made furniture objects, we represent structure with spatially contiguous assemblages of blocks arbitrarily constructed according to a small set of design constraints. We learn the models with Bayesian statistical inference over structure and camera parameters per image, and for man-made objects, across categories, such as chairs. We develop a reversible-jump MCMC sampling algorithm to explore topology hypotheses, and a hybrid of Metropolis-Hastings and stochastic dynamics to search within topologies. Our results demonstrate that we can infer both 3-D object and camera parameters simultaneously from images, and that doing so improves understanding of structure in images.
We further show how 3-D structure models can be inferred from single view images, and that learned category parameters capture structure variation that is useful for recognition.
|