Inference Machines: Parsing Scenes via Iterated Predictions

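Extracting a rich representation of an environment from visual sensor readings can benefit many tasks in robotics, e.g., path planning, mapping, and object manipulation. While important progress has been made, it remains a difficult problem to effectively parse entire scenes, i.e., to recognize semantic objects, man-made structures, and landforms. This process requires not only recognizing individual entities but also understanding the contextual relations among them.

The prevalent approach to encoding such relationships is to use a joint probabilistic or energy-based model, which enables one to naturally write down these interactions. Unfortunately, performing exact inference over these expressive models is often intractable, so we can only approximate the solutions. While a number of sophisticated approximate inference techniques exist, the combination of learning and approximate inference for these expressive models is still poorly understood in theory and limited in practice. Furthermore, using approximate inference on any learned model often leads to suboptimal predictions due to the inherent approximations.

As we ultimately care about predicting the correct labeling of a scene, and not necessarily about learning a joint model of the data, this work proposes to instead view the approximate inference process as a modular procedure that is directly trained to produce a correct labeling of the scene. Inspired by early hierarchical models in the computer vision literature for scene parsing, the proposed inference procedure is structured to incorporate both feature descriptors and contextual cues computed at multiple resolutions within the scene. We demonstrate that this inference machine framework for parsing scenes via iterated predictions offers the best of both worlds: state-of-the-art classification accuracy and computational efficiency when processing images and/or unorganized 3-D point clouds. Additionally, we address critical problems that arise in practice when parsing scenes on board real-world systems: integrating data from multiple sensor modalities and efficiently processing data that is continuously streaming from the sensors.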
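The general idea of parsing a scene via iterated predictions can be illustrated with a small toy sketch. This is not the dissertation's implementation; it assumes a graph of scene regions (here a 1-D chain), a single resolution rather than a multi-resolution hierarchy, and a nearest-centroid classifier standing in for whatever predictor each stage actually uses. Each stage sees a region's own descriptor plus a contextual cue (the average label distribution its neighbors received from the previous stage) and is trained to output the correct label. All function names (stage_features, fit_centroids, predict_probs, train_stages, infer) are hypothetical. For simplicity the stages are trained on their own in-sample predictions; a more careful version would train each stage on held-out predictions so that the contextual cues seen during training resemble those seen at test time.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def stage_features(x, probs, neighbors):
    """Own descriptor + mean predicted label distribution of the neighbors."""
    k = len(probs[0])
    feats = []
    for i, nbrs in enumerate(neighbors):
        # Contextual cue from the previous stage (uniform if a node has no neighbors).
        ctx = [sum(probs[j][c] for j in nbrs) / len(nbrs) if nbrs else 1.0 / k
               for c in range(k)]
        feats.append(list(x[i]) + ctx)
    return feats


def fit_centroids(feats, labels, k):
    """Stand-in learner: one mean feature vector per class (nearest centroid)."""
    dim = len(feats[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for f, y in zip(feats, labels):
        counts[y] += 1
        for d in range(dim):
            sums[y][d] += f[d]
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(k)]


def predict_probs(centroids, feats):
    """Label distribution from negative squared distances to each centroid."""
    return [softmax([-sum((a - b) ** 2 for a, b in zip(f, c)) for c in centroids])
            for f in feats]


def train_stages(x, labels, neighbors, k, n_stages=3):
    """Train a sequence of predictors, each fed the previous stage's outputs."""
    probs = [[1.0 / k] * k for _ in x]  # uninformative starting predictions
    stages = []
    for _ in range(n_stages):
        feats = stage_features(x, probs, neighbors)
        centroids = fit_centroids(feats, labels, k)
        stages.append(centroids)
        # In-sample predictions for simplicity; held-out predictions are safer.
        probs = predict_probs(centroids, feats)
    return stages


def infer(stages, x, neighbors, k):
    """Run the trained stages in order and return the final hard labels."""
    probs = [[1.0 / k] * k for _ in x]
    for centroids in stages:
        probs = predict_probs(centroids, stage_features(x, probs, neighbors))
    return [max(range(k), key=lambda c: p[c]) for p in probs]


# Toy "scene": 40 regions in a chain, two contiguous segments, noisy 1-D descriptors.
random.seed(0)
labels = [0] * 20 + [1] * 20
x = [(2.0 * y - 1.0 + random.gauss(0.0, 0.5),) for y in labels]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(labels)]
             for i in range(len(labels))]
stages = train_stages(x, labels, neighbors, k=2)
pred = infer(stages, x, neighbors, k=2)
accuracy = sum(p == y for p, y in zip(pred, labels)) / len(labels)
print(f"accuracy: {accuracy:.2f}")
```

Because every stage is an ordinary supervised predictor, the learner inside each stage can be swapped without touching the train/infer loop; that modularity, rather than the particular classifier, is what the sketch is meant to convey.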

Bibliographic Details
Main Author: Munoz, Daniel
Format: Others
Published: Research Showcase @ CMU, 2013
Subjects: Robotics
Online Access: http://repository.cmu.edu/dissertations/305
http://repository.cmu.edu/cgi/viewcontent.cgi?article=1309&context=dissertations