Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors
With the recent development of low-cost depth sensors, an entirely new type of 3D data is being generated rapidly by regular consumers. Traditionally, 3D data is produced by a small number of professional designers (i.e., the Computer Aided Design (CAD) model); however, 3D data from massive consumer...
Main Author: | Wang, Yan |
---|---|
Language: | English |
Published: | 2015 |
Subjects: | Computer science; Robotics |
Online Access: | https://doi.org/10.7916/D8RX9B7V |
id |
ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8RX9B7V |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer science Robotics |
spellingShingle |
Computer science Robotics Wang, Yan Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors |
description |
With the recent development of low-cost depth sensors, an entirely new type of 3D data is being generated rapidly by regular consumers. Traditionally, 3D data has been produced by a small number of professional designers (i.e., Computer-Aided Design (CAD) models); however, 3D data from massive numbers of consumer-level sensors has the potential to enable many new applications, such as user-captured 3D warehouses and search engines, robots with 3D sensing capability, and customized 3D printing. Nevertheless, the low-cost sensors used by general consumers also pose new technological challenges. First, they have relatively high levels of sensor noise. Second, such consumer devices are often used in uncontrolled settings, resulting in challenging conditions such as poor lighting, cluttered scenes, and object occlusion. To address these emerging opportunities and associated challenges, this dissertation is dedicated to the development of novel algorithms and systems for 3D data understanding and processing, using input from a consumer-level 3D sensor.
In particular, the key problems of 3D shape retrieval, scene understanding, and pose recognition are explored in order to provide comprehensive coverage of the key aspects of content-based 3D shape analysis. To resolve the aforementioned challenges, we propose a flexible Markov Random Field (MRF) framework that uses local information to allow partial matching, thereby addressing the model incompleteness problem; the framework also uses higher-order correlation to provide additional robustness against sensor noise. With the MRF framework, these 3D analysis problems can be transformed into a unified potential energy minimization problem, while preserving the flexibility to adapt to different settings and resolve the unique challenges of each problem. The contributions of the dissertation include:
a. Cross-Domain 3D Retrieval: First, we tackle the problem of searching for noise-free 3D models using noisy data captured by low-cost 3D sensors, a unique cross-domain setting. To manage the challenges of sensor noise and model incompleteness from consumer-level sensors, we propose a novel MRF formulation for the retrieval problem. The potential function of the random field is designed to capture both local shape and global spatial consistency in order to preserve the local matching capability while offering robustness against sensor noise. The specific form of the potential functions is determined efficiently by a series of weak classifiers, thus forming a variant of the Regression Tree Field (RTF). We achieve better retrieval precision and recall than state-of-the-art approaches in the cross-domain setting with a consumer-level depth sensor.
b. 3D Scene Understanding: We develop a scene understanding system based on input from consumer-level depth sensors. To resolve the key challenge of the lack of annotated 3D training data, we construct an MRF that connects the input 3D point cloud and the associated 2D reference images, based on which the 3D point cloud is stitched. A series of weak classifiers are trained to obtain an approximate semantic segmentation result from the reference images. The potential function of the field is designed to integrate the results from the classifiers, while taking advantage of the 3D spatial consistency in order to output a comprehensive scene understanding result. We achieve comparable accuracy and much faster speed compared with state-of-the-art 3D scene understanding systems, with the difference that we do not require annotated 3D training data.
c. Pose Recognition of Deformable Objects: We develop a method that enables a robotic system to recognize the pose of deformable objects and manipulate them. More specifically, garment pose is recognized with the help of an offline simulated database and the proposed retrieval approach. We use a novel binary feature representation extracted from the reconstructed 3D surfaces in order to allow efficient matching, thus achieving real-time performance. A spatial weight is further learned in order to integrate the local matching results. The system shows higher recognition accuracy and faster speed than state-of-the-art approaches.
d. Application with 2D Data: In addition to the traditional 3D applications, we explore the possibility of extending the MRF formulation to 2D data, especially data used in classical low-level 2D vision problems such as image deblurring and denoising. One well-known technique that uses an image prior, the probabilistic patch-based prior, is known to be bottlenecked by finding the most similar model in a model set, which can be posed as a retrieval problem. Therefore, we take the MRF formulation originally developed for 3D shape retrieval and extend it to this 2D problem by introducing a grid-like random field structure. We achieve a 40x acceleration compared with the state-of-the-art algorithm, while preserving quality.
We organize the dissertation as follows. First, the core problems of 3D shape retrieval, scene understanding, and pose recognition, together with the proposed solutions that use MRF and RTF, are explored in Part I. In Part II, the extension to 2D data is discussed. Extensive evaluation is performed for each specific task in order to compare the proposed approaches with state-of-the-art algorithms and systems, and also to justify the components of the proposed methods. Finally, in Part III, we include concluding remarks and a discussion of open issues and future work. |
author |
Wang, Yan |
author_facet |
Wang, Yan |
author_sort |
Wang, Yan |
title |
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors |
title_short |
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors |
title_full |
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors |
title_fullStr |
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors |
title_full_unstemmed |
Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors |
title_sort |
three-dimensional object search, understanding, and pose estimation with low-cost sensors |
publishDate |
2015 |
url |
https://doi.org/10.7916/D8RX9B7V |
work_keys_str_mv |
AT wangyan threedimensionalobjectsearchunderstandingandposeestimationwithlowcostsensors |
_version_ |
1719046529484849152 |
spelling |
ndltd-columbia.edu-oai-academiccommons.columbia.edu-10.7916-D8RX9B7V 2019-05-09T15:14:52Z Three-Dimensional Object Search, Understanding, and Pose Estimation with Low-Cost Sensors Wang, Yan 2015 Theses Computer science Robotics English https://doi.org/10.7916/D8RX9B7V |