RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey

Object recognition in real-world environments is one of the fundamental and key tasks in the computer vision and robotics communities. With advanced sensing technologies and low-cost depth sensors, high-quality RGB and depth images can be recorded synchronously, and the object recognition perfor...

Bibliographic Details
Main Authors: Mingliang Gao, Jun Jiang, Guofeng Zou, Vijay John, Zheng Liu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Convolutional neural network; multimodal fusion; object recognition; RGB-D; survey
Online Access: https://ieeexplore.ieee.org/document/8683987/
id doaj-e136822c24114918ab8e11e350027253
record_format Article
spelling doaj-e136822c24114918ab8e11e350027253 | 2021-03-29T22:48:17Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2019-01-01 | vol. 7, pp. 43110-43136 | DOI 10.1109/ACCESS.2019.2907071 | article no. 8683987
RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey
Mingliang Gao [0] https://orcid.org/0000-0001-7273-7499
Jun Jiang [1]
Guofeng Zou [2] https://orcid.org/0000-0002-8023-0142
Vijay John [3]
Zheng Liu [4]
[0] School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
[1] School of Computer Science and Technology, Southwest Petroleum University, Chengdu, China
[2] School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo, China
[3] Intelligent Information Processing Laboratory, Toyota Technological Institute, Nagoya, Japan
[4] Faculty of Applied Science, The University of British Columbia, Vancouver, Canada
https://ieeexplore.ieee.org/document/8683987/
Convolutional neural network; multimodal fusion; object recognition; RGB-D; survey
collection DOAJ
language English
format Article
sources DOAJ
author Mingliang Gao
Jun Jiang
Guofeng Zou
Vijay John
Zheng Liu
title RGB-D-Based Object Recognition Using Multimodal Convolutional Neural Networks: A Survey
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Object recognition in real-world environments is one of the fundamental and key tasks in the computer vision and robotics communities. With advanced sensing technologies and low-cost depth sensors, high-quality RGB and depth images can be recorded synchronously, and object recognition performance can be improved by jointly exploiting them. RGB-D-based object recognition has evolved from early methods that use hand-crafted representations to the current state-of-the-art deep learning-based methods. With the undeniable success of deep learning, especially convolutional neural networks (CNNs), in the visual domain, the natural progression of deep learning research points to problems involving larger and more complex multimodal data. In this paper, we provide a comprehensive survey of recent multimodal CNN (MMCNN)-based approaches that have demonstrated significant improvements over previous methods. We highlight two key issues, namely, training data deficiency and multimodal fusion. In addition, we summarize and discuss the publicly available RGB-D object recognition datasets and present a comparative performance evaluation of the proposed methods on these benchmark datasets. Finally, we identify promising avenues of research in this rapidly evolving field. This survey will not only give researchers a good overview of the state-of-the-art methods for RGB-D-based object recognition but also provide a reference for other multimodal machine learning applications, e.g., multimodal medical image fusion, audio-visual speech recognition, and multimedia retrieval and generation.
topic Convolutional neural network
multimodal fusion
object recognition
RGB-D
survey
url https://ieeexplore.ieee.org/document/8683987/