Deep Multimodal Representation Learning: A Survey

Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Owing to its powerful representation ability with multiple levels of abstraction, deep learning-based multimodal repr...


Bibliographic Details
Main Authors:	Wenzhong Guo, Jianwen Wang, Shiping Wang
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Multimodal representation learning; multimodal deep learning; deep multimodal fusion; multimodal translation; multimodal adversarial learning
Online Access:	https://ieeexplore.ieee.org/document/8715409/

id	doaj-e4134bf44d354e4aa15c03c747d1df3f
record_format	Article
spelling	doaj-e4134bf44d354e4aa15c03c747d1df3f | 2021-03-29T22:57:39Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 63373 | 63394 | 10.1109/ACCESS.2019.2916887 | 8715409 | Deep Multimodal Representation Learning: A Survey | Wenzhong Guo (0) | Jianwen Wang (1) | https://orcid.org/0000-0001-7603-1581 | Shiping Wang (2) | College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou, China | College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou, China | College of Mathematics and Computer Sciences, Fuzhou University, Fuzhou, China | Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Owing to its powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years. In this paper, we provide a comprehensive survey of deep multimodal representation learning, a topic that has not previously been the sole focus of a survey. To facilitate the discussion of how the heterogeneity gap is narrowed, we categorize deep multimodal representation learning methods into three frameworks according to the underlying structures in which different modalities are integrated: joint representation, coordinated representation, and encoder-decoder. Additionally, we review some typical models in this area, ranging from conventional models to newly developed technologies. This paper highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective; to the best of our knowledge, these have never been reviewed previously, even though they have become major focuses of much contemporary research. For each framework or model, we discuss its basic structure, learning objective, application scenarios, key issues, advantages, and disadvantages, so that both novice and experienced researchers can benefit from this survey. Finally, we suggest some important directions for future work. | https://ieeexplore.ieee.org/document/8715409/ | Multimodal representation learning | multimodal deep learning | deep multimodal fusion | multimodal translation | multimodal adversarial learning
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Wenzhong Guo Jianwen Wang Shiping Wang
spellingShingle	Wenzhong Guo Jianwen Wang Shiping Wang Deep Multimodal Representation Learning: A Survey IEEE Access Multimodal representation learning multimodal deep learning deep multimodal fusion multimodal translation multimodal adversarial learning
author_facet	Wenzhong Guo Jianwen Wang Shiping Wang
author_sort	Wenzhong Guo
title	Deep Multimodal Representation Learning: A Survey
title_short	Deep Multimodal Representation Learning: A Survey
title_full	Deep Multimodal Representation Learning: A Survey
title_fullStr	Deep Multimodal Representation Learning: A Survey
title_full_unstemmed	Deep Multimodal Representation Learning: A Survey
title_sort	deep multimodal representation learning: a survey
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	Multimodal representation learning, which aims to narrow the heterogeneity gap among different modalities, plays an indispensable role in the utilization of ubiquitous multimodal data. Owing to its powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years. In this paper, we provide a comprehensive survey of deep multimodal representation learning, a topic that has not previously been the sole focus of a survey. To facilitate the discussion of how the heterogeneity gap is narrowed, we categorize deep multimodal representation learning methods into three frameworks according to the underlying structures in which different modalities are integrated: joint representation, coordinated representation, and encoder-decoder. Additionally, we review some typical models in this area, ranging from conventional models to newly developed technologies. This paper highlights the key issues of newly developed technologies, such as the encoder-decoder model, generative adversarial networks, and the attention mechanism, from a multimodal representation learning perspective; to the best of our knowledge, these have never been reviewed previously, even though they have become major focuses of much contemporary research. For each framework or model, we discuss its basic structure, learning objective, application scenarios, key issues, advantages, and disadvantages, so that both novice and experienced researchers can benefit from this survey. Finally, we suggest some important directions for future work.
topic	Multimodal representation learning; multimodal deep learning; deep multimodal fusion; multimodal translation; multimodal adversarial learning
url	https://ieeexplore.ieee.org/document/8715409/
work_keys_str_mv	AT wenzhongguo deepmultimodalrepresentationlearningasurvey AT jianwenwang deepmultimodalrepresentationlearningasurvey AT shipingwang deepmultimodalrepresentationlearningasurvey
_version_	1724190489360465920

Similar Items