Learning Cross-Modal Aligned Representation With Graph Embedding
The main task of cross-modal analysis is to learn discriminative representations shared across different modalities. To obtain aligned representations, conventional approaches either construct and optimize a linear projection or train a complex architecture of deep layers, yet both struggle to balance accuracy and efficiency when modeling multimodal data. This paper proposes a novel graph-embedding learning framework implemented with neural networks. The learned embedding directly approximates the cross-modal aligned representation and supports both cross-modal retrieval and image classification that incorporates text information. The proposed framework extracts the learned representation from a graph model and simultaneously trains a classifier under semi-supervised settings. For optimization, unlike previous methods based on graph Laplacian regularization, a sampling strategy is adopted to generate training pairs that fully explore the inter-modal and intra-modal similarity relationships. Experimental results on several datasets show that the proposed framework outperforms other state-of-the-art methods on cross-modal retrieval. The framework also yields convincing improvements on the new task of image classification combining text information on the Wiki dataset.
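The abstract names three training ingredients: a neural network per modality embedding features into a shared space, sampled inter-modal and intra-modal training pairs in place of a graph Laplacian regularizer, and a classifier trained jointly on the labeled subset. The sketch below illustrates that recipe in PyTorch under stated assumptions; the network shapes, feature dimensions, margin, and loss weighting are illustrative guesses, not the authors' actual configuration.

```python
# Hypothetical sketch of the paper's idea: two small networks map image and
# text features into a shared embedding space, trained with (1) a contrastive
# loss over sampled inter-/intra-modal pairs and (2) a cross-entropy loss on
# the labeled subset (semi-supervised). All sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbedNet(nn.Module):
    """Projects one modality's features into the shared embedding space."""
    def __init__(self, in_dim, emb_dim=128, n_classes=10):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
        self.cls = nn.Linear(emb_dim, n_classes)  # classifier in shared space

    def forward(self, x):
        z = F.normalize(self.proj(x), dim=1)      # unit-norm embedding
        return z, self.cls(z)

def pair_loss(z_a, z_b, same_class, margin=0.5):
    """Contrastive loss over sampled pairs: pull same-class pairs together,
    push different-class pairs at least `margin` apart (cosine distance)."""
    d = 1.0 - F.cosine_similarity(z_a, z_b)
    return (same_class * d + (1 - same_class) * F.relu(margin - d)).mean()

# Toy batch: 4096-d image features, 300-d text features, partial labels.
img_net, txt_net = EmbedNet(4096), EmbedNet(300)
imgs, txts = torch.randn(32, 4096), torch.randn(32, 300)
labels = torch.randint(0, 10, (32,))
labeled = torch.rand(32) < 0.5                    # mask: which samples are labeled

opt = torch.optim.Adam(
    list(img_net.parameters()) + list(txt_net.parameters()), lr=1e-3)
opt.zero_grad()
z_i, logits_i = img_net(imgs)
z_t, _ = txt_net(txts)

# Inter-modal pairs: image i vs. a randomly sampled text j.
j = torch.randperm(32)
loss = pair_loss(z_i, z_t[j], (labels == labels[j]).float())
# Intra-modal pairs within the image modality.
k = torch.randperm(32)
loss = loss + pair_loss(z_i, z_i[k], (labels == labels[k]).float())
# Semi-supervised classification term on the labeled samples only.
if labeled.any():
    loss = loss + F.cross_entropy(logits_i[labeled], labels[labeled])
loss.backward()
opt.step()
```

The key design point mirrored here is that pair sampling replaces an explicit graph Laplacian term: similarity supervision arrives through randomly drawn pairs rather than a precomputed affinity matrix.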
| Main Authors: | Youcai Zhang, Jiayan Cao, Xiaodong Gu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2018-01-01 |
| Series: | IEEE Access |
| Subjects: | Graph embedding learning; cross-modal retrieval; neural network; semi-supervised learning |
| Online Access: | https://ieeexplore.ieee.org/document/8543794/ |
| DOI: | 10.1109/ACCESS.2018.2881997 |
| ISSN: | 2169-3536 |
| Volume / Pages: | IEEE Access, vol. 6, pp. 77321-77333 (article no. 8543794) |
| DOAJ Record ID: | doaj-f4a934c957fe480d8d20b1d0c5139ded |

Author details: Youcai Zhang (ORCID: 0000-0003-1412-2677), Jiayan Cao, and Xiaodong Gu (ORCID: 0000-0002-7096-1830), all with the Department of Electronic Engineering, Fudan University, Shanghai, China.