Multimodal Encoder-Decoder Attention Networks for Visual Question Answering

Visual Question Answering (VQA) is a multimodal task involving Computer Vision (CV) and Natural Language Processing (NLP); the goal is to establish a high-efficiency VQA model. Learning a fine-grained and simultaneous understanding of both the visual content of images and the textual content of ques...


Bibliographic Details
Main Authors:	Chongqing Chen, Dezhi Han, Jun Wang
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Computer vision; encoder-decoder attention; multimodal task; natural language processing; question-guided-attention; self-attention
Online Access:	https://ieeexplore.ieee.org/document/9003229/

Similar Items