Multi-Modality Global Fusion Attention Network for Visual Question Answering

Visual question answering (VQA) requires a high-level understanding of both questions and images, along with visual reasoning to predict the correct answer. Therefore, it is important to design an effective attention model to associate key regions in an image with key words in a question. Up to now,...

Bibliographic Details
Main Authors: Cheng Yang, Weijia Wu, Yuxing Wang, Hong Zhou
Format: Article
Language: English
Published: MDPI AG 2020-11-01
Series: Electronics
Online Access: https://www.mdpi.com/2079-9292/9/11/1882