Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach

Bibliographic Details
Main Authors: Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9274421/
Description
Summary: In the past few years, edge computing has brought tremendous convenience to the development of smart cities by offloading computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the application of edge computing. Different types of end devices generate and collect multimodal information, and large volumes of data are transmitted to upper-layer nodes. Multimodal machine learning methods process data at edge nodes and upload only high-level features to the cloud in order to save bandwidth. In this article, we propose a novel multimodal data processing framework based on multiple attention mechanisms. Two distinct attention mechanisms are used to capture inter- and intra-modality dependencies and to align different modalities. We conduct experiments on image captioning, a core research hotspot in multimodal machine learning. A unified hierarchical structure extracts features from both images and natural language. Matching attention aligns visual and textual information. In addition, we propose a new attention mechanism, positional attention, which captures the relationship between elements within a single sensory modality. The hierarchical structure enables parallel computation in the training phase and speeds up model training. Experiments and analysis demonstrate significant improvements over baselines, confirming the effectiveness of our method.
ISSN: 2169-3536
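The summary names two attention roles: one aligning visual and textual features across modalities (matching attention) and one relating elements within a single modality (positional attention). The abstract does not give the formulas for either, so the following is only a minimal sketch under stated assumptions. It is not the authors' implementation. It uses standard scaled dot-product attention in pure Python. The function names `intra_modality` and `cross_modality` are hypothetical. Plain self-attention, with no positional encoding, stands in for positional attention, and text-queries-image attention stands in for matching attention.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    All query rows go through the same matrix products, so every position
    is handled at once rather than step by step.
    Returns (context vectors, attention weights).
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values), weights


def intra_modality(features: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical stand-in for positional attention: elements of one modality attend to each other."""
    return attention(features, features, features)


def cross_modality(text: Matrix, image: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical stand-in for matching attention: word features query image-region features."""
    return attention(text, image, image)


if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy word vectors, dim 2
    regions = [[0.5, 0.5], [1.0, -1.0]]           # 2 toy image-region vectors, dim 2
    ctx, w = cross_modality(words, regions)
    print(len(ctx), len(ctx[0]))                   # one dim-2 context vector per word
    print([round(sum(r), 6) for r in w])           # each weight row is a distribution over regions
```

In this sketch, each word receives a weighted mix of image-region features. Each row of weights is a probability distribution over regions. Because all positions are computed in a single matrix product, attention layers parallelize well during training. That property is consistent with the summary's claim about parallel training, but whether it matches the paper's hierarchical structure is an assumption.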