Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach

In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the appl...

Bibliographic Details
Main Authors: Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Attention mechanism; convolutional neural network; smart city; multimodal machine learning
Online Access: https://ieeexplore.ieee.org/document/9274421/
id doaj-bd610d1b984f4eb2b5f05a88e72680cd
record_format Article
spelling doaj-bd610d1b984f4eb2b5f05a88e72680cd
2021-03-30T04:02:51Z
eng
IEEE
IEEE Access
2169-3536
2020-01-01
volume 8, pages 215505-215515
doi 10.1109/ACCESS.2020.3041447
document 9274421
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
Qianxia Ma (https://orcid.org/0000-0002-6635-2792), Department of Automation, Tsinghua University, Beijing, China
Yongfang Nie (https://orcid.org/0000-0002-8718-3422), Department of Strategic Missile and Underwater Weapon, Naval Submarine Academy, Qingdao, China
Jingyan Song (https://orcid.org/0000-0001-9790-8573), Department of Automation, Tsinghua University, Beijing, China
Tao Zhang, Department of Automation, Tsinghua University, Beijing, China
In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the application of edge computing. Different types of end devices generate and collect multimodal information, and substantial data is transmitted to upper nodes. Multimodal machine learning methods process data at edge nodes, and only high-level features are uploaded to the cloud in order to save bandwidth. In this article, we propose a novel multimodal data processing framework based on multiple attention mechanisms. Two distinct attention mechanisms are used to capture inter and intra-modality dependencies and align different modalities together. We conduct experiments on image captioning, a core research hotspot in multimodal machine learning. A unified hierarchical structure extracts features from images and natural language. Matching attention aligns visual and textual information. Besides, we propose a new attention mechanism, positional attention, which finds the relationship of each element within one sensory modality. The hierarchical structure realizes parallel computation in the training phase and speeds up the training of the model. Experiments and analysis demonstrate significant improvements over baselines, proving the effectiveness of our method.
https://ieeexplore.ieee.org/document/9274421/
Attention mechanism
convolutional neural network
smart city
multimodal machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Qianxia Ma
Yongfang Nie
Jingyan Song
Tao Zhang
author_sort Qianxia Ma
title Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
title_sort multimodal data processing framework for smart city: a positional-attention based deep learning approach
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the application of edge computing. Different types of end devices generate and collect multimodal information, and substantial data is transmitted to upper nodes. Multimodal machine learning methods process data at edge nodes, and only high-level features are uploaded to the cloud in order to save bandwidth. In this article, we propose a novel multimodal data processing framework based on multiple attention mechanisms. Two distinct attention mechanisms are used to capture inter and intra-modality dependencies and align different modalities together. We conduct experiments on image captioning, a core research hotspot in multimodal machine learning. A unified hierarchical structure extracts features from images and natural language. Matching attention aligns visual and textual information. Besides, we propose a new attention mechanism, positional attention, which finds the relationship of each element within one sensory modality. The hierarchical structure realizes parallel computation in the training phase and speeds up the training of the model. Experiments and analysis demonstrate significant improvements over baselines, proving the effectiveness of our method.
topic Attention mechanism
convolutional neural network
smart city
multimodal machine learning
url https://ieeexplore.ieee.org/document/9274421/
_version_ 1724182450182029312