Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the appl...
Main Authors: | Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Attention mechanism; convolutional neural network; smart city; multimodal machine learning |
Online Access: | https://ieeexplore.ieee.org/document/9274421/ |
id |
doaj-bd610d1b984f4eb2b5f05a88e72680cd |
---|---|
record_format |
Article |
spelling |
doaj-bd610d1b984f4eb2b5f05a88e72680cd; 2021-03-30T04:02:51Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; 8; 215505; 215515; 10.1109/ACCESS.2020.3041447; 9274421
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
Qianxia Ma (0), https://orcid.org/0000-0002-6635-2792; Yongfang Nie (1), https://orcid.org/0000-0002-8718-3422; Jingyan Song (2), https://orcid.org/0000-0001-9790-8573; Tao Zhang (3)
Department of Automation, Tsinghua University, Beijing, China; Department of Strategic Missile and Underwater Weapon, Naval Submarine Academy, Qingdao, China; Department of Automation, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China
In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the application of edge computing. Different types of end devices generate and collect multimodal information, and substantial data is transmitted to upper nodes. Multimodal machine learning methods process data at edge nodes, and only high-level features are uploaded to the cloud in order to save bandwidth. In this article, we propose a novel multimodal data processing framework based on multiple attention mechanisms. Two distinct attention mechanisms are used to capture inter and intra-modality dependencies and align different modalities together. We conduct experiments on image captioning, a core research hotspot in multimodal machine learning. A unified hierarchical structure extracts features from images and natural language. Matching attention aligns visual and textual information. Besides, we propose a new attention mechanism, positional attention, which finds the relationship of each element within one sensory modality. The hierarchical structure realizes parallel computation in the training phase and speeds up the training of the model. Experiments and analysis demonstrate significant improvements over baselines, proving the effectiveness of our method.
https://ieeexplore.ieee.org/document/9274421/
Attention mechanism; convolutional neural network; smart city; multimodal machine learning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang |
spellingShingle |
Qianxia Ma Yongfang Nie Jingyan Song Tao Zhang Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach IEEE Access Attention mechanism convolutional neural network smart city multimodal machine learning |
author_facet |
Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang |
author_sort |
Qianxia Ma |
title |
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach |
title_short |
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach |
title_full |
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach |
title_fullStr |
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach |
title_full_unstemmed |
Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach |
title_sort |
multimodal data processing framework for smart city: a positional-attention based deep learning approach |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the application of edge computing. Different types of end devices generate and collect multimodal information, and substantial data is transmitted to upper nodes. Multimodal machine learning methods process data at edge nodes, and only high-level features are uploaded to the cloud in order to save bandwidth. In this article, we propose a novel multimodal data processing framework based on multiple attention mechanisms. Two distinct attention mechanisms are used to capture inter- and intra-modality dependencies and align different modalities. We conduct experiments on image captioning, a core research hotspot in multimodal machine learning. A unified hierarchical structure extracts features from images and natural language. Matching attention aligns visual and textual information. In addition, we propose a new attention mechanism, positional attention, which finds the relationship of each element within one sensory modality. The hierarchical structure realizes parallel computation in the training phase and speeds up the training of the model. Experiments and analysis demonstrate significant improvements over baselines, proving the effectiveness of our method. |
topic |
Attention mechanism; convolutional neural network; smart city; multimodal machine learning |
url |
https://ieeexplore.ieee.org/document/9274421/ |
work_keys_str_mv |
AT qianxiama multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach AT yongfangnie multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach AT jingyansong multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach AT taozhang multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach |
_version_ |
1724182450182029312 |