Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach

In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the appl...

Bibliographic Details
Main Authors:	Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Attention mechanism; convolutional neural network; smart city; multimodal machine learning
Online Access:	https://ieeexplore.ieee.org/document/9274421/

id	doaj-bd610d1b984f4eb2b5f05a88e72680cd
record_format	Article
spelling	doaj-bd610d1b984f4eb2b5f05a88e72680cd | 2021-03-30T04:02:51Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 215505 | 215515 | 10.1109/ACCESS.2020.3041447 | 9274421 | Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach | Qianxia Ma [0] https://orcid.org/0000-0002-6635-2792 | Yongfang Nie [1] https://orcid.org/0000-0002-8718-3422 | Jingyan Song [2] https://orcid.org/0000-0001-9790-8573 | Tao Zhang [3] | [0] Department of Automation, Tsinghua University, Beijing, China | [1] Department of Strategic Missile and Underwater Weapon, Naval Submarine Academy, Qingdao, China | [2] Department of Automation, Tsinghua University, Beijing, China | [3] Department of Automation, Tsinghua University, Beijing, China | https://ieeexplore.ieee.org/document/9274421/ | Attention mechanism; convolutional neural network; smart city; multimodal machine learning
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang
spellingShingle	Qianxia Ma Yongfang Nie Jingyan Song Tao Zhang Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach IEEE Access Attention mechanism convolutional neural network smart city multimodal machine learning
author_facet	Qianxia Ma, Yongfang Nie, Jingyan Song, Tao Zhang
author_sort	Qianxia Ma
title	Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
title_short	Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
title_full	Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
title_fullStr	Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
title_full_unstemmed	Multimodal Data Processing Framework for Smart City: A Positional-Attention Based Deep Learning Approach
title_sort	multimodal data processing framework for smart city: a positional-attention based deep learning approach
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	In the past few years, edge computing has brought tremendous convenience to the development of smart cities, releasing computation pressure to edge compute nodes. However, a series of problems, such as the explosive growth of smart devices and limited spectrum resources, still greatly limit the application of edge computing. Different types of end devices generate and collect multimodal information, and a substantial amount of data is transmitted to upper nodes. Multimodal machine learning methods process data at edge nodes, and only high-level features are uploaded to the cloud to save bandwidth. In this article, we propose a novel multimodal data processing framework based on multiple attention mechanisms. Two distinct attention mechanisms capture inter- and intra-modality dependencies and align the different modalities. We conduct experiments on image captioning, a core research topic in multimodal machine learning. A unified hierarchical structure extracts features from images and natural language. Matching attention aligns visual and textual information. In addition, we propose a new attention mechanism, positional attention, which captures the relationships among elements within a single sensory modality. The hierarchical structure enables parallel computation in the training phase and speeds up model training. Experiments and analysis demonstrate significant improvements over baselines, supporting the effectiveness of our method.
topic	Attention mechanism; convolutional neural network; smart city; multimodal machine learning
url	https://ieeexplore.ieee.org/document/9274421/
work_keys_str_mv	AT qianxiama multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach AT yongfangnie multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach AT jingyansong multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach AT taozhang multimodaldataprocessingframeworkforsmartcityapositionalattentionbaseddeeplearningapproach
_version_	1724182450182029312

Similar Items