Attention Embedded Spatio-Temporal Network for Video Salient Object Detection

The main challenge in video salient object detection is how to model object motion and dramatic changes in appearance contrast. In this work, we propose an attention embedded spatio-temporal network (ASTN) to adaptively exploit diverse factors that influence dynamic saliency prediction within a unif...

Bibliographic Details
Main Authors:	Lili Huang, Pengxiang Yan, Guanbin Li, Qing Wang, Liang Lin
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Subjects:	Video salient object detection; spatiotemporal modeling; deep learning; representation learning
Online Access:	https://ieeexplore.ieee.org/document/8896915/

id	doaj-b6319588ba61494eb33f4c27dcad8618
record_format	Article
spelling	doaj-b6319588ba61494eb33f4c27dcad8618 | 2021-05-19T23:01:33Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 166203 | 166213 | 10.1109/ACCESS.2019.2953046 | 8896915 | Attention Embedded Spatio-Temporal Network for Video Salient Object Detection | Lili Huang (0) https://orcid.org/0000-0001-7813-6539 | Pengxiang Yan (1) https://orcid.org/0000-0002-3075-2903 | Guanbin Li (2) | Qing Wang (3) | Liang Lin (4) | School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China | School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China | School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China | School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China | School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China | The main challenge in video salient object detection is how to model object motion and dramatic changes in appearance contrast. In this work, we propose an attention embedded spatio-temporal network (ASTN) to adaptively exploit diverse factors that influence dynamic saliency prediction within a unified framework. To compensate for object movement, we introduce a flow-guided spatial learning (FGSL) module to directly capture effective motion information in the form of attention based on optical flows. However, optical flow represents the motion information of all moving objects, including movements of non-salient objects caused by large camera motion and subtle changes in background. Therefore, using the flow-guided attention map alone causes the spatial saliency to be influenced by all moving objects rather than just the salient objects, resulting in unstable and temporally inconsistent saliency maps. To further enhance the temporal coherence, we develop an attentive bidirectional gated recurrent unit (AB-GRU) module to adaptively exploit sequential feature evolution. With this AB-GRU, we can further refine the spatiotemporal feature representation by incorporating an accommodative attention mechanism. Experimental results demonstrate that our model achieves superior empirical performance on video salient object detection. Moreover, an experiment on the extended application to unsupervised video object segmentation further demonstrates the generalization ability and stability of our proposed method. | https://ieeexplore.ieee.org/document/8896915/ | Video salient object detection | spatiotemporal modeling | deep learning | representation learning
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Lili Huang, Pengxiang Yan, Guanbin Li, Qing Wang, Liang Lin
spellingShingle	Lili Huang | Pengxiang Yan | Guanbin Li | Qing Wang | Liang Lin | Attention Embedded Spatio-Temporal Network for Video Salient Object Detection | IEEE Access | Video salient object detection | spatiotemporal modeling | deep learning | representation learning
author_facet	Lili Huang, Pengxiang Yan, Guanbin Li, Qing Wang, Liang Lin
author_sort	Lili Huang
title	Attention Embedded Spatio-Temporal Network for Video Salient Object Detection
title_short	Attention Embedded Spatio-Temporal Network for Video Salient Object Detection
title_full	Attention Embedded Spatio-Temporal Network for Video Salient Object Detection
title_fullStr	Attention Embedded Spatio-Temporal Network for Video Salient Object Detection
title_full_unstemmed	Attention Embedded Spatio-Temporal Network for Video Salient Object Detection
title_sort	attention embedded spatio-temporal network for video salient object detection
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2019-01-01
description	The main challenge in video salient object detection is how to model object motion and dramatic changes in appearance contrast. In this work, we propose an attention embedded spatio-temporal network (ASTN) to adaptively exploit diverse factors that influence dynamic saliency prediction within a unified framework. To compensate for object movement, we introduce a flow-guided spatial learning (FGSL) module to directly capture effective motion information in the form of attention based on optical flows. However, optical flow represents the motion information of all moving objects, including movements of non-salient objects caused by large camera motion and subtle changes in background. Therefore, using the flow-guided attention map alone causes the spatial saliency to be influenced by all moving objects rather than just the salient objects, resulting in unstable and temporally inconsistent saliency maps. To further enhance the temporal coherence, we develop an attentive bidirectional gated recurrent unit (AB-GRU) module to adaptively exploit sequential feature evolution. With this AB-GRU, we can further refine the spatiotemporal feature representation by incorporating an accommodative attention mechanism. Experimental results demonstrate that our model achieves superior empirical performance on video salient object detection. Moreover, an experiment on the extended application to unsupervised video object segmentation further demonstrates the generalization ability and stability of our proposed method.
topic	Video salient object detection; spatiotemporal modeling; deep learning; representation learning
url	https://ieeexplore.ieee.org/document/8896915/
work_keys_str_mv	AT lilihuang attentionembeddedspatiotemporalnetworkforvideosalientobjectdetection AT pengxiangyan attentionembeddedspatiotemporalnetworkforvideosalientobjectdetection AT guanbinli attentionembeddedspatiotemporalnetworkforvideosalientobjectdetection AT qingwang attentionembeddedspatiotemporalnetworkforvideosalientobjectdetection AT lianglin attentionembeddedspatiotemporalnetworkforvideosalientobjectdetection
_version_	1721436257803829248
