Universal Foreground Segmentation Based on Deep Feature Fusion Network for Multi-Scene Videos

Bibliographic Details
Main Authors: Ye Tao, Zhihao Ling, Ioannis Patras
Format: Article
Language: English
Published: IEEE, 2019-01-01
Series: IEEE Access
Subjects: Convolutional neural network; foreground segmentation; multi-scene videos aware
Online Access: https://ieeexplore.ieee.org/document/8888275/
id doaj-ed83bdb01b82448ea70838e7bfb9f543
record_format Article
spelling doaj-ed83bdb01b82448ea70838e7bfb9f543
timestamp 2021-03-30T00:19:15Z
doi 10.1109/ACCESS.2019.2950639
ieee_article_number 8888275
volume 7
pages 158326-158337
author_orcid Ye Tao: https://orcid.org/0000-0002-3954-828X
affiliation Ye Tao: Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, China
affiliation Zhihao Ling: Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, China
affiliation Ioannis Patras: School of Electronic Engineering and Computer Science, Queen Mary University of London, London, U.K.
collection DOAJ
language English
format Article
sources DOAJ
author Ye Tao
Zhihao Ling
Ioannis Patras
title Universal Foreground Segmentation Based on Deep Feature Fusion Network for Multi-Scene Videos
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Foreground/background (fg/bg) classification is an important first step for several video analysis tasks such as people counting, activity recognition, and anomaly detection. As is the case for several other computer vision problems, the advent of deep convolutional neural network (CNN) methods has led to major improvements in this field. However, despite their success, CNN-based methods have difficulty coping with multi-scene videos, where the scene changes multiple times along the time sequence. In this paper, we propose a deep-feature-fusion-network-based foreground segmentation method (DFFnetSeg) that is robust both to scene changes and to unseen scenes compared with competitive state-of-the-art methods. At the heart of DFFnetSeg lies a fusion network that takes as input deep features extracted from a current frame, a previous frame, and a reference frame, and produces as output a segmentation mask separating foreground objects from the background. We show the advantages of using a fusion network and the three-frame group in dealing with the unseen-scene and bootstrapping challenges. In addition, we show that a simple reference-frame updating strategy makes DFFnetSeg robust to sudden scene changes within video sequences, and we introduce a motion-map-based post-processing method that further reduces false positives. Experimental results on a test dataset generated from CDnet2014 and LASIESTA demonstrate the advantages of the DFFnetSeg method.
topic Convolutional neural network
foreground segmentation
multi-scene videos aware
url https://ieeexplore.ieee.org/document/8888275/