A novel multi-stream method for violent interaction detection using deep learning

Violent interaction detection is a hot topic in computer vision. However, recent research on violent interaction detection mainly focuses on traditional hand-crafted features and does not make full use of the results of deep learning in computer vision. In this paper, we propose...


Bibliographic Details
Main Authors: Hongchang Li, Jing Wang, Jianjun Han, Jinmin Zhang, Yushan Yang, Yue Zhao
Format: Article
Language: English
Published: SAGE Publishing 2020-05-01
Series: Measurement + Control
Online Access: https://doi.org/10.1177/0020294020902788
id doaj-491ff4dddddb450a9c4a6642aa5f1561
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Hongchang Li
Jing Wang
Jianjun Han
Jinmin Zhang
Yushan Yang
Yue Zhao
title A novel multi-stream method for violent interaction detection using deep learning
publisher SAGE Publishing
series Measurement + Control
issn 0020-2940
publishDate 2020-05-01
description Violent interaction detection is a hot topic in computer vision. However, recent research on violent interaction detection mainly focuses on traditional hand-crafted features and does not make full use of the results of deep learning in computer vision. In this paper, we propose a new, robust violent interaction detection framework for surveillance scenes, based on multi-stream deep learning. The proposed approach improves the recognition of violent actions in video by fusing three streams: an attention-based spatial RGB stream, a temporal stream, and a local spatial stream. The attention-based spatial RGB stream uses a soft-attention mechanism to learn the spatial regions around persons that are most likely to contain the action. The temporal stream takes optical flow as input to extract temporal features. The local spatial stream learns local spatial features from block images. Experimental results demonstrate the effectiveness and reliability of the proposed method on three violent interaction datasets: hockey fights, movies, and violent interaction. We also verify the proposed method on our own elevator surveillance video dataset, where its performance is satisfactory.
url https://doi.org/10.1177/0020294020902788
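The three-stream design summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not state how the streams are fused or how attention weights are computed, so the weighted-average late fusion, the softmax over region scores, and all names and numbers below (`soft_attention_pool`, `fuse_streams`, the per-stream probabilities) are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention_pool(region_features, region_scores):
    """Soft attention: sum region feature vectors weighted by softmax(scores)."""
    weights = softmax(region_scores)
    dim = len(region_features[0])
    return [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(dim)
    ]


def fuse_streams(stream_probs, stream_weights=None):
    """Late fusion (assumed rule): weighted average of per-stream class probabilities."""
    if stream_weights is None:
        stream_weights = [1.0 / len(stream_probs)] * len(stream_probs)
    n_classes = len(stream_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(stream_weights, stream_probs))
        for c in range(n_classes)
    ]


# Hypothetical outputs [P(non-violent), P(violent)] of the three streams for one clip.
attention_rgb = [0.30, 0.70]   # attention-based spatial RGB stream
temporal_flow = [0.20, 0.80]   # temporal stream (optical flow input)
local_blocks = [0.45, 0.55]    # local spatial stream (block images)

fused = fuse_streams([attention_rgb, temporal_flow, local_blocks])
label = "violent" if fused[1] > fused[0] else "non-violent"
print(label, fused)
```

In a real system each probability pair would come from a trained network, and the fusion weights could be tuned on validation data rather than set equal.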