Spatiotemporal Representation Learning for Video Anomaly Detection


Bibliographic Details
Main Authors: Zhaoyan Li, Yaoshun Li, Zhisheng Gao
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Spatiotemporal representation learning; anomaly detection; 3D convolutional neural network; mixed Gaussian model
Online Access: https://ieeexplore.ieee.org/document/8976183/
id doaj-4b73403dccb14cc59c1a3be50fac3807
record_format Article
spelling doaj-4b73403dccb14cc59c1a3be50fac3807 (updated 2021-03-30T02:22:23Z)
doi 10.1109/ACCESS.2020.2970497
article_number 8976183
volume 8
pages 25531-25542
orcid Zhisheng Gao: https://orcid.org/0000-0002-0470-8861
affiliation School of Computer and Software Engineering, Xihua University, Chengdu, China (all three authors)
collection DOAJ
language English
format Article
sources DOAJ
author Zhaoyan Li
Yaoshun Li
Zhisheng Gao
title Spatiotemporal Representation Learning for Video Anomaly Detection
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Video-based detection of anomalous human behavior is widely studied in fields such as security, medical care, education, and energy. However, several problems remain open: large, complicated models are difficult to train, and the accuracy and speed of anomalous behavior detection are not yet high enough. A spatiotemporal representation learning model is proposed in this paper. First, spatiotemporal features of the video are extracted by the constructed multi-scale 3D convolutional neural network. Then the scene background is modeled by a high-dimensional mixed Gaussian model and used for anomaly detection. Finally, the precise position of anomalous behavior in the video is obtained by computing the position of the last output feature, that is, the position of its receptive field. The proposed model does not require specific training. Moreover, the proposed method offers high versatility, fast computation, and high detection accuracy. We validated the proposed algorithm on two representative surveillance-scene datasets, Subway and UCSD Ped2. Results show that the proposed algorithm achieves a detection rate of 18 FPS with common computing resources and meets real-time requirements. Compared with similar methods, the proposed method achieves competitive results in both frame-level and pixel-level accuracy.
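The description sketches a three-stage pipeline: multi-scale 3D-convolutional feature extraction, Gaussian-mixture background modeling over those features, and localization of anomalies by mapping low-likelihood feature cells back to their receptive fields. The Python sketch below (PyTorch + scikit-learn) illustrates that general idea only; the two-branch extractor, the mixture settings, and the stride-2 localization arithmetic are illustrative assumptions of this note, not the architecture or parameters reported in the paper.

# Hypothetical sketch -- layer sizes, strides, and the GMM configuration are
# illustrative assumptions, not the model described in the article.
import torch
import torch.nn as nn
import numpy as np
from sklearn.mixture import GaussianMixture


class MultiScale3DFeatures(nn.Module):
    """Toy multi-scale 3D-CNN feature extractor (untrained, for illustration).

    Two parallel 3D conv branches with different kernel sizes stand in for a
    multi-scale design; their outputs are concatenated channel-wise.
    """

    def __init__(self, in_channels=1):
        super().__init__()
        self.branch_small = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3,
                      stride=(1, 2, 2), padding=1),
            nn.ReLU(),
        )
        self.branch_large = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=(3, 7, 7),
                      stride=(1, 2, 2), padding=(1, 3, 3)),
            nn.ReLU(),
        )

    def forward(self, clip):
        # clip: (batch, channels, frames, height, width)
        return torch.cat([self.branch_small(clip), self.branch_large(clip)], dim=1)


def feature_vectors(model, clip):
    """Flatten the feature map into one C-dimensional vector per cell."""
    with torch.no_grad():
        fmap = model(clip)                      # (1, C, T, H, W)
    c = fmap.shape[1]
    feats = fmap.squeeze(0).permute(1, 2, 3, 0).reshape(-1, c).numpy()
    return feats, fmap.shape[2:]                # samples, (T, H, W) grid shape


if __name__ == "__main__":
    extractor = MultiScale3DFeatures()

    # "Normal" clips model the scene background; random tensors are placeholders
    # for real surveillance frames (e.g. from Subway or UCSD Ped2).
    normal_clip = torch.rand(1, 1, 8, 64, 64)
    normal_feats, _ = feature_vectors(extractor, normal_clip)

    # Background model: a Gaussian mixture fitted on normal-scene features.
    gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
    gmm.fit(normal_feats)

    # Score a test clip: low log-likelihood cells are candidate anomalies.
    test_clip = torch.rand(1, 1, 8, 64, 64)
    test_feats, (t, h, w) = feature_vectors(extractor, test_clip)
    scores = gmm.score_samples(test_feats).reshape(t, h, w)

    # Localization: a low-scoring cell at (ti, hi, wi) maps back to the input
    # frame through the network's downsampling factor (spatial stride 2 here),
    # i.e. the centre of that cell's receptive field.
    ti, hi, wi = np.unravel_index(scores.argmin(), scores.shape)
    print(f"most anomalous cell: frame~{ti}, pixel~({hi * 2}, {wi * 2}), "
          f"log-likelihood {scores.min():.2f}")

In the paper's setting the extractor would be a trained multi-scale 3D CNN and the mixture would be fitted on features from normal surveillance footage; here random tensors stand in for video clips so the snippet runs self-contained.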
topic Spatiotemporal representation learning
anomaly detection
3D convolutional neural network
mixed Gaussian model
url https://ieeexplore.ieee.org/document/8976183/