Action Prediction Based on Partial Video Observation via Context and Temporal Sequential Network With Deformable Convolution

Bibliographic Details
Main Authors: Keyang Cheng, Eric Kasangu Lubamba, Qing Liu
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Volume: 8
Pages: 133527–133540
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3008848
Subjects: Background subtraction; deformable convolution; sequential recurrent network; action classification; action prediction
Online Access: https://ieeexplore.ieee.org/document/9139509/

Abstract
Predicting human actions from video is important for many computer vision applications. In fields ranging from self-driving cars to health care, early anticipation matters: the earlier an action is correctly classified, the more useful the prediction. The main challenge is to extract accurate information about the object of interest, rather than about the full frame, from a partially observed video. To this end, we propose an end-to-end, two-stage architecture that uses pixel-level features to capture the spatiotemporal information of the object of interest. The first stage is a classification block composed of three layers: a background subtraction layer that lets the model focus on the subject of interest, deformable convolution layers for feature extraction, and an additive Softmax for the final classification. The information learned in the first stage is then transferred to the second stage, which consists of Long Short-Term Memory (LSTM) layers and a final loss function for prediction. Extensive evaluation on the UT-Interaction, HMDB51, and UCF-Sports benchmarks shows that our model outperforms other solutions in terms of threshold probability difference and achieves early action prediction at lower observation ratios.
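
The first-stage ingredients described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several details are assumptions: partial observation is modelled as keeping the first fraction of frames, background subtraction uses a simple per-pixel mean background, the deformable convolution is a single-channel 3×3 kernel evaluated at one output position with bilinear sampling, and "additive Softmax" is read as the additive-margin softmax (AM-Softmax) loss with its commonly used defaults s = 30 and m = 0.35. All function names are hypothetical, and the LSTM second stage is not sketched.

```python
import math
from typing import List, Sequence, Tuple

Frame = List[List[float]]  # one grayscale frame, indexed as frame[row][col]


def observed_prefix(frames: Sequence[Frame], ratio: float) -> List[Frame]:
    """Simulate partial observation: keep the first floor(ratio * T) frames (at least one)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must lie in (0, 1]")
    keep = max(1, int(ratio * len(frames)))
    return list(frames[:keep])


def foreground_mask(history: Sequence[Frame], current: Frame, threshold: float) -> List[List[int]]:
    """Background subtraction against a per-pixel mean background (an assumed, simple model).

    A pixel is foreground (1) when it differs from the mean of `history`
    by more than `threshold`, otherwise background (0).
    """
    n = len(history)
    mask = []
    for r, row in enumerate(current):
        mask_row = []
        for c, value in enumerate(row):
            background = sum(f[r][c] for f in history) / n
            mask_row.append(1 if abs(value - background) > threshold else 0)
        mask.append(mask_row)
    return mask


def apply_mask(frame: Frame, mask: List[List[int]]) -> Frame:
    """Zero out background pixels so later layers see only the subject."""
    return [[p * m for p, m in zip(prow, mrow)] for prow, mrow in zip(frame, mask)]


def bilinear(img: Frame, y: float, x: float) -> float:
    """Bilinearly sample img at fractional (y, x); out-of-range neighbours count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def deform_conv_at(img: Frame, kernel: Frame,
                   offsets: Sequence[Tuple[float, float]], y: int, x: int) -> float:
    """Single-channel 3x3 deformable convolution evaluated at output position (y, x).

    offsets[k] = (dy, dx) shifts the k-th regular sampling point (row-major order).
    With all-zero offsets this reduces to an ordinary zero-padded 3x3 convolution
    (cross-correlation, as in deep-learning libraries).
    """
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[i + 1][j + 1] * bilinear(img, y + i + dy, x + j + dx)
            k += 1
    return out


def am_softmax_loss(cosines: Sequence[float], target: int,
                    s: float = 30.0, m: float = 0.35) -> float:
    """Additive-margin softmax loss: subtract margin m from the target cosine, then scale by s."""
    logits = [s * (c - m) if i == target else s * c for i, c in enumerate(cosines)]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]
```

In a trained deformable network the offsets are not hand-set: they are predicted by an extra convolution applied to the same input, so each sampling point can drift toward the moving subject. Masking the frame first with `apply_mask(frame, foreground_mask(history, frame, threshold))` mirrors the abstract's ordering of background subtraction before deformable feature extraction.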