Action Prediction Based on Partial Video Observation via Context and Temporal Sequential Network With Deformable Convolution
Predicting activity motion from video is of great importance, with multiple applications in computer vision. In fields ranging from self-driving cars to health care, the earlier the anticipation, the higher the probability of successful classification. The main challenge of prediction is accurate information...
Main Authors: | Keyang Cheng, Eric Kasangu Lubamba, Qing Liu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
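Subjects: | Background subtraction; deformable convolution; sequential recurrent network; action classification; action prediction |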
Online Access: | https://ieeexplore.ieee.org/document/9139509/ |
id |
doaj-6c56d7a619764b6dbe6c6537714ed433 |
---|---|
record_format |
Article |
spelling |
doaj-6c56d7a619764b6dbe6c6537714ed433; 2021-03-30T03:24:39Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 133527-133540; DOI 10.1109/ACCESS.2020.3008848; 9139509
Action Prediction Based on Partial Video Observation via Context and Temporal Sequential Network With Deformable Convolution
Keyang Cheng (0), https://orcid.org/0000-0001-5240-1605; Eric Kasangu Lubamba (1), https://orcid.org/0000-0002-7478-4077; Qing Liu (2), https://orcid.org/0000-0002-3546-9832
Affiliations, in record order: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China; National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data, Beijing, China; School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, China
https://ieeexplore.ieee.org/document/9139509/
Background subtraction; deformable convolution; sequential recurrent network; action classification; action prediction |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Keyang Cheng, Eric Kasangu Lubamba, Qing Liu |
author_sort |
Keyang Cheng |
title |
Action Prediction Based on Partial Video Observation via Context and Temporal Sequential Network With Deformable Convolution |
title_sort |
action prediction based on partial video observation via context and temporal sequential network with deformable convolution |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Predicting activity motion from video is of great importance, with multiple applications in computer vision. In fields ranging from self-driving cars to health care, the earlier the anticipation, the higher the probability of successful classification. The main challenge of prediction from partial observation is obtaining accurate information about the object of interest in the frame, as opposed to the full frame. To this end, we propose an end-to-end, two-stage architecture that leverages pixel-level, feature-aware spatiotemporal information about the object of interest. The first stage of our model is a classification block composed of three layers: a background subtraction layer that enables the model to focus on the subject of interest, followed by Deformable Convolution layers for feature extraction, and finally an additive Softmax for the final classification. Information learned in the first stage is then transferred to the second stage, which is composed of Long Short-Term Memory layers and a final loss function for prediction. Extensive evaluation on the UT-Interaction, HMDB51, and UCF-Sports benchmarks shows improved performance of our model, in terms of threshold probability difference, compared with other solutions, and demonstrates early action prediction at a lower observation ratio. |
topic |
Background subtraction; deformable convolution; sequential recurrent network; action classification; action prediction |
url |
https://ieeexplore.ieee.org/document/9139509/ |
_version_ |
1724183522984329216 |
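
The description above evaluates prediction from partial observation at a given observation ratio. As a minimal sketch of that idea only, assuming the ratio means the fraction of frames seen from the start of the video (the usual convention in early action prediction), the helper below truncates a frame sequence to its observed prefix. The names `observed_prefix` and `partial_observations` and the default ratios are hypothetical and do not come from the paper.

```python
import math


def observed_prefix(frames, ratio):
    """Return the leading part of `frames` for an observation ratio in (0, 1].

    Hypothetical helper: a ratio of 0.5 keeps the first half of the frames.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Round up, and keep at least one frame when the sequence is non-empty,
    # so a downstream model always receives some input.
    count = max(1, math.ceil(ratio * len(frames)))
    return frames[:count]


def partial_observations(frames, ratios=(0.25, 0.5, 1.0)):
    """Map each ratio to its observed prefix (the ratios are assumed values)."""
    return {ratio: observed_prefix(frames, ratio) for ratio in ratios}
```

A predictor would then be scored separately on each prefix, so that accuracy can be compared across ratios; a model that does well at the lower ratios is the one that predicts early.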