Complete Video-Level Representations for Action Recognition
In most existing work on activity recognition, 3D ConvNets show promising performance for learning spatiotemporal features from videos. However, most methods sample fixed-length frame clips from the original video, crop them to a fixed size, and feed them into the model for training. In this manner...
| Main Authors: | Min Li, Ruwen Bai, Bo Meng, Junxing Ren, Miao Jiang, Yang Yang, Linghan Li, Hong Du |
| --- | --- |
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2021-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/9353486/ |
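The abstract refers to the conventional pipeline of sampling a fixed-length clip of frames and cropping each frame to a fixed spatial size before feeding a 3D ConvNet. The snippet below is a minimal sketch of that conventional preprocessing, not the method proposed in the article; the function name, clip length, and crop size are illustrative assumptions.

```python
import numpy as np

def sample_fixed_clip(video, clip_len=16, crop_size=112):
    """Sample a fixed-length clip and center-crop each frame.

    Illustrative sketch of the common fixed-clip preprocessing for
    3D-ConvNet training; names and defaults are assumptions, not taken
    from the article.

    video: ndarray of shape (num_frames, height, width, channels)
    returns: ndarray of shape (clip_len, crop_size, crop_size, channels)
    """
    num_frames, height, width, _ = video.shape

    # Uniformly sample clip_len frame indices over the whole video.
    indices = np.linspace(0, num_frames - 1, num=clip_len).astype(int)
    clip = video[indices]

    # Center-crop every frame to a fixed spatial size, discarding the
    # surrounding border regions.
    top = max((height - crop_size) // 2, 0)
    left = max((width - crop_size) // 2, 0)
    return clip[:, top:top + crop_size, left:left + crop_size, :]

# Example: a dummy 300-frame 240x320 RGB video reduced to a 16x112x112 clip.
dummy_video = np.zeros((300, 240, 320, 3), dtype=np.uint8)
clip = sample_fixed_clip(dummy_video)
print(clip.shape)  # (16, 112, 112, 3)
```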
Similar Items
- Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks
  by: Yanqiu Liao, et al.
  Published: (2019-01-01)
- Spatially and Temporally Structured Global to Local Aggregation of Dynamic Depth Information for Action Recognition
  by: Yonghong Hou, et al.
  Published: (2018-01-01)
- A Novel Violent Video Detection Scheme Based on Modified 3D Convolutional Neural Networks
  by: Wei Song, et al.
  Published: (2019-01-01)
- ALBERTC-CNN Based Aspect Level Sentiment Analysis
  by: Xingxin Ye, et al.
  Published: (2021-01-01)
- Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation
  by: Le Wang, et al.
  Published: (2018-05-01)