Complete Video-Level Representations for Action Recognition

In most existing work on activity recognition, 3D ConvNets show promising performance in learning spatiotemporal features of videos. However, most methods sample a fixed-length sequence of frames from the original video, crop the frames to a fixed size, and feed them into the model for training. In this manne...

Bibliographic Details
Main Authors: Min Li, Ruwen Bai, Bo Meng, Junxing Ren, Miao Jiang, Yang Yang, Linghan Li, Hong Du
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9353486/