HalluciNet-ing Spatiotemporal Representations Using a 2D-CNN

Bibliographic Details
Main Authors: Paritosh Parmar, Brendan Morris
Format: Article
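Language: English
Published: MDPI AG, 2021-09-01
Series: Signals
Subjects: action recognition; scene recognition; action quality assessment; activity recognition; deep learning; computer vision
Online Access: https://www.mdpi.com/2624-6120/2/3/37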
ISSN: 2624-6120
Volume: 2, Issue: 3, Article: 37, Pages: 604-618
DOI: 10.3390/signals2030037
Affiliations: Paritosh Parmar, Department of Computer Science, University of British Columbia, Vancouver, BC V6T 1Z4, Canada; Brendan Morris, Department of Electrical & Computer Engineering, University of Nevada, Las Vegas, NV 89119, USA
Collection: DOAJ

Description: Spatiotemporal representations learned using 3D convolutional neural networks (CNNs) are currently used in state-of-the-art approaches for action-related tasks. However, 3D-CNNs are notorious for being memory- and compute-intensive compared with simpler 2D-CNN architectures. We propose to hallucinate spatiotemporal representations from a 3D-CNN teacher with a 2D-CNN student. By requiring the 2D-CNN to predict the future and intuit upcoming activity, we encourage it to gain a deeper understanding of actions and how they evolve. The hallucination task is treated as an auxiliary task, which can be used with any other action-related task in a multitask learning setting. Through thorough experimental evaluation, we show that the hallucination task indeed helps improve performance on action recognition, action quality assessment, and dynamic scene recognition tasks. From a practical standpoint, being able to hallucinate spatiotemporal representations without an actual 3D-CNN can enable deployment in resource-constrained scenarios, such as those with limited computing power and/or lower bandwidth. We also observed that our hallucination task is useful not only during the training phase but also during the pre-training phase.
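The auxiliary-task setup described above can be illustrated with a small multitask loss. This is a minimal sketch, not the authors' implementation. It assumes that the hallucination objective is a mean-squared error between a feature vector predicted by the 2D-CNN student and the feature vector that a frozen 3D-CNN teacher computes for the corresponding clip, and that the main action-related task is classification trained with cross-entropy. The function names and the weighting factor `lam` are hypothetical, and plain Python lists stand in for network outputs.

```python
import math
from typing import Sequence


def hallucination_loss(student_feat: Sequence[float], teacher_feat: Sequence[float]) -> float:
    """Mean-squared error between the student's hallucinated feature and the
    teacher's 3D-CNN feature (assumed form of the auxiliary objective)."""
    if len(student_feat) != len(teacher_feat) or not student_feat:
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy for the main task, computed with a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def multitask_loss(
    logits: Sequence[float],
    label: int,
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Main-task loss plus a weighted auxiliary hallucination loss.

    `lam` is a hypothetical weight balancing the two tasks; the teacher
    features act only as regression targets and are never updated here.
    """
    return cross_entropy(logits, label) + lam * hallucination_loss(student_feat, teacher_feat)


if __name__ == "__main__":
    # Toy values standing in for one training example.
    loss = multitask_loss(
        logits=[2.0, 0.0],
        label=0,
        student_feat=[0.5, 1.0],
        teacher_feat=[0.0, 1.0],
        lam=0.5,
    )
    print(f"combined loss: {loss:.4f}")
```

Because the teacher contributes only target features to the training loss, inference under this setup would need only the student's forward pass. That is what makes the approach attractive in the resource-constrained settings mentioned in the description.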