Action Recognition From Thermal Videos

Bibliographic Details
Main Authors: Ganbayar Batchuluun, Dat Tien Nguyen, Tuyen Danh Pham, Chanhum Park, Kang Ryoung Park
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/8779645/
Description
Summary: Human action recognition using a camera-based surveillance system remains a challenging task. Action recognition is particularly difficult when a person is not visible in an image captured in a dark environment. Existing studies have used near-infrared (NIR) and thermal cameras to address this problem. Unlike NIR cameras, thermal cameras make objects at both long and short distances visible without an additional illuminator. However, thermal cameras have two major disadvantages: the halo effect and temperature similarity. A halo effect occurs around a high-temperature object. For a human subject, the halo resembles a shadow beneath the body area, which makes it more difficult to segment the human area from the image. Segmentation also becomes more difficult when the background and the human subject have similar temperatures. These disadvantages degrade not only the accuracy of human-area segmentation but also the performance of human action recognition. Unfortunately, no previous studies have considered these issues. To address these problems, this study proposes cycle-consistent generative adversarial network (CycleGAN)-based methods for removing halo effects from thermal images and restoring the human body areas. The study also considers a method for creating a skeleton image from a thermal image to analyze body movements. To extract more spatial and temporal features from the resulting skeleton image sequences, a human action recognition method that combines a convolutional neural network (CNN) and long short-term memory (LSTM) is proposed. In an experiment using an open database, the Dongguk activities & actions database (DA&A-DB2), the proposed method outperformed the existing methods.
ISSN: 2169-3536
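
Illustrative note: The summary names CycleGAN as the basis for halo removal but does not give the loss used. The sketch below shows only the standard cycle-consistency term that gives CycleGAN its name. It is not the authors' implementation. The names toy_g and toy_f are hypothetical stand-ins for trained generators (halo image to halo-free image, and back), and the images are toy nested lists rather than real thermal frames.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities in [0, 1]
Generator = Callable[[Image], Image]


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two equally sized images."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def cycle_consistency_loss(x: Image, y: Image, g: Generator, f: Generator) -> float:
    """Standard CycleGAN cycle term: |f(g(x)) - x| + |g(f(y)) - y| (L1)."""
    forward_cycle = l1_distance(f(g(x)), x)   # x -> g -> f should return to x
    backward_cycle = l1_distance(g(f(y)), y)  # y -> f -> g should return to y
    return forward_cycle + backward_cycle


# Hypothetical toy "generators": darken or brighten every pixel, with clipping.
def toy_g(img: Image) -> Image:
    return [[max(p - 0.25, 0.0) for p in row] for row in img]


def toy_f(img: Image) -> Image:
    return [[min(p + 0.25, 1.0) for p in row] for row in img]


halo_image = [[0.5, 0.75], [1.0, 0.25]]   # assumed "with halo" domain sample
clean_image = [[0.25, 0.5], [0.75, 0.0]]  # assumed "halo-free" domain sample

# The two toy mappings invert each other on these samples, so the loss is zero.
loss_invertible = cycle_consistency_loss(halo_image, clean_image, toy_g, toy_f)

# Clipping destroys information at the extremes, so the cycle no longer closes.
clipped = [[0.0, 1.0]]
loss_clipped = cycle_consistency_loss(clipped, clipped, toy_g, toy_f)
```

In a full CycleGAN this term is added to the adversarial losses of the two generator and discriminator pairs. A nonzero value, as in the clipped case, signals that a round trip between the two image domains loses information.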