Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information

Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods p...

Bibliographic Details
Main Authors: Wei-Ta Chu, Zong-Wei Pan
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: 3D human pose estimation; semi-supervised; temporal information; multiview information
Online Access: https://ieeexplore.ieee.org/document/9298758/
id doaj-fd3b3c8850d2428d802919294da6fc83
record_format Article
spelling doaj-fd3b3c8850d2428d802919294da6fc83
2021-03-30T04:20:48Z
eng
IEEE
IEEE Access
ISSN 2169-3536
2020-01-01
Volume 8, pages 226974-226981
DOI 10.1109/ACCESS.2020.3045794
IEEE Xplore document 9298758
Wei-Ta Chu [0], https://orcid.org/0000-0001-5722-7239
Zong-Wei Pan [1]
[0] Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
[1] Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan
collection DOAJ
language English
format Article
sources DOAJ
author Wei-Ta Chu
Zong-Wei Pan
title Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods consider information from consecutive video frames, or from frames captured simultaneously from multiple viewpoints, independently. In this article, we propose to jointly consider temporal information and multiview information in a unified adversarial learning framework. Given a 2D skeleton, a pose generator network is developed to estimate the corresponding 3D skeleton, and a camera network is developed to estimate camera parameters. The estimated 3D skeleton is evaluated by a critic network to examine whether it is a plausible 3D human pose. Based on the estimated camera parameters, the estimated 3D skeleton can be re-projected into a 2D skeleton, which should be similar to the input 2D skeleton. The ideas of re-projection and adversarial learning enable self-supervision. We design the architectures of the aforementioned networks to take 2D skeletons from multiple viewpoints in temporally consecutive frames. By jointly considering the two types of information, we verify that performance can be substantially improved.
topic 3D human pose estimation
semi-supervised
temporal information
multiview information
url https://ieeexplore.ieee.org/document/9298758/