Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information

Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods p...

Bibliographic Details
Main Authors:	Wei-Ta Chu, Zong-Wei Pan
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	3D human pose estimation; semi-supervised; temporal information; multiview information
Online Access:	https://ieeexplore.ieee.org/document/9298758/

id	doaj-fd3b3c8850d2428d802919294da6fc83
record_format	Article
spelling	doaj-fd3b3c8850d2428d802919294da6fc83; 2021-03-30T04:20:48Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 226974-226981; DOI 10.1109/ACCESS.2020.3045794; document 9298758; Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information; Wei-Ta Chu [0] (https://orcid.org/0000-0001-5722-7239); Zong-Wei Pan [1]; [0] Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan; [1] Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan; https://ieeexplore.ieee.org/document/9298758/; 3D human pose estimation; semi-supervised; temporal information; multiview information
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Wei-Ta Chu, Zong-Wei Pan
spellingShingle	Wei-Ta Chu Zong-Wei Pan Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information IEEE Access 3D human pose estimation semi-supervised temporal information multiview information
author_facet	Wei-Ta Chu, Zong-Wei Pan
author_sort	Wei-Ta Chu
title	Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
title_short	Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
title_full	Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
title_fullStr	Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
title_full_unstemmed	Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
title_sort	semi-supervised 3d human pose estimation by jointly considering temporal and multiview information
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods consider information from consecutive video frames, or from frames captured simultaneously from multiple viewpoints, but treat the two independently. In this article, we propose to jointly consider temporal information and multiview information in a unified adversarial learning framework. Given a 2D skeleton, a pose generator network estimates the corresponding 3D skeleton, and a camera network estimates the camera parameters. A critic network evaluates the estimated 3D skeleton to judge whether it is a plausible 3D human pose. Using the estimated camera parameters, the estimated 3D skeleton can be re-projected into a 2D skeleton, which should be similar to the input 2D skeleton. Re-projection and adversarial learning together enable self-supervision. We design the architectures of these networks to take 2D skeletons from multiple viewpoints in temporally consecutive frames as input. By jointly considering the two types of information, we verify that performance improves substantially.
topic	3D human pose estimation; semi-supervised; temporal information; multiview information
url	https://ieeexplore.ieee.org/document/9298758/
work_keys_str_mv	AT weitachu semisupervised3dhumanposeestimationbyjointlyconsideringtemporalandmultiviewinformation AT zongweipan semisupervised3dhumanposeestimationbyjointlyconsideringtemporalandmultiviewinformation
_version_	1724182043393261568
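
The re-projection constraint mentioned in the description (an estimated 3D skeleton, projected with the estimated camera parameters, should match the input 2D skeleton) can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify the camera model or the loss, so the sketch assumes a weak-perspective camera stored as a 2x3 matrix and a mean per-joint Euclidean distance. The names `reproject` and `reprojection_loss`, the 17-joint skeleton, and the toy data are all hypothetical.

```python
import numpy as np


def reproject(pose_3d: np.ndarray, camera: np.ndarray) -> np.ndarray:
    """Project a (J, 3) 3D skeleton to (J, 2).

    Assumption: `camera` is a 2x3 weak-perspective projection matrix.
    """
    return pose_3d @ camera.T


def reprojection_loss(pose_2d: np.ndarray, pose_3d: np.ndarray,
                      camera: np.ndarray) -> float:
    """Mean per-joint distance between input and re-projected 2D skeletons."""
    diff = reproject(pose_3d, camera) - pose_2d
    return float(np.linalg.norm(diff, axis=1).mean())


# Toy data: a random 17-joint skeleton and a camera that keeps x and y.
rng = np.random.default_rng(0)
pose_3d = rng.normal(size=(17, 3))
camera = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# A 2D skeleton produced by the same camera re-projects perfectly.
pose_2d = reproject(pose_3d, camera)
print(reprojection_loss(pose_2d, pose_3d, camera))  # 0.0
```

In a self-supervised setup of the kind the description outlines, a loss like this needs no 3D labels, because it only compares 2D skeletons; the critic network would supply the separate plausibility signal for the 3D estimate.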


Similar Items