Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information
Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods p...
Main Authors: | Wei-Ta Chu, Zong-Wei Pan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | 3D human pose estimation; semi-supervised; temporal information; multiview information |
Online Access: | https://ieeexplore.ieee.org/document/9298758/ |
id |
doaj-fd3b3c8850d2428d802919294da6fc83 |
---|---|
record_format |
Article |
spelling |
doaj-fd3b3c8850d2428d802919294da6fc83; 2021-03-30T04:20:48Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; vol. 8; pp. 226974-226981; 10.1109/ACCESS.2020.3045794; 9298758; Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information; Wei-Ta Chu (0) https://orcid.org/0000-0001-5722-7239; Zong-Wei Pan (1); (0) Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan; (1) Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan; Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods consider information from consecutive video frames, or from frames captured simultaneously from multiple viewpoints, independently. In this article, we propose to jointly consider temporal information and multiview information in a unified adversarial learning framework. Given a 2D skeleton, a pose generator network is developed to estimate the corresponding 3D skeleton, and a camera network is developed to estimate camera parameters. The estimated 3D skeleton is evaluated by a critic network to judge whether it is a plausible 3D human pose. Based on the estimated camera parameters, the estimated 3D skeleton can be re-projected into a 2D skeleton, which should be similar to the input 2D skeleton. The ideas of re-projection and adversarial learning enable self-supervision. We design the architectures of the aforementioned networks to take 2D skeletons from multiple viewpoints in temporally consecutive frames. By jointly considering the two types of information, we verify that performance improves substantially.; https://ieeexplore.ieee.org/document/9298758/; 3D human pose estimation; semi-supervised; temporal information; multiview information |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Wei-Ta Chu; Zong-Wei Pan |
spellingShingle |
Wei-Ta Chu Zong-Wei Pan Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information IEEE Access 3D human pose estimation semi-supervised temporal information multiview information |
author_facet |
Wei-Ta Chu; Zong-Wei Pan |
author_sort |
Wei-Ta Chu |
title |
Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information |
title_short |
Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information |
title_full |
Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information |
title_fullStr |
Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information |
title_full_unstemmed |
Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information |
title_sort |
semi-supervised 3d human pose estimation by jointly considering temporal and multiview information |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Three-dimensional human pose estimation is usually conducted in a supervised manner. However, because collecting labeled 3D skeletons is expensive and time-consuming, semi-supervised methods that need far less labeled 3D data are urgently needed. Some semi-supervised learning methods consider information from consecutive video frames, or from frames captured simultaneously from multiple viewpoints, independently. In this article, we propose to jointly consider temporal information and multiview information in a unified adversarial learning framework. Given a 2D skeleton, a pose generator network is developed to estimate the corresponding 3D skeleton, and a camera network is developed to estimate camera parameters. The estimated 3D skeleton is evaluated by a critic network to judge whether it is a plausible 3D human pose. Based on the estimated camera parameters, the estimated 3D skeleton can be re-projected into a 2D skeleton, which should be similar to the input 2D skeleton. The ideas of re-projection and adversarial learning enable self-supervision. We design the architectures of the aforementioned networks to take 2D skeletons from multiple viewpoints in temporally consecutive frames. By jointly considering the two types of information, we verify that performance improves substantially. |
topic |
3D human pose estimation; semi-supervised; temporal information; multiview information |
url |
https://ieeexplore.ieee.org/document/9298758/ |
work_keys_str_mv |
AT weitachu semisupervised3dhumanposeestimationbyjointlyconsideringtemporalandmultiviewinformation AT zongweipan semisupervised3dhumanposeestimationbyjointlyconsideringtemporalandmultiviewinformation |
_version_ |
1724182043393261568 |
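
The re-projection idea in the description above (an estimated 3D skeleton is projected back to 2D with the estimated camera parameters and compared with the input 2D skeleton) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the camera parameters form a 2x3 weak-perspective matrix and uses a 17-joint skeleton, neither of which is stated in this record, and the names `reproject` and `reprojection_error` are hypothetical.

```python
import numpy as np


def reproject(pose_3d: np.ndarray, camera: np.ndarray) -> np.ndarray:
    """Project a (J, 3) skeleton to (J, 2).

    Assumption: `camera` is a 2x3 weak-perspective projection matrix.
    """
    return pose_3d @ camera.T


def reprojection_error(pose_2d: np.ndarray, pose_3d: np.ndarray,
                       camera: np.ndarray) -> float:
    """Mean per-joint Euclidean distance between input and re-projected 2D joints."""
    diff = reproject(pose_3d, camera) - pose_2d
    return float(np.mean(np.linalg.norm(diff, axis=1)))


# Toy data: a random 17-joint skeleton and an orthographic camera (assumed values).
rng = np.random.default_rng(0)
pose_3d = rng.normal(size=(17, 3))
camera = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# A 2D skeleton that is exactly consistent with the 3D skeleton and camera.
pose_2d = reproject(pose_3d, camera)

perfect = reprojection_error(pose_2d, pose_3d, camera)        # consistent input
shifted = reprojection_error(pose_2d + 0.5, pose_3d, camera)  # every joint offset by (0.5, 0.5)
print(perfect, shifted)
```

In a self-supervised setting, an error of this kind can serve as a training signal when no 3D label is available, since it only compares the input 2D skeleton with its own re-projection.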