Supervised Video-to-Video Synthesis for Single Human Pose Transfer

In this paper, we focus on human pose transfer across different videos, i.e., transferring the dance pose of a person in a given video to a target person in another video. Our method can be summed up in three stages to tackle this challenging scenario. First, we extract the frames and pose masks from...

Full description

Bibliographic Details
Main Authors: Hongyu Wang, Mengxing Huang, Di Wu, Yuchun Li, Weichao Zhang
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access
Subjects: Generative adversarial network (GAN), image-to-image translation, video-to-video synthesis, pose-guided person image generation
Online Access: https://ieeexplore.ieee.org/document/9333577/
id doaj-d13e6a8e35ce48c4bec04e88bed52437
doi 10.1109/ACCESS.2021.3053617
citation IEEE Access, vol. 9, 2021, pp. 17544-17556, article 9333577
orcid Hongyu Wang: https://orcid.org/0000-0003-0224-3156
orcid Mengxing Huang: https://orcid.org/0000-0002-5709-703X
orcid Yuchun Li: https://orcid.org/0000-0003-2723-220X
affiliation State Key Laboratory of Marine Resource Utilization in South China Sea, Hainan University, Haikou, China (all five authors)
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description In this paper, we focus on human pose transfer across different videos, i.e., transferring the dance pose of a person in a given video to a target person in another video. Our method can be summed up in three stages to tackle this challenging scenario. First, we extract the frames and pose masks from the source and target videos. Second, we use our model to synthesize frames of the target person performing the given dance pose. Third, we refine the generated frames to improve the quality of the outputs. Our model is built around three components: 1) human pose extraction and normalization; 2) a GAN based on a cross-domain correspondence mechanism that synthesizes dance-guided images of the target person from consecutive frames and pose stick images; and 3) a coarse-to-fine generation strategy with two GANs, one that reconstructs the human face in the target video and another that generates smooth frame sequences. Finally, we compress the sequential frames generated by our model into video format. Compared with previous works, our model achieves better person appearance consistency and temporal coherence in video-to-video synthesis for human motion transfer, which makes the generated video look more realistic. Qualitative and quantitative comparisons show that our approach delivers significant improvements over state-of-the-art methods. Experiments on synthetic frames and ground truth validate the effectiveness of the proposed method.
topic Generative adversarial network (GAN)
image-to-image translation
video-to-video synthesis
pose-guided person image generation
url https://ieeexplore.ieee.org/document/9333577/
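The description field above only outlines the pipeline at a high level. As a rough orientation, the Python sketch below shows how the three stages described in the abstract could fit together. It is not the authors' released code: the pose_estimator, frame_gan, and face_gan objects, their methods (body_height, keypoints, render_sticks), the transfer_dance helper, and the simple height-based pose normalization are all placeholder assumptions introduced for illustration.

```python
# Illustrative sketch only (not the paper's implementation): a minimal
# frame extraction -> pose-guided synthesis -> refinement -> re-encoding loop.
# pose_estimator, frame_gan, and face_gan are hypothetical objects standing in
# for the pose network and the two GAN generators described in the abstract.
import cv2


def extract_frames(video_path):
    """Stage 1a: decode a video into a list of RGB frames."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames


def normalize_pose(keypoints, src_height, tgt_height):
    """Stage 1b: a simple global rescaling so the source skeleton roughly
    matches the target person's body height (one possible normalization)."""
    # keypoints is assumed to be a NumPy array of (x, y) joint coordinates.
    return keypoints * (tgt_height / max(src_height, 1e-6))


def transfer_dance(source_video, target_video, pose_estimator,
                   frame_gan, face_gan, out_path, fps=30):
    src_frames = extract_frames(source_video)
    tgt_frames = extract_frames(target_video)

    # Rough body heights for normalization (hypothetical estimator API).
    src_h = pose_estimator.body_height(src_frames[0])
    tgt_h = pose_estimator.body_height(tgt_frames[0])

    generated = []
    for frame in src_frames:
        # Stage 1: extract and normalize the dance pose, render a stick image.
        kp = normalize_pose(pose_estimator.keypoints(frame), src_h, tgt_h)
        pose_stick = pose_estimator.render_sticks(kp, frame.shape[:2])

        # Stage 2: pose-guided synthesis of the target person (coarse result).
        coarse = frame_gan(pose_stick)

        # Stage 3: coarse-to-fine refinement, here only of the face region.
        generated.append(face_gan(coarse))

    # Finally, compress the generated frame sequence back into a video file.
    h, w = generated[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for frame in generated:
        writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    writer.release()
```

In practice the two generators would be trained on the target-person video, and the temporal-smoothing GAN mentioned in the abstract would operate on consecutive generated frames rather than on one frame at a time as in this simplified loop.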