A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC

We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME for short), by combining the advantages of local full search and downsampling. By subsampling a video frame, a large amount of computation is saved. While using the local full-...
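The combination of downsampling and local full search mentioned in this summary can be illustrated with a minimal two-level sketch. This is not the authors' MLRME implementation: the function names (`downsample2`, `sad`, `full_search`, `two_level_me`), the 2x2 averaging filter, the sum-of-absolute-differences cost, and the search radii are all assumptions chosen for illustration, and the code is serial Python rather than CUDA. It runs a full search on a half-resolution frame, then refines the scaled vector with a small full search at full resolution.

```python
def downsample2(frame):
    """Halve the resolution by averaging each 2x2 pixel group (integer mean)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def full_search(cur, ref, bx, by, n, center, radius):
    """Test every displacement within `radius` of `center` and return
    (best_vector, best_cost). Candidates that leave the frame are skipped."""
    h, w = len(ref), len(ref[0])
    cx, cy = center
    best, best_cost = None, None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def two_level_me(cur, ref, bx, by, n=8, coarse_radius=4, refine_radius=1):
    """Two-level search (assumes even frame size, even bx, by and n):
    full search at half resolution, then a local full search at full
    resolution around the coarse vector scaled by 2."""
    cur_lo, ref_lo = downsample2(cur), downsample2(ref)
    (cdx, cdy), _ = full_search(cur_lo, ref_lo, bx // 2, by // 2, n // 2,
                                (0, 0), coarse_radius)
    return full_search(cur, ref, bx, by, n, (2 * cdx, 2 * cdy), refine_radius)
```

In this sketch the coarse pass covers a window of plus or minus 8 full-resolution pixels while evaluating only 81 candidates on blocks with a quarter of the pixels, which is where the computation is saved. Every block and every candidate displacement is evaluated independently of the others, which is the property that lets a full search map onto many GPU threads.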


Bibliographic Details
Main Authors: Yun-gang Xue, Hua-you Su, Ju Ren, Mei Wen, Chun-yuan Zhang, Li-quan Xiao
Format: Article
Language: English
Published: Hindawi Limited 2017-01-01
Series: Scientific Programming
Online Access: http://dx.doi.org/10.1155/2017/1431574
id doaj-45daba5e9f6640a6ab63ed775f7bc59b
record_format Article
spelling doaj-45daba5e9f6640a6ab63ed775f7bc59b; 2021-07-02T02:04:35Z; eng; Hindawi Limited; Scientific Programming; 1058-9244; 1875-919X; 2017-01-01; 2017; 10.1155/2017/1431574; 1431574; A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC; Yun-gang Xue, Hua-you Su, Ju Ren, Mei Wen, Chun-yuan Zhang, Li-quan Xiao (all: School of Computer, National University of Defense Technology, Changsha 410073, China); http://dx.doi.org/10.1155/2017/1431574
collection DOAJ
language English
format Article
sources DOAJ
author Yun-gang Xue
Hua-you Su
Ju Ren
Mei Wen
Chun-yuan Zhang
Li-quan Xiao
publisher Hindawi Limited
series Scientific Programming
issn 1058-9244
1875-919X
publishDate 2017-01-01
description We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME for short), which combines the advantages of local full search and downsampling. Subsampling a video frame saves a large amount of computation, while the local full-search method exposes massive parallelism and makes full use of powerful modern many-core accelerators, such as GPUs and the Intel Xeon Phi. We integrated the proposed MLRME into HM12.0, and the experimental results showed that the encoding quality of the MLRME method is close to that of the fast motion estimation in HEVC, with a decline of less than 1.5%. We also implemented MLRME with CUDA, which obtained a 30–60x speed-up compared to the serial algorithm on a single CPU. Specifically, the parallel implementation of MLRME on a GTX 460 GPU can meet the real-time coding requirement with about 25 fps for the 2560×1600 video format, while for 832×480 the performance is more than 100 fps.
url http://dx.doi.org/10.1155/2017/1431574