A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC
We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME for short), by combining the advantages of local full search and downsampling. By subsampling a video frame, a large amount of computation is saved. While using the local full-...
Main Authors: | Yun-gang Xue, Hua-you Su, Ju Ren, Mei Wen, Chun-yuan Zhang, Li-quan Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2017-01-01 |
Series: | Scientific Programming |
Online Access: | http://dx.doi.org/10.1155/2017/1431574 |
id |
doaj-45daba5e9f6640a6ab63ed775f7bc59b |
---|---|
record_format |
Article |
spelling |
doaj-45daba5e9f6640a6ab63ed775f7bc59b; 2021-07-02T02:04:35Z; eng; Hindawi Limited; Scientific Programming; ISSN 1058-9244, 1875-919X; 2017-01-01; 2017; 10.1155/2017/1431574; 1431574; A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC; Yun-gang Xue, Hua-you Su, Ju Ren, Mei Wen, Chun-yuan Zhang, Li-quan Xiao (each affiliated with: School of Computer, National University of Defense Technology, Changsha 410073, China) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yun-gang Xue, Hua-you Su, Ju Ren, Mei Wen, Chun-yuan Zhang, Li-quan Xiao |
title |
A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC |
publisher |
Hindawi Limited |
series |
Scientific Programming |
issn |
1058-9244 1875-919X |
publishDate |
2017-01-01 |
description |
We propose a highly parallel and scalable motion estimation algorithm, multilevel resolution motion estimation (MLRME), that combines the advantages of local full search and downsampling. Subsampling the video frame saves a large amount of computation, while the local full-search method exposes massive parallelism and makes full use of powerful modern many-core accelerators such as GPUs and the Intel Xeon Phi. We integrated the proposed MLRME into HM12.0, and the experimental results show that its encoding quality is close to that of the fast motion estimation in HEVC, declining by less than 1.5%. We also implemented MLRME with CUDA, which obtained a 30–60x speed-up over the serial algorithm on a single CPU. Specifically, the parallel implementation of MLRME on a GTX 460 GPU meets the real-time coding requirement at about 25 fps for the 2560×1600 video format, while for 832×480 the performance is more than 100 fps. |
url |
http://dx.doi.org/10.1155/2017/1431574 |
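
The abstract describes combining downsampling with local full search. The sketch below illustrates that general idea only; it is not the authors' MLRME implementation, whose details this record does not give. It assumes two resolution levels, 2×2 averaging for downsampling, a sum-of-absolute-differences (SAD) cost, and even block coordinates and sizes. The function names `downsample`, `sad`, `full_search`, and `two_level_me` and the default radii are hypothetical.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 pixel group (assumed filter)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def sad(cur, ref, bx, by, dx, dy, size):
    """SAD between the block at (bx, by) in cur and the displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, center, radius):
    """Test every displacement within `radius` of `center`; return the best (dx, dy).

    Each candidate is independent of the others, which is what makes a
    full search easy to map onto many parallel threads.
    """
    h, w = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            # Skip candidates whose reference block falls outside the frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + size > w or by + dy + size > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def two_level_me(cur, ref, bx, by, size=8, radius=4):
    """Coarse full search at half resolution, then a +/-1 pixel refinement.

    Assumes bx, by, and size are even so the block maps cleanly to the coarse level.
    """
    cur_lo, ref_lo = downsample(cur), downsample(ref)
    cdx, cdy = full_search(cur_lo, ref_lo, bx // 2, by // 2, size // 2, (0, 0), radius)
    # Scale the coarse vector back up and search only its immediate neighbourhood.
    return full_search(cur, ref, bx, by, size, (2 * cdx, 2 * cdy), 1)
```

Under these assumptions, the coarse search with radius 4 covers roughly ±8 pixels at full resolution while evaluating 81 candidates on a block with a quarter of the pixels, and the refinement adds only 9 full-resolution candidates.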