Performance engineering for HEVC transform and quantization kernel on GPUs

Continuous growth of video traffic and video services, especially high-resolution and high-quality video content, places heavy demands on video coding and its implementations. The High Efficiency Video Coding (HEVC) standard doubles the compression efficiency of its predecessor, H.264/AVC...

Bibliographic Details
Main Authors:	Mate Čobrnić, Alen Duspara, Leon Dragić, Igor Piljić, Mario Kovač
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2020-07-01
Series:	Automatika
Subjects:	integer discrete cosine transform (dct); high efficiency video coding (hevc); graphics processor unit (gpu); matrix multiplication; compute unified device architecture (cuda)
Online Access:	http://dx.doi.org/10.1080/00051144.2020.1752046

id	doaj-84561b2dd13a47cf94bb0e64a04aa974
record_format	Article
spelling	doaj-84561b2dd13a47cf94bb0e64a04aa974; 2020-11-25T02:38:19Z; eng; Taylor & Francis Group; Automatika; ISSN 0005-1144, 1848-3380; 2020-07-01; vol. 61, no. 3, pp. 325-333; 10.1080/00051144.2020.1752046; 1752046; Performance engineering for HEVC transform and quantization kernel on GPUs; Mate Čobrnić (0), Alen Duspara (1), Leon Dragić (2), Igor Piljić (3), Mario Kovač (4); all authors: Faculty of Electrical Engineering and Computing, University of Zagreb
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Mate Čobrnić Alen Duspara Leon Dragić Igor Piljić Mario Kovač
author_sort	Mate Čobrnić
title	Performance engineering for HEVC transform and quantization kernel on GPUs
title_sort	performance engineering for hevc transform and quantization kernel on gpus
publisher	Taylor & Francis Group
series	Automatika
issn	0005-1144 1848-3380
publishDate	2020-07-01
description	Continuous growth of video traffic and video services, especially high-resolution and high-quality video content, places heavy demands on video coding and its implementations. The High Efficiency Video Coding (HEVC) standard doubles the compression efficiency of its predecessor, H.264/AVC, at the cost of high computational complexity. To address this complexity, high-performance video processing takes advantage of heterogeneous multiprocessor platforms. In this paper, we present a highly performance-optimized HEVC transform and quantization kernel with all-zero-block (AZB) identification designed for execution on a Graphics Processor Unit (GPU). The performance optimization strategy involves all three aspects of parallel design: exposing as much of the application's intrinsic parallelism as possible, exploiting high-throughput memory, and using instructions efficiently. It combines efficient mapping of transform blocks to thread blocks with efficient vectorized access patterns to shared memory for all transform sizes supported in the standard. Two different GPUs of the same architecture were used to evaluate the proposed implementation. The achieved processing times are 6.03 and 23.94 ms for DCI 4K and 8K Full Format, respectively. Speedup factors compared to CPU, cuBLAS and AVX2 implementations are up to 80, 19 and 4 times, respectively. The proposed implementation outperforms previous work by a factor of 1.22.
topic	integer discrete cosine transform (dct); high efficiency video coding (hevc); graphics processor unit (gpu); matrix multiplication; compute unified device architecture (cuda)
url	http://dx.doi.org/10.1080/00051144.2020.1752046
_version_	1724791585682489344
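
The description above names three steps: an integer transform expressed as matrix multiplication, quantization, and all-zero-block (AZB) identification. The following is a minimal scalar sketch of those steps for a single 4x4 block at 8-bit depth. It is not the authors' GPU kernel. The 4x4 matrix is the HEVC core transform matrix; the intermediate shifts, the quantization multipliers, and the rounding offsets follow a commonly used reference-encoder convention and are assumptions here, as are all function and constant names.

```python
# HEVC 4x4 core transform matrix (integer approximation of the DCT).
C4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]

# Assumption: forward quantization multipliers indexed by QP % 6,
# as used in common reference encoders (not taken from the record above).
QUANT_SCALES = [26214, 23302, 20560, 18396, 16384, 14564]

BIT_DEPTH = 8  # assumed sample bit depth
LOG2_N = 2     # log2 of the transform size (4x4)


def matmul(a, b):
    """Plain integer matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def round_shift(m, shift):
    """Add half of the divisor, then shift right (rounds to nearest)."""
    half = 1 << (shift - 1)
    return [[(v + half) >> shift for v in row] for row in m]


def forward_transform_4x4(residual):
    """Two-stage transform Y = C * X * C^T with intermediate scaling."""
    shift1 = LOG2_N + BIT_DEPTH - 9   # assumed first-stage shift (1 here)
    shift2 = LOG2_N + 6               # assumed second-stage shift (8 here)
    tmp = round_shift(matmul(C4, residual), shift1)
    return round_shift(matmul(tmp, transpose(C4)), shift2)


def quantize(coeffs, qp, intra=True):
    """Scalar quantization of transform coefficients to integer levels."""
    qbits = 14 + qp // 6 + (15 - BIT_DEPTH - LOG2_N)
    # Assumption: rounding offsets 171/512 (intra) and 85/512 (inter).
    offset = (171 if intra else 85) << (qbits - 9)
    scale = QUANT_SCALES[qp % 6]
    levels = []
    for row in coeffs:
        out = []
        for c in row:
            mag = (abs(c) * scale + offset) >> qbits
            out.append(-mag if c < 0 else mag)
        levels.append(out)
    return levels


def transform_quantize(residual, qp, intra=True):
    """Return (levels, is_azb); is_azb is True when every level is zero."""
    levels = quantize(forward_transform_4x4(residual), qp, intra)
    is_azb = all(v == 0 for row in levels for v in row)
    return levels, is_azb
```

For example, `transform_quantize([[10] * 4 for _ in range(4)], qp=22)` yields a single nonzero DC level, while a block of all ones at `qp=37` quantizes entirely to zero and is flagged as an AZB. A GPU implementation would process many such blocks in parallel; this sketch only shows the per-block arithmetic.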
