Per-title and per-segment CRF estimation using DNNs for quality-based video coding

Nowadays, video content accounts for a large percentage of network traffic. Most streaming services use HTTP Adaptive Streaming by splitting the video into non-overlapping segments and by encoding each video segment independently (possibly with multiple representations to allow adaptation to the varyi...


Bibliographic Details
Main Authors:	Garcia-Pineda, M. (Author), Gutiérrez-Aguado, J. (Author), Micó-Enguídanos, F. (Author), Moina-Rivera, W. (Author)
Format:	Article
Language:	English
Published:	Elsevier Ltd 2023
Subjects:	Adaptive streaming; Constant rate; Deep neural network; Deep neural networks; Encoded videos; Encoding (symbols); Fusion quality; HTTP; HTTP adaptive streaming; HTTP Adaptive Streaming; Image segmentation; Method assessment; Multi methods; Network coding; Video coding; Video contents; Video quality; Video segments; Video streaming
Online Access:	View Fulltext in Publisher; View in Scopus


LEADER	03185nam a2200409Ia 4500
001	10.1016-j.eswa.2023.120289
008	230529s2023 CNT 000 0 und d
020			\|a 09574174 (ISSN)
245	1	0	\|a Per-title and per-segment CRF estimation using DNNs for quality-based video coding
260		0	\|b Elsevier Ltd \|c 2023
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1016/j.eswa.2023.120289
856			\|z View in Scopus \|u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85158865732&doi=10.1016%2fj.eswa.2023.120289&partnerID=40&md5=e898ed05d6cb05c4fd02d921449647f9
520	3		\|a Nowadays, video content accounts for a large percentage of network traffic. Most streaming services use HTTP Adaptive Streaming by splitting the video into non-overlapping segments and by encoding each video segment independently (possibly with multiple representations to allow adaptation to the varying network conditions). In this work we propose an encoding scheme based on a Deep Neural Network (DNN) to perform a per-title and per-segment adaptive estimation of the encoding parameter to achieve a target video quality of the encoded video. A dataset has been prepared using 1212 segments obtained from 158 videos. The segments have been encoded with 19 Constant Rate Factor (CRF) values using the VP9 encoder, generating a total of 23028 encoded segments, and for each encoded video segment, its Video Multi-Method Assessment Fusion (VMAF) quality has been computed. In addition, a feature vector has been obtained from a 240p downscaled version of the segments, and an analysis of the dependency of the features on the resolution has been carried out. With this dataset a DNN has been trained to estimate the CRF to be applied to each segment to achieve a target VMAF quality. Results show that the trained network is able to provide the CRF value to be applied to each video segment to achieve the desired quality with low computational overhead. To validate the proposal, the network has been used to predict the CRF to encode 1840 two-second segments, not used during the training process, using the VP9 codec at Full High Definition (FHD) resolution, and four target quality values. Results show that the system adapts the CRF to each segment and that the final videos have a mean deviation of 1.84% with respect to the requested VMAF value. © 2023 The Author(s)
650	0	4	\|a Adaptive streaming
650	0	4	\|a Constant rate
650	0	4	\|a Deep neural network
650	0	4	\|a Deep neural networks
650	0	4	\|a Encoded videos
650	0	4	\|a Encoding (symbols)
650	0	4	\|a Fusion quality
650	0	4	\|a HTTP
650	0	4	\|a HTTP adaptive streaming
650	0	4	\|a HTTP Adaptive Streaming
650	0	4	\|a Image segmentation
650	0	4	\|a Method assessment
650	0	4	\|a Multi methods
650	0	4	\|a Network coding
650	0	4	\|a Video coding
650	0	4	\|a Video contents
650	0	4	\|a Video quality
650	0	4	\|a Video segments
650	0	4	\|a Video streaming
700	1	0	\|a Garcia-Pineda, M. \|e author
700	1	0	\|a Gutiérrez-Aguado, J. \|e author
700	1	0	\|a Micó-Enguídanos, F. \|e author
700	1	0	\|a Moina-Rivera, W. \|e author
773			\|t Expert Systems with Applications

