Per-title and per-segment CRF estimation using DNNs for quality-based video coding

Bibliographic Details
Main Authors: Garcia-Pineda, M. (Author), Gutiérrez-Aguado, J. (Author), Micó-Enguídanos, F. (Author), Moina-Rivera, W. (Author)
Format: Article
Language: English
Published: Elsevier Ltd 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 03185nam a2200409Ia 4500
001 10.1016-j.eswa.2023.120289
008 230529s2023 CNT 000 0 und d
020 |a 09574174 (ISSN) 
245 1 0 |a Per-title and per-segment CRF estimation using DNNs for quality-based video coding 
260 0 |b Elsevier Ltd  |c 2023 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1016/j.eswa.2023.120289 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85158865732&doi=10.1016%2fj.eswa.2023.120289&partnerID=40&md5=e898ed05d6cb05c4fd02d921449647f9 
520 3 |a Nowadays, video content accounts for a large percentage of network traffic. Most streaming services use HTTP Adaptive Streaming, splitting the video into non-overlapping segments and encoding each segment independently (possibly with multiple representations to allow adaptation to varying network conditions). In this work we propose an encoding scheme based on a Deep Neural Network (DNN) that performs a per-title and per-segment adaptive estimation of the encoding parameter needed to reach a target quality in the encoded video. A dataset has been prepared using 1212 segments obtained from 158 videos. The segments have been encoded with 19 Constant Rate Factor (CRF) values using the VP9 encoder, generating a total of 23028 encoded segments, and for each encoded segment its Video Multi-Method Assessment Fusion (VMAF) quality has been computed. In addition, a feature vector has been obtained from a 240p downscaled version of each segment, and an analysis of the dependency of the features on the resolution has been carried out. With this dataset, a DNN has been trained to estimate the CRF to be applied to each segment to achieve a target VMAF quality. Results show that the trained network provides the CRF value for each video segment to reach the desired quality with low computational overhead. To validate the proposal, the network has been used to predict the CRF for encoding 1840 two-second segments, not used during training, with the VP9 codec at Full High Definition (FHD) resolution and four target quality values. Results show that the system adapts the CRF to each segment and that the final videos have a mean deviation of 1.84% with respect to the requested VMAF value. © 2023 The Author(s) 
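The abstract above describes a per-segment pipeline: encode each segment with VP9 at a given CRF, measure its VMAF against the source, and use a DNN to pick the CRF that hits a target VMAF. The sketch below illustrates only the encode-and-measure step, assuming an ffmpeg build with libvpx-vp9 and libvmaf; the file names, CRF sweep, and target value are illustrative, and the brute-force sweep merely stands in for the paper's trained DNN.

```python
"""Minimal sketch: encode a segment with VP9 at a candidate CRF and
measure VMAF against the source. Not the authors' implementation."""
import re
import subprocess


def encode_vp9(src: str, dst: str, crf: int) -> None:
    # Constant-quality VP9: -crf sets the quality target and -b:v 0
    # removes the bitrate cap so CRF alone drives rate allocation.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",
         "-an", dst],
        check=True, capture_output=True)


def vmaf_score(encoded: str, reference: str) -> float:
    # libvmaf prints "VMAF score: <value>" in the ffmpeg log; the exact
    # output format depends on the ffmpeg/libvmaf version installed.
    proc = subprocess.run(
        ["ffmpeg", "-i", encoded, "-i", reference,
         "-lavfi", "libvmaf", "-f", "null", "-"],
        capture_output=True, text=True)
    match = re.search(r"VMAF score: ([0-9.]+)", proc.stderr)
    if match is None:
        raise RuntimeError("could not parse VMAF score from ffmpeg output")
    return float(match.group(1))


if __name__ == "__main__":
    # Sweep a few CRF values for one two-second segment and keep the
    # highest CRF (smallest file) whose VMAF still meets the target.
    target_vmaf = 90.0
    best_crf = None
    for crf in range(15, 52, 4):  # VP9 CRF range is 0-63
        encode_vp9("segment_0001.mp4", "segment_0001_vp9.webm", crf)
        if vmaf_score("segment_0001_vp9.webm", "segment_0001.mp4") >= target_vmaf:
            best_crf = crf
    print("selected CRF:", best_crf)
```

In the scheme proposed in the article, the DNN predicts the CRF directly from features of a 240p downscaled version of the segment and the target VMAF, so each segment needs only a single encode per target quality instead of a sweep like the one above.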
650 0 4 |a Adaptive streaming 
650 0 4 |a Constant rate 
650 0 4 |a Deep neural network 
650 0 4 |a Deep neural networks 
650 0 4 |a Encoded videos 
650 0 4 |a Encoding (symbols) 
650 0 4 |a Fusion quality 
650 0 4 |a HTTP 
650 0 4 |a HTTP adaptive streaming 
650 0 4 |a HTTP Adaptive Streaming 
650 0 4 |a Image segmentation 
650 0 4 |a Method assessment 
650 0 4 |a Multi methods 
650 0 4 |a Network coding 
650 0 4 |a Video coding 
650 0 4 |a Video contents 
650 0 4 |a Video quality 
650 0 4 |a Video segments 
650 0 4 |a Video streaming 
700 1 0 |a Garcia-Pineda, M.  |e author 
700 1 0 |a Gutiérrez-Aguado, J.  |e author 
700 1 0 |a Micó-Enguídanos, F.  |e author 
700 1 0 |a Moina-Rivera, W.  |e author 
773 |t Expert Systems with Applications