Per-title and per-segment CRF estimation using DNNs for quality-based video coding
Main Authors: | Garcia-Pineda, M., Gutiérrez-Aguado, J., Micó-Enguídanos, F., Moina-Rivera, W. |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier Ltd, 2023 |
Subjects: | |
Online Access: | View Fulltext in Publisher (https://doi.org/10.1016/j.eswa.2023.120289); View in Scopus |
LEADER | 03185nam a2200409Ia 4500 | ||
---|---|---|---|
001 | 10.1016-j.eswa.2023.120289 | ||
008 | 230529s2023 CNT 000 0 und d | ||
020 | |a 09574174 (ISSN) | ||
245 | 1 | 0 | |a Per-title and per-segment CRF estimation using DNNs for quality-based video coding |
260 | 0 | |b Elsevier Ltd |c 2023 | |
856 | |z View Fulltext in Publisher |u https://doi.org/10.1016/j.eswa.2023.120289 | ||
856 | |z View in Scopus |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85158865732&doi=10.1016%2fj.eswa.2023.120289&partnerID=40&md5=e898ed05d6cb05c4fd02d921449647f9 | ||
520 | 3 | |a Nowadays, video content accounts for a large percentage of network traffic. Most streaming services use HTTP Adaptive Streaming, splitting the video into non-overlapping segments and encoding each segment independently (possibly with multiple representations to allow adaptation to varying network conditions). In this work we propose an encoding scheme based on a Deep Neural Network (DNN) that performs a per-title and per-segment adaptive estimation of the encoding parameter needed to achieve a target quality for the encoded video. A dataset has been prepared using 1212 segments obtained from 158 videos. The segments have been encoded with 19 Constant Rate Factor (CRF) values using the VP9 encoder, generating a total of 23028 encoded segments, and the Video Multi-Method Assessment Fusion (VMAF) quality of each encoded segment has been computed. In addition, a feature vector has been extracted from a 240p downscaled version of each segment, and the dependency of the features on resolution has been analyzed. With this dataset, a DNN has been trained to estimate the CRF to apply to each segment to achieve a target VMAF quality. Results show that the trained network provides the CRF value needed for each video segment to achieve the desired quality with low computational overhead. To validate the proposal, the network has been used to predict the CRF for encoding 1840 two-second segments not used during training, using the VP9 codec at Full High Definition (FHD) resolution and four target quality values. Results show that the system adapts the CRF to each segment and that the final videos have a mean deviation of 1.84% with respect to the requested VMAF value. © 2023 The Author(s) | |
650 | 0 | 4 | |a Adaptive streaming |
650 | 0 | 4 | |a Constant rate |
650 | 0 | 4 | |a Deep neural network |
650 | 0 | 4 | |a Deep neural networks |
650 | 0 | 4 | |a Encoded videos |
650 | 0 | 4 | |a Encoding (symbols) |
650 | 0 | 4 | |a Fusion quality |
650 | 0 | 4 | |a HTTP |
650 | 0 | 4 | |a HTTP adaptive streaming |
650 | 0 | 4 | |a Image segmentation |
650 | 0 | 4 | |a Method assessment |
650 | 0 | 4 | |a Multi methods |
650 | 0 | 4 | |a Network coding |
650 | 0 | 4 | |a Video coding |
650 | 0 | 4 | |a Video contents |
650 | 0 | 4 | |a Video quality |
650 | 0 | 4 | |a Video segments |
650 | 0 | 4 | |a Video streaming |
700 | 1 | 0 | |a Garcia-Pineda, M. |e author |
700 | 1 | 0 | |a Gutiérrez-Aguado, J. |e author |
700 | 1 | 0 | |a Micó-Enguídanos, F. |e author |
700 | 1 | 0 | |a Moina-Rivera, W. |e author |
773 | |t Expert Systems with Applications |