Viewing Bias Matters in 360° Videos Visual Saliency Prediction

Bibliographic Details
Main Authors: Chao, Y. (Author), Chen, P. (Author), Huang, C. (Author), Huang, G. (Author), Lu, C. (Author), Wu, P. (Author), Yang, T. (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2023
Subjects:
Online Access: View Fulltext in Publisher
View in Scopus
LEADER 02492nam a2200373Ia 4500
001 10.1109-ACCESS.2023.3269564
008 230529s2023 CNT 000 0 und d
022 |a 2169-3536 (ISSN) 
245 1 0 |a Viewing Bias Matters in 360° Videos Visual Saliency Prediction 
260 0 |b Institute of Electrical and Electronics Engineers Inc.  |c 2023 
300 |a 1 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/ACCESS.2023.3269564 
856 |z View in Scopus  |u https://www.scopus.com/inward/record.uri?eid=2-s2.0-85159714673&doi=10.1109%2fACCESS.2023.3269564&partnerID=40&md5=8039b07eb2c35897e9b191fe43628b5c 
520 3 |a 360° video has been applied to many areas such as immersive content, virtual tours, and surveillance systems. Compared to field-of-view prediction on planar videos, the explosive amount of information contained in the omni-directional view over the entire sphere poses an additional challenge in predicting highly salient regions in 360° videos. In this work, we propose a visual saliency prediction model that directly takes 360° video in the equirectangular format as input. Unlike previous works that often adopted a recurrent neural network (RNN) architecture for the saliency detection task, we apply 3D convolutions in a spatial-temporal encoder and generalize SphereNet kernels to construct a spatial-temporal decoder. We further study the statistical properties of the viewing biases present in 360° datasets across various video types, which provides insights into the design of a fusion mechanism that adaptively combines the predicted saliency map with the viewing bias. The proposed model yields state-of-the-art performance, as evidenced by empirical results on renowned 360° visual saliency datasets such as Salient360!, PVS, and Sport360. 
650 0 4 |a 360° videos 
650 0 4 |a Convolutional neural networks 
650 0 4 |a Decoding 
650 0 4 |a deep learning 
650 0 4 |a Deep learning 
650 0 4 |a Feature extraction 
650 0 4 |a Predictive models 
650 0 4 |a Three-dimensional displays 
650 0 4 |a Videos 
650 0 4 |a viewing bias 
650 0 4 |a Visual saliency prediction 
650 0 4 |a Visualization 
700 1 0 |a Chao, Y.  |e author 
700 1 0 |a Chen, P.  |e author 
700 1 0 |a Huang, C.  |e author 
700 1 0 |a Huang, G.  |e author 
700 1 0 |a Lu, C.  |e author 
700 1 0 |a Wu, P.  |e author 
700 1 0 |a Yang, T.  |e author 
773 |t IEEE Access
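
Note on the abstract (520 field above): it outlines the model at a high level, with a 3D-convolutional spatial-temporal encoder, a SphereNet-based decoder, and a mechanism that adaptively fuses the predicted saliency map with a viewing-bias prior derived from the dataset. The Python/PyTorch sketch below is a minimal illustration of such an adaptive fusion, assuming a learned per-pixel gate over equirectangular maps; the module name, the 1x1-convolution gate, and the final normalization step are assumptions for illustration, not the authors' published implementation.

import torch
import torch.nn as nn

class AdaptiveBiasFusion(nn.Module):
    """Hypothetical sketch: fuse a predicted saliency map with a viewing-bias
    prior through a learned per-pixel gate (not the paper's actual code)."""

    def __init__(self):
        super().__init__()
        # A 1x1 convolution over the stacked (prediction, bias) maps produces a gate in [0, 1].
        self.gate = nn.Sequential(nn.Conv2d(2, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, pred: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
        # pred, bias: (B, 1, H, W) equirectangular saliency maps in [0, 1]
        alpha = self.gate(torch.cat([pred, bias], dim=1))
        fused = alpha * pred + (1.0 - alpha) * bias
        # Rescale so each fused map peaks at 1, guarding against division by zero.
        return fused / fused.amax(dim=(2, 3), keepdim=True).clamp_min(1e-8)

# Example usage with hypothetical shapes (batch of 4, 240x480 equirectangular maps):
# fusion = AdaptiveBiasFusion()
# fused = fusion(torch.rand(4, 1, 240, 480), torch.rand(4, 1, 240, 480))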