Learning Gaze Transitions from Depth to Improve Video Saliency Estimation

Bibliographic Details
Main Authors: Leifman, George (Author), Rudoy, Dmitry (Author), Swedish, Tristan (Author), Bayro-Corrochano, Eduardo (Author), Raskar, Ramesh (Author)
Other Authors: Massachusetts Institute of Technology. Media Laboratory (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2021-11-09.
Subjects:
Online Access: https://hdl.handle.net/1721.1/138091
LEADER 01872 am a22002053u 4500
001 138091
042 |a dc 
100 1 0 |a Leifman, George  |e author 
100 1 0 |a Massachusetts Institute of Technology. Media Laboratory  |e contributor 
700 1 0 |a Rudoy, Dmitry  |e author 
700 1 0 |a Swedish, Tristan  |e author 
700 1 0 |a Bayro-Corrochano, Eduardo  |e author 
700 1 0 |a Raskar, Ramesh  |e author 
245 0 0 |a Learning Gaze Transitions from Depth to Improve Video Saliency Estimation 
260 |b Institute of Electrical and Electronics Engineers (IEEE),   |c 2021-11-09T21:59:21Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/138091 
520 |a © 2017 IEEE. In this paper we introduce a novel Depth-Aware Video Saliency approach to predict human focus of attention when viewing videos that contain a depth map (RGBD) on a 2D screen. Saliency estimation in this scenario is highly important since in the near future 3D video content will be easily acquired yet hard to display. Despite considerable progress in 3D display technologies, most are still expensive and require special glasses for viewing, so RGBD content is primarily viewed on 2D screens, removing the depth channel from the final viewing experience. We train a generative convolutional neural network that predicts the 2D viewing saliency map for a given frame using the RGBD pixel values and previous fixation estimates in the video. To evaluate the performance of our approach, we present a new comprehensive database of 2D viewing eye-fixation ground-truth for RGBD videos. Our experiments indicate that it is beneficial to integrate depth into video saliency estimates for content that is viewed on a 2D display. We demonstrate that our approach outperforms state-of-the-art methods for video saliency, achieving 15% relative improvement. 
546 |a en 
655 7 |a Article 
024 7 |a 10.1109/iccv.2017.188  |2 doi
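
Note: the abstract above describes a convolutional network that maps an RGBD frame plus previous fixation estimates to a 2D saliency map. The following is a minimal illustrative sketch of that input/output structure only, not the authors' implementation; the network name, layer sizes, and 5-channel input layout are assumptions made for this example.

```python
# Illustrative sketch (assumed architecture, not the paper's code):
# an encoder-decoder CNN taking RGB (3ch) + depth (1ch) + previous
# fixation map (1ch) and producing a single-channel saliency map.
import torch
import torch.nn as nn

class DepthAwareSaliencyNet(nn.Module):
    def __init__(self, in_channels=5):
        super().__init__()
        # Downsampling encoder over the concatenated RGBD + fixation input.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Upsampling decoder back to the input resolution, one output channel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb, depth, prev_fixation):
        # Concatenate RGB, depth, and the previous fixation estimate along channels.
        x = torch.cat([rgb, depth, prev_fixation], dim=1)
        logits = self.decoder(self.encoder(x))
        return torch.sigmoid(logits)  # per-pixel saliency in [0, 1]

# Usage example on one hypothetical 224x224 frame.
net = DepthAwareSaliencyNet()
rgb = torch.rand(1, 3, 224, 224)
depth = torch.rand(1, 1, 224, 224)
prev_fix = torch.rand(1, 1, 224, 224)
saliency = net(rgb, depth, prev_fix)  # shape: (1, 1, 224, 224)
```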