Deep Learning Based Depth Estimation and Analysis for 360° Stereo Cameras


Bibliographic Details
Main Authors: Hsieh, Meng-Hsun, 謝孟勳
Other Authors: Hang, Hsueh-Ming
Format: Others
Language: en_US
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/y7mjfw
Description
Summary: Master's thesis === National Chiao Tung University === Institute of Electronics === 107 === 360° virtual view synthesis plays an important role in virtual reality, and the depth map is the key information for reconstructing the 3D world. In this study, we use two spherical cameras to form a 360° stereo system that captures the entire surrounding scene in two views. We then use these two spherical images to estimate a spherical depth map. We developed a depth estimation procedure for spherical stereo images using an existing neural network, PSMNet. To train the network for spherical disparity estimation, we built a panoramic stereo image dataset based on the SYNTHIA dataset, which provides disparity ground truth.

More importantly, we investigated the limits of spherical-image depth estimation. Unlike the disparity defined for perspective-view stereo, spherical disparity is measured as the angular difference of the same object point seen from the two views, so an object point aligned with the baseline has zero spherical disparity. From the angular resolution of an image pixel, we derived the maximum sensing distance for spherical disparity estimation. We also studied the occlusion of a surface in spherical stereo and derived the minimum reliable sensing distance. Both distance limits are functions of the baseline, and these properties help in choosing an appropriate baseline length when constructing a spherical stereo rig.

In our experiments, we performed depth estimation on both synthetic and real-scene images, and evaluated the performance on the synthetic images against the ground-truth depth. On the SYNTHIA test set, we achieve a 2.18% error rate under the KITTI benchmark D1 criterion, which is lower than that of the original PSMNet tested on the KITTI dataset. Finally, we generated synthesized views using the Facebook 3D photo tools and our estimated depth maps; the good subjective quality of the synthesized images indicates that our estimated depth maps are rather accurate.
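The two distance limits mentioned above follow from the angular-disparity geometry. The following is a minimal derivation sketch under assumed notation (baseline b, viewing angles \theta_L and \theta_R measured at each camera from the baseline direction, equirectangular image height H in pixels); the thesis itself may use different symbols and conventions.

    d = \theta_R - \theta_L   (spherical disparity: the angle the baseline subtends at the scene point)

    r_L = \frac{b \sin\theta_R}{\sin d}   (law of sines in the triangle formed by the two cameras and the point)

A point aligned with the baseline has \theta_L = \theta_R, hence d = 0 and its range cannot be recovered. Taking the smallest resolvable disparity to be one pixel of angular resolution, \Delta\theta = \pi / H for an equirectangular image, gives a maximum sensing distance of roughly

    r_{\max} \approx \frac{b \sin\theta}{\sin \Delta\theta},

which grows linearly with the baseline b.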
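As a concrete illustration, here is a minimal Python sketch of the disparity-to-range conversion and the maximum-distance bound above; all function and variable names are my own for illustration, not the thesis code.

import numpy as np

def spherical_disparity_to_range(disparity, theta_r, baseline):
    # Range from the left camera given the angular disparity d = theta_R - theta_L
    # (radians) and the viewing angle theta_R at the right camera, both measured
    # from the baseline direction. Law of sines: r = b * sin(theta_R) / sin(d).
    eps = 1e-9  # guard against division by zero near the baseline direction
    return baseline * np.sin(theta_r) / np.maximum(np.sin(disparity), eps)

def max_sensing_range(baseline, height, theta=np.pi / 2):
    # Farthest range at which a one-pixel angular disparity is still resolvable,
    # for an equirectangular image whose `height` rows span pi radians.
    delta = np.pi / height  # angular size of one pixel
    return baseline * np.sin(theta) / np.sin(delta)

# Example: a 0.3 m baseline with 1024-row panoramas resolves depth out to
# about 0.3 / sin(pi/1024), roughly 98 m, perpendicular to the baseline.
print(max_sensing_range(baseline=0.3, height=1024))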
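For reference, the D1 criterion cited above is commonly defined as in the KITTI 2015 stereo benchmark: a pixel is an outlier when its disparity error exceeds both 3 pixels and 5% of the ground-truth disparity. A small sketch of that metric follows; the thresholds match the public benchmark, though the exact evaluation script used in the thesis is not specified here.

import numpy as np

def d1_error(pred, gt, valid=None, abs_thresh=3.0, rel_thresh=0.05):
    # Fraction of valid pixels whose disparity error is greater than
    # `abs_thresh` pixels AND greater than `rel_thresh` * ground truth.
    if valid is None:
        valid = gt > 0  # KITTI encodes invalid pixels as zero disparity
    err = np.abs(pred - gt)
    outliers = (err > abs_thresh) & (err > rel_thresh * gt)
    return outliers[valid].mean()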