Summary: | Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 107 === Deep learning has achieved great success in image classification, object detection, and semantic segmentation. In recent years, the advent of inexpensive depth sensors has strongly motivated 3D research, and real-scene reconstruction datasets such as ScanNet [5] and Matterport3D [1] have been released. However, 3D scene semantic segmentation remains new and challenging due to the many variants of 3D data representation (e.g., image, voxel, point cloud). Further difficulties, such as high computation cost and the scarcity of data, slow the progress of 3D segmentation research. In this paper, we study the 3D indoor scene segmentation problem with three different types of 3D data, which we categorize as image-based, voxel-based, and point-based. We experiment with different input signals (e.g., color, depth, normal) and verify their effectiveness and performance in networks for each data type. We further study fusion methods and improve performance by using off-the-shelf deep models and by leveraging multiple data modalities.