Automatic 3D Landmark Extraction System Based on an Encoder–Decoder Using Fusion of Vision and LiDAR
Main Authors: | Jeonghoon Kwak, Yunsick Sung |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-04-01 |
Series: | Remote Sensing |
ISSN: | 2072-4292 |
DOI: | 10.3390/rs12071142 |
Affiliation: | Department of Multimedia Engineering, Dongguk University-Seoul, 30 Pildong-ro 1-gil, Jung-gu, Seoul 04620, Korea (both authors) |
Subjects: | feature extraction; deep learning; 3D landmark; 3D point cloud; motion analysis; user interface |
Online Access: | https://www.mdpi.com/2072-4292/12/7/1142 |
id |
doaj-1ef2829fb8b845ec9efb4b71af06928a |
collection |
DOAJ |
description |
To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in this 3D digital world. To recognize a user’s motions, 3D landmarks are obtained by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red-green-blue (RGB) image collected by a camera. However, extracting 3D landmarks from either the RGB image or the 3D point cloud requires manual supervision, so a method for extracting 3D landmarks without manual supervision is needed. Herein, an RGB image and a 3D point cloud are used together to extract 3D landmarks. The 3D point cloud supplies the relative distance between the LiDAR and the user, but because it is too sparse to cover the user’s entire body, it cannot by itself produce a dense depth image that delineates the boundary of the user’s body. Therefore, up-sampling is performed to increase the density of the depth image generated from the 3D point cloud. This paper proposes a system for extracting 3D landmarks from 3D point clouds and RGB images without manual supervision. A depth image that delineates the boundary of the user’s motion is generated from the 3D point cloud and the RGB image, collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images and the RGB images, and 3D landmarks are then extracted from these images with the trained encoder model. The method of extracting 3D landmarks from RGB-depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user’s motions from the RGBD images. In this manner, landmarks could be extracted according to the user’s motions rather than from the RGB images alone. The depth images generated by the proposed method were 1.832 times denser than up-sampling-based depth images generated with bilateral filtering. |
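The description above outlines a three-step pipeline: project the LiDAR point cloud into the RGB camera frame to form a sparse depth image, up-sample that image, and train an encoder–decoder on the resulting RGBD data. As a minimal sketch of the first step, the following Python code projects a point cloud onto the image plane; the intrinsic matrix `K`, the LiDAR-to-camera extrinsics `T`, and the image size are calibration inputs assumed here, not values given in the record.

```python
import numpy as np

def point_cloud_to_depth_image(points, K, T, height, width):
    """points: (N, 3) LiDAR points; K: (3, 3) intrinsics; T: (4, 4) LiDAR-to-camera extrinsics."""
    # Move the points from the LiDAR frame into the camera frame.
    pts_h = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coordinates
    cam = (T @ pts_h.T).T[:, :3]
    cam = cam[cam[:, 2] > 0]                                 # keep points in front of the camera
    # Perspective projection onto the image plane.
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    depth = np.zeros((height, width), dtype=np.float32)      # 0 marks "no LiDAR return"
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # When several points project to the same pixel, keep the nearest one.
    for ui, vi, z in zip(u[inside], v[inside], cam[inside, 2]):
        if depth[vi, ui] == 0 or z < depth[vi, ui]:
            depth[vi, ui] = z
    return depth
```

The zero-valued pixels of the returned image are exactly what the up-sampling stage must fill in.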
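The record names bilateral filtering as the up-sampling baseline against which the proposed depth images were 1.832 times denser. Below is a minimal cross-bilateral sketch of such a baseline, filling empty depth pixels from nearby LiDAR returns weighted by spatial distance and grayscale similarity in the RGB image; the window radius and the `sigma_s`/`sigma_r` parameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

def bilateral_upsample(depth, gray, radius=4, sigma_s=3.0, sigma_r=0.1):
    """Fill empty pixels of a sparse depth map with a cross-bilateral filter.

    depth: (H, W) sparse depth image, 0 = no measurement.
    gray:  (H, W) grayscale guide image scaled to [0, 1].
    """
    h, w = depth.shape
    out = depth.copy()
    ys, xs = np.where(depth == 0)                     # pixels with no LiDAR return
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch = depth[y0:y1, x0:x1]
        valid = patch > 0
        if not valid.any():
            continue                                  # nothing nearby to interpolate from
        yy, xx = np.mgrid[y0:y1, x0:x1]
        # Spatial weight: nearer neighbors count more.
        spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
        # Range weight: neighbors with similar image intensity count more.
        range_w = np.exp(-((gray[y0:y1, x0:x1] - gray[y, x]) ** 2) / (2 * sigma_r ** 2))
        wgt = spatial * range_w * valid
        out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out
```

Guiding the range weight by the RGB image rather than by depth keeps the interpolation from bleeding across object edges, which is what lets the up-sampled depth image preserve the boundary of the user's body.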
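Finally, the record states that an encoder–decoder model is trained with the generated depth images and the RGB images, and that the trained encoder then yields the 3D landmarks. The paper's actual architecture, input resolution, landmark count, and training losses are not given in this record; the PyTorch sketch below is one plausible arrangement, with the 128x128 input size, `latent_dim=256`, and `num_landmarks=15` all hypothetical.

```python
import torch
import torch.nn as nn

class LandmarkEncoderDecoder(nn.Module):
    """Illustrative encoder-decoder: the encoder embeds a 4-channel RGBD frame,
    the decoder reconstructs it, and a small head regresses the 3D landmarks
    (num_landmarks x 3 coordinates) from the latent code."""

    def __init__(self, num_landmarks=15, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(                 # (B, 4, 128, 128) -> (B, latent_dim)
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(128 * 16 * 16, latent_dim),
        )
        self.decoder = nn.Sequential(                 # (B, latent_dim) -> (B, 4, 128, 128)
            nn.Linear(latent_dim, 128 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (128, 16, 16)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 4, 4, stride=2, padding=1),
        )
        self.landmark_head = nn.Linear(latent_dim, num_landmarks * 3)

    def forward(self, rgbd):                          # rgbd: (B, 4, 128, 128)
        z = self.encoder(rgbd)
        recon = self.decoder(z)
        landmarks = self.landmark_head(z).view(rgbd.size(0), -1, 3)
        return recon, landmarks

model = LandmarkEncoderDecoder()
recon, landmarks = model(torch.randn(2, 4, 128, 128))
print(recon.shape, landmarks.shape)                   # (2, 4, 128, 128) and (2, 15, 3)
```

A reconstruction loss on `recon` plus a regression loss on `landmarks` would train both paths jointly; at inference only the encoder and landmark head are needed, matching the record's statement that landmarks are extracted with the trained encoder.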