Real-Time Visual-Inertial Localization Using Semantic Segmentation Towards Dynamic Environments

Simultaneous localization and mapping (SLAM), which addresses the joint estimation problem of self-localization and scene mapping, is widely used in applications such as mobile robots, drones, and augmented reality (AR). However, state-of-the-art SLAM approaches are typically designed under a static-world assumption and are prone to degradation by moving objects in dynamic scenes. This article presents a novel semantic visual-inertial SLAM system for dynamic environments that builds on VINS-Mono and performs real-time trajectory estimation using pixel-wise semantic segmentation results. We integrate a feature tracking and extraction framework into the front-end of the SLAM system so that the time spent waiting for the semantic segmentation module is used to track feature points on subsequent camera images; in this way, the system tracks feature points stably even under high-speed motion. We also construct a dynamic feature detection module that combines the pixel-wise segmentation results with multi-view geometric constraints to exclude dynamic feature points. We evaluate our system on public datasets that include dynamic indoor and outdoor scenes. Several experiments demonstrate that our system achieves higher localization accuracy and robustness than state-of-the-art SLAM systems in challenging environments.
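The abstract describes two cues for rejecting dynamic feature points: pixel-wise semantic segmentation masks and multi-view geometric constraints. The sketch below is an illustrative example only, not code from the paper; it assumes an OpenCV/NumPy pipeline, a Cityscapes-style label map, and hypothetical class IDs and thresholds, and shows one plausible way such cues could be combined.

```python
# Illustrative sketch (not the paper's implementation): flag feature points as
# dynamic if they land on a dynamic-class segmentation pixel or violate an
# epipolar-geometry check. Class IDs and thresholds are assumptions.
import cv2
import numpy as np

DYNAMIC_CLASS_IDS = {11, 12, 13}  # e.g. person, rider, car (assumed label map)

def filter_dynamic_features(pts_prev, pts_curr, seg_mask, epipolar_thresh=1.0):
    """Return a boolean array marking matched features considered static.

    pts_prev, pts_curr : (N, 2) matched pixel coordinates in two frames
    seg_mask           : (H, W) integer array of per-pixel class IDs
    epipolar_thresh    : max point-to-epipolar-line distance in pixels
    """
    pts_prev = np.asarray(pts_prev, dtype=np.float32)
    pts_curr = np.asarray(pts_curr, dtype=np.float32)

    # 1) Semantic cue: reject features lying on pixels of a dynamic class.
    cols = np.clip(pts_curr[:, 0].astype(int), 0, seg_mask.shape[1] - 1)
    rows = np.clip(pts_curr[:, 1].astype(int), 0, seg_mask.shape[0] - 1)
    semantic_static = ~np.isin(seg_mask[rows, cols], list(DYNAMIC_CLASS_IDS))

    # 2) Geometric cue: fit a fundamental matrix with RANSAC and reject matches
    #    far from their epipolar line in the current frame.
    geometric_static = np.ones(len(pts_curr), dtype=bool)
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 3.0, 0.99)
    if F is not None and F.shape == (3, 3):
        lines = cv2.computeCorrespondEpilines(
            pts_prev.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
        a, b, c = lines[:, 0], lines[:, 1], lines[:, 2]
        dist = np.abs(a * pts_curr[:, 0] + b * pts_curr[:, 1] + c) / np.sqrt(a**2 + b**2)
        geometric_static = dist < epipolar_thresh

    # Keep a feature only if both cues agree it is static.
    return semantic_static & geometric_static
```

A visual-inertial system such as the one described would likely use the IMU-aided pose prior rather than a RANSAC fundamental matrix for the geometric test, but the structure of the check (semantic mask lookup combined with a reprojection or epipolar residual threshold) is the same.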

Bibliographic Details
Main Authors: Xinyang Zhao, Changhong Wang, Marcelo H. Ang
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Simultaneous localization and mapping; dynamic environment; semantic; visual-inertial system
Online Access: https://ieeexplore.ieee.org/document/9173806/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3018557
Volume/Pages: IEEE Access, vol. 8, pp. 155047-155059 (article no. 9173806)
Author Affiliations:
Xinyang Zhao (ORCID: 0000-0003-2688-9357), School of Astronautics, Harbin Institute of Technology, Harbin, China
Changhong Wang (ORCID: 0000-0002-6077-162X), School of Astronautics, Harbin Institute of Technology, Harbin, China
Marcelo H. Ang (ORCID: 0000-0001-8277-6408), Department of Mechanical Engineering, Advanced Robotics Centre, National University of Singapore, Singapore