Cognitive Mapping for Object Searching in Indoor Scenes

abstract: Visual navigation is a multi-disciplinary field spanning computer vision, machine learning, and robotics, and it is of great significance in both research and industrial applications. An intelligent agent with visual navigation ability will be capable of the following tasks: actively exploring an environment, distinguishing and localizing a requested target, and approaching the target using acquired strategies. Despite many advances in mobile robotics, endowing an autonomous agent with these abilities remains a challenging and complex task; solving it, however, is likely to accelerate the deployment of assistive robots. Reinforcement learning trains an autonomous robot by rewarding desired behaviors, helping it obtain an action policy that maximizes reward as the robot interacts with its environment. Through trial and error, the agent learns sophisticated strategies for handling complex tasks. Human navigation offers inspiration: when moving through an environment, people reason about accessible space and scene geometry from a first-person view, identify the destination, and then move toward it. Accordingly, this work develops a model that maps pixels to actions while inherently estimating both the target and a free-space map. The model has three major constituents: (i) a cognitive mapper that infers a topological free-space map from first-person-view images, (ii) a target recognition network that locates the desired object, and (iii) a deep-reinforcement-learning action policy network. Further, a planner model with a cascade architecture, operating on multi-scale semantic top-down occupancy maps, is proposed.
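The data flow the abstract describes (first-person image → free-space map and target estimate → action) can be sketched as below. This is a hypothetical illustration only: all module internals are placeholder stubs standing in for the learned networks, and the grid size, agent position, and action names are assumptions, not details from the thesis.

```python
import numpy as np

def cognitive_mapper(frame):
    """Stand-in for the learned cognitive mapper: project the first-person
    frame into an egocentric top-down free-space grid (1 = traversable)."""
    free_space = np.ones((8, 8), dtype=np.int8)
    free_space[0, :] = 0  # pretend the far row is a wall
    return free_space

def target_recognizer(frame, target_label):
    """Stand-in for the target recognition network: a top-down heatmap of
    where the requested object is believed to be."""
    heat = np.zeros((8, 8), dtype=np.float32)
    heat[2, 5] = 1.0  # pretend the target was detected at this cell
    return heat

def action_policy(free_space, target_heat):
    """Stand-in for the deep RL policy: greedily step toward the most
    likely target cell that is also traversable."""
    masked = target_heat * free_space
    ty, tx = np.unravel_index(np.argmax(masked), masked.shape)
    agent_y, agent_x = 7, 4  # assume the agent starts at the near edge
    if ty < agent_y:
        return "move_forward"
    return "turn_left" if tx < agent_x else "turn_right"

frame = np.zeros((64, 64, 3), dtype=np.uint8)  # dummy first-person image
free_space = cognitive_mapper(frame)
heat = target_recognizer(frame, "chair")
print(action_policy(free_space, heat))  # -> move_forward
```

In the thesis each stub would be a trained network, with the policy learned by reinforcement from rewards; the sketch only shows how the three constituents compose.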


Bibliographic Details
Other Authors: Zheng, Shibin (Author)
Format: Dissertation
Language:English
Published: 2019
Subjects: Computer engineering
Online Access:http://hdl.handle.net/2286/R.I.55588
Advisor: Yang, Yezhou
Committee members: Zhang, Wenlong; Ren, Yi
Publisher: Arizona State University
Degree: Masters Thesis, Computer Engineering, 2019
Extent: 61 pages
Rights: http://rightsstatements.org/vocab/InC/1.0/