Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing

Natural language provides an intuitive and effective interface between humans and robots. Several approaches have been proposed to address natural language visual grounding for human-robot interaction. However, most of the existing approaches handle the ambiguity of natural...

Bibliographic Details
Main Authors:	Jinpeng Mi, Jianzhi Lyu, Song Tang, Qingdu Li, Jianwei Zhang
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2020-06-01
Series:	Frontiers in Neurorobotics
Subjects:	interactive natural language grounding; referring expression comprehension; scene graph; visual and textual semantics; human-robot interaction
Online Access:	https://www.frontiersin.org/article/10.3389/fnbot.2020.00043/full

id	doaj-a05ee9689ed44a9185fe31b381da5c53
record_format	Article
spelling	doaj-a05ee9689ed44a9185fe31b381da5c53; 2020-11-25T03:14:21Z; eng; Frontiers Media S.A.; Frontiers in Neurorobotics; 1662-5218; 2020-06-01; 14; 10.3389/fnbot.2020.00043; 491799; Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing; Jinpeng Mi (0, 1); Jianzhi Lyu (2); Song Tang (3, 4); Qingdu Li (5); Jianwei Zhang (6); 0, 3, 5: Institute of Machine Intelligence (IMI), University of Shanghai for Science and Technology, Shanghai, China; 1, 2, 4, 6: Technical Aspects of Multimodal Systems, Department of Informatics, University of Hamburg, Hamburg, Germany; https://www.frontiersin.org/article/10.3389/fnbot.2020.00043/full; interactive natural language grounding; referring expression comprehension; scene graph; visual and textual semantics; human-robot interaction
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Jinpeng Mi, Jianzhi Lyu, Song Tang, Qingdu Li, Jianwei Zhang
author_sort	Jinpeng Mi
title	Interactive Natural Language Grounding via Referring Expression Comprehension and Scene Graph Parsing
publisher	Frontiers Media S.A.
series	Frontiers in Neurorobotics
issn	1662-5218
publishDate	2020-06-01
description	Natural language provides an intuitive and effective interface between humans and robots. Several approaches have been proposed to address natural language visual grounding for human-robot interaction. However, most existing approaches resolve the ambiguity of natural language queries and ground target objects via dialogue systems, which makes the interactions cumbersome and time-consuming. In contrast, we address interactive natural language grounding without auxiliary information. Specifically, we first propose a referring expression comprehension network to ground natural referring expressions. The network extracts visual semantics via a visual semantic-aware network and exploits the rich linguistic context in expressions with a language attention network. Furthermore, we combine the referring expression comprehension network with scene graph parsing to ground unrestricted and complicated natural language queries. Finally, we validate the performance of the referring expression comprehension network on three public datasets, and we evaluate the effectiveness of the interactive natural language grounding architecture through extensive grounding experiments on natural language queries in different household scenarios.
topic	interactive natural language grounding; referring expression comprehension; scene graph; visual and textual semantics; human-robot interaction
url	https://www.frontiersin.org/article/10.3389/fnbot.2020.00043/full

Similar Items