Structure-Aware Procedural Text Generation From an Image Sequence

It is an important activity for our society to create new value by combining materials. From daily cooking to industrial manufacturing, we often describe how to do so in a procedural text. As pointed out by previous studies in natural language understanding, one important property of the pro...

Bibliographic Details
Main Authors: Taichi Nishimura, Atsushi Hashimoto, Yoshitaka Ushiku, Hirotaka Kameko, Yoko Yamakata, Shinsuke Mori
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects: Natural language processing; text generation; procedural text; vision and language
Online Access: https://ieeexplore.ieee.org/document/9288722/
id doaj-14cc64f163d14b049126181eb601638b
record_format Article
spelling doaj-14cc64f163d14b049126181eb601638b
Timestamp: 2021-03-30T14:57:53Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Publication date: 2021-01-01
Volume 9, pages 2125-2141
DOI: 10.1109/ACCESS.2020.3043452
IEEE document number: 9288722
Title: Structure-Aware Procedural Text Generation From an Image Sequence
Authors and affiliations:
Taichi Nishimura (https://orcid.org/0000-0001-8725-7164), Graduate School of Informatics, Kyoto University, Kyoto, Japan
Atsushi Hashimoto (https://orcid.org/0000-0002-0799-4269), OMRON SINIC X Corporation, Tokyo, Japan
Yoshitaka Ushiku (https://orcid.org/0000-0002-9014-1389), OMRON SINIC X Corporation, Tokyo, Japan
Hirotaka Kameko (https://orcid.org/0000-0001-9844-6198), Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
Yoko Yamakata, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Shinsuke Mori, Academic Center for Computing and Media Studies, Kyoto University, Kyoto, Japan
Online access: https://ieeexplore.ieee.org/document/9288722/
Keywords: Natural language processing; text generation; procedural text; vision and language
collection DOAJ
language English
format Article
sources DOAJ
author Taichi Nishimura
Atsushi Hashimoto
Yoshitaka Ushiku
Hirotaka Kameko
Yoko Yamakata
Shinsuke Mori
title Structure-Aware Procedural Text Generation From an Image Sequence
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description It is an important activity for our society to create new value by combining materials. From daily cooking to industrial manufacturing, we often describe how to do so in a procedural text. As pointed out by previous studies in natural language understanding, one important property of procedural text is its context dependency, which reflects the merging operations of materials and can be represented by a graph or tree structure. This paper investigates the impact of explicitly introducing such a structure into the vision-and-language task of procedural text generation from an image sequence. To this end, we propose (1) a new dataset, which extends the definition of a tree structure, the merging tree, to a vision-and-language version, and (2) a novel structure-aware procedural text generation model, which learns the context dependency efficiently. Experimental results show that the proposed method can boost the performance of traditional versatile methods.
topic Natural language processing
text generation
procedural text
vision and language
url https://ieeexplore.ieee.org/document/9288722/