Visual Thinking of Neural Networks: Interactive Text to Image Synthesis

Reasoning, a hallmark of cognitive intelligence, is regarded as a crucial ability that distinguishes humans from other species. Neural networks, however, now challenge this distinction. Text-to-image synthesis is a task at the intersection of vision and language, in which the goal is to learn multimodal representations that bind image and text features. It therefore requires a high-level reasoning ability: the model must understand the relationships between the objects described in a given text and generate high-quality images based on that understanding. In this sense, text-to-image translation can be regarded as the visual thinking of neural networks. In this study, our model infers the complicated relationships between the objects in the given text and generates the final image by leveraging the history of previously generated images. We define several novel adversarial loss functions and identify the one that best elevates the reasoning ability of text-to-image synthesis. Remarkably, most of our model variants exhibit this reasoning ability on their own. Quantitative and qualitative comparisons with several existing methods demonstrate the superiority of our approach.
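
The abstract describes a GAN-based approach in which the model is trained with adversarial loss functions so that the discriminator judges not only image realism but also image-text consistency. The sketch below illustrates one standard way such a text-conditional adversarial objective can be set up (a matching-aware hinge loss in the style of Reed et al., 2016). It is a minimal illustration under assumed architectures, dimensions, and names, not the authors' actual model or the specific loss functions proposed in the paper.

    # Minimal sketch of a text-conditional adversarial (hinge) loss.
    # All architectures, dimensions, and names here are illustrative
    # assumptions, NOT the paper's actual networks or objectives.
    import torch
    import torch.nn as nn

    class TinyGenerator(nn.Module):
        """Maps a noise vector and a text embedding to a 32x32 RGB image."""
        def __init__(self, z_dim=100, t_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(z_dim + t_dim, 256), nn.ReLU(),
                nn.Linear(256, 3 * 32 * 32), nn.Tanh(),
            )
        def forward(self, z, t):
            x = self.net(torch.cat([z, t], dim=1))
            return x.view(-1, 3, 32, 32)

    class TinyDiscriminator(nn.Module):
        """Scores an (image, text) pair; higher means 'real and matching'."""
        def __init__(self, t_dim=128):
            super().__init__()
            self.img = nn.Sequential(
                nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.LeakyReLU(0.2),
            )
            self.out = nn.Linear(256 + t_dim, 1)
        def forward(self, x, t):
            return self.out(torch.cat([self.img(x), t], dim=1))

    def d_hinge_loss(D, real, fake, t_match, t_mismatch):
        # The "matching-aware" discriminator penalizes three cases:
        # real+matching text should score high; fake images and real
        # images paired with mismatched text should both score low.
        loss_real = torch.relu(1.0 - D(real, t_match)).mean()
        loss_fake = torch.relu(1.0 + D(fake.detach(), t_match)).mean()
        loss_mismatch = torch.relu(1.0 + D(real, t_mismatch)).mean()
        return loss_real + 0.5 * (loss_fake + loss_mismatch)

    def g_hinge_loss(D, fake, t_match):
        # The generator tries to make its fakes score high for matching text.
        return -D(fake, t_match).mean()

    # Usage with stand-in data (a real pipeline would use a text encoder
    # and real images; the paper's model would additionally condition on
    # previously generated images, which this sketch omits):
    G, D = TinyGenerator(), TinyDiscriminator()
    z = torch.randn(4, 100)                  # noise
    t = torch.randn(4, 128)                  # stand-in text embeddings
    t_wrong = t.roll(1, dims=0)              # mismatched captions within batch
    fake = G(z, t)
    real = torch.rand(4, 3, 32, 32) * 2 - 1  # stand-in real images in [-1, 1]
    print(d_hinge_loss(D, real, fake, t, t_wrong).item(),
          g_hinge_loss(D, fake, t).item())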

Bibliographic Details
Main Authors: Hyunhee Lee (ORCID: 0000-0003-3540-776X), Gyeongmin Kim (ORCID: 0000-0002-2851-0374), Yuna Hur (ORCID: 0000-0001-5997-1627), Heuiseok Lim (ORCID: 0000-0002-9269-1157)
Affiliation: Department of Computer Science and Engineering, Korea University, Seoul, South Korea
Format: Article
Language: English
Published: IEEE, 2021-01-01
Series: IEEE Access, Vol. 9 (2021), pp. 64510-64523
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3074973
Subjects: Generative adversarial networks; image generation; multimodal learning; multimodal representation; text-to-image synthesis
Online Access: https://ieeexplore.ieee.org/document/9410550/