Visual Thinking of Neural Networks: Interactive Text to Image Synthesis
Reasoning, a trait of cognitive intelligence, is regarded as a crucial ability that distinguishes humans from other species. However, neural networks now pose a challenge to this human ability. Text-to-image synthesis is a task at the intersection of vision and language, wherein the goal is to learn multimodal representations between image and text features. Hence, it requires a high-level reasoning ability that understands the relationships between objects in the given text and generates high-quality images based on that understanding. Text-to-image translation can thus be termed the visual thinking of neural networks. In this study, our model infers the complicated relationships between objects in the given text and generates the final image by leveraging the previous history. We define diverse novel adversarial loss functions and demonstrate the one that best elevates the reasoning ability of text-to-image synthesis. Remarkably, most of our models possess their own reasoning ability. Quantitative and qualitative comparisons with several methods demonstrate the superiority of our approach.
| Main Authors | Hyunhee Lee, Gyeongmin Kim, Yuna Hur, Heuiseok Lim |
|---|---|
| Format | Article |
| Language | English |
| Published | IEEE, 2021-01-01 |
| Series | IEEE Access |
| Subjects | Generative adversarial networks; image generation; multimodal learning; multimodal representation; text-to-image synthesis |
| Online Access | https://ieeexplore.ieee.org/document/9410550/ |
| id | doaj-65ae7a5071a244538744f902c6749d1d |
|---|---|
| ISSN | 2169-3536 |
| DOI | 10.1109/ACCESS.2021.3074973 |
| Citation | IEEE Access, vol. 9, pp. 64510-64523, 2021 (article no. 9410550) |
| Record updated | 2021-04-30T23:01:12Z |

Authors:
- Hyunhee Lee (ORCID: 0000-0003-3540-776X), Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Gyeongmin Kim (ORCID: 0000-0002-2851-0374), Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Yuna Hur (ORCID: 0000-0001-5997-1627), Department of Computer Science and Engineering, Korea University, Seoul, South Korea
- Heuiseok Lim (ORCID: 0000-0002-9269-1157), Department of Computer Science and Engineering, Korea University, Seoul, South Korea
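For context on the adversarial losses the abstract refers to: text-to-image GANs are commonly trained with a conditional adversarial objective, which the paper's novel losses extend. A generic formulation (the notation below is the standard conditional GAN objective, not taken from this paper) is:

```latex
% Standard conditional GAN objective for text-to-image synthesis.
% G: generator, D: discriminator, x: real image, t: text embedding,
% z: noise vector. Generic baseline formulation, not the paper's
% proposed losses.
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{(x, t) \sim p_{\text{data}}}\!\left[\log D(x, t)\right]
  + \mathbb{E}_{z \sim p_z,\; t \sim p_{\text{data}}}\!\left[\log\!\left(1 - D(G(z, t), t)\right)\right]
```

Here the discriminator judges image-text pairs rather than images alone, which is what forces the generator to respect the relationships described in the input text.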