Using a Character-Based Language Model for Caption Generation
Using AI to automatically describe images is a challenging task. The aim of this study is to compare character-based language models with one of the current state-of-the-art token-based language models, im2txt, for generating image captions, with a focus on morphological correctness. Previous work has shown that character-based language models can outperform token-based language models on morphologically rich languages, and other studies show that simple multi-layered LSTM blocks can learn to replicate the syntax of their training data. To study the usability of character-based language models, an alternative model based on TensorFlow's im2txt was created; it changes the token-generation architecture to handle character-sized tokens instead of word-sized tokens. The results suggest that a character-based language model could outperform current token-based language models, although time and computing-power constraints prevent this study from drawing a firm conclusion. A problem with one of the methods, subsampling, is also discussed: applied directly to character-sized tokens, the original method removes individual characters (including special characters) instead of whole words. To solve this, a two-phase approach is suggested: the training data is first split into word-sized tokens, on which subsampling is performed, and the surviving tokens are then split into character-sized tokens. Future work applying the modified subsampling and fine-tuning the hyperparameters is suggested in order to reach a clearer conclusion about the performance of character-based language models.
Main Author: | Keisala, Simon |
---|---|
Alternative Title: | Användning av teckenbaserad språkmodell för generering av bildtext [Use of a character-based language model for generating image captions] |
Format: | Others (Student thesis, bachelor) |
Language: | English |
Published: | Linköpings universitet, Interaktiva och kognitiva system, 2019 |
Subjects: | Natural Language Processing (NLP); Machine Learning (ML); Neural Networks; Caption Generation; Deep Learning; Recurrent Neural Networks; Long Short-Term Memory (LSTM); word2vec; Language Models; Language Technology (Computational Linguistics) |
Online Access: | http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-163001 |
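The central modification the abstract describes, replacing im2txt's word-sized tokens with character-sized tokens, can be illustrated with a minimal sketch. This is not the thesis code: the vocabulary layout, the start/end markers, and the function names are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the thesis implementation) of the tokenization
# change described in the abstract: captions are encoded as sequences of
# character IDs rather than word IDs before being fed to the LSTM decoder.

def build_char_vocab(captions):
    """Map every character seen in the training captions to an integer ID."""
    chars = sorted({ch for caption in captions for ch in caption})
    vocab = {"<S>": 0, "</S>": 1}  # hypothetical start/end-of-caption markers
    vocab.update({ch: i + 2 for i, ch in enumerate(chars)})
    return vocab

def encode_caption(caption, vocab):
    """Wrap a caption in start/end markers and map each character to its ID."""
    return [vocab["<S>"], *(vocab[ch] for ch in caption), vocab["</S>"]]

captions = ["a dog runs on grass", "two cats sleep on a sofa"]
vocab = build_char_vocab(captions)
print(encode_caption("a dog runs", vocab))
# A word-based pipeline would instead split on whitespace and build the
# vocabulary over whole words, giving a far larger vocabulary but much
# shorter sequences for the decoder to model.
```

The trade-off this makes is a tiny vocabulary (tens of symbols instead of tens of thousands of words) at the cost of much longer sequences for the LSTM to model.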
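The two-phase subsampling fix the abstract proposes can likewise be sketched. The discard probability below follows the standard word2vec heuristic, P(discard) = 1 − sqrt(t / f(w)); the threshold value, the space marker between words, and all names are assumptions, since the thesis's exact formulation is not given here.

```python
import math
import random
from collections import Counter

def subsample_then_split(text, t=1e-5, seed=0):
    """Sketch of the abstract's two-phase approach (details assumed):
    phase 1 subsamples frequent *words*, phase 2 splits the survivors
    into character-sized tokens, so subsampling never deletes individual
    characters the way the original character-level method would."""
    rng = random.Random(seed)
    words = text.split()
    counts = Counter(words)
    total = len(words)

    kept = []
    for w in words:
        freq = counts[w] / total
        # word2vec-style discard probability: 1 - sqrt(t / f(w)),
        # clamped so rare words (f(w) <= t) are always kept.
        p_discard = max(0.0, 1.0 - math.sqrt(t / freq))
        if rng.random() >= p_discard:
            kept.append(w)

    # Phase 2: split the surviving words into character-sized tokens,
    # marking word boundaries with a space token.
    chars = []
    for w in kept:
        chars.extend(w)
        chars.append(" ")
    return chars[:-1] if chars else chars

# Toy demonstration; real corpora typically use t around 1e-5, which on a
# corpus this small would discard almost everything, so t is raised here.
print(subsample_then_split("the dog chased the cat over the fence", t=0.1))
```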