Leveraging Dominant Language Image Tags for Automatic Image Annotation in Minor Languages

Bibliographic Details
Main Author: Wennerström, Hjalmar
Format: Others
Language: English
Published: Uppsala universitet, Institutionen för informationsteknologi (Uppsala University, Department of Information Technology), 2010
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-129446
Series: UPTEC IT, 1401-5749 ; 10 013
Type: Student thesis (bachelor thesis)
File format: PDF
Access: Open access

Abstract

Image annotations, often in the form of tags, are very useful when indexing large image collections. They provide an intuitive, human-centered way to search and browse images using text queries. However, tagging images manually is very time-consuming, so researchers have developed methods for automatic image tagging. These methods rely on a set of example images with tags to learn which images should be associated with which tags. One thing that has been overlooked with these systems is that the available example images and tags differ between languages. Researchers have generally built automatic tagging systems only for English and have not considered the problem of building equally good systems in minor languages, where example images and tags are more difficult to obtain. In this thesis we study how an automatic tagging system in Japanese compares to one in English. We find that the Japanese system performs worse, and based on this we improve its performance by leveraging the dominant English-language system. We compare automatic translation of the tags using a dictionary with our proposed translation matrix method, which estimates the translation of tags from the co-occurrence of tags in different languages on the same images. We show that our proposed method, using very simple heuristics, performs about as well as a high-end machine translator in the context of automatic tagging systems. Several improvements remain to be made, but this work shows that the conceptual idea is strong, which motivates improving it further. The main contribution of our approach is the ability to translate words that a dictionary cannot interpret, as well as to take context into account when establishing a translation.
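The abstract does not spell out the "very simple heuristics" behind the translation matrix, so the following is only a minimal Python sketch under one plausible reading: count how often each English tag appears on the same image as each Japanese tag, row-normalize those counts into an estimate of P(Japanese tag | English tag), and push the scores of an English auto-tagger through that matrix to obtain Japanese tag scores. The function names `build_translation_matrix` and `translate_scores`, the row normalization, and the toy data are hypothetical illustrations, not the thesis's implementation.

```python
from collections import Counter, defaultdict


def build_translation_matrix(bilingual_images):
    """Estimate P(ja_tag | en_tag) from images tagged in both languages.

    bilingual_images: iterable of (english_tags, japanese_tags) pairs, one per image.
    Returns a dict mapping each English tag to {japanese_tag: probability}.
    """
    # Count how often each English tag appears on the same image as each Japanese tag.
    pair_counts = defaultdict(Counter)
    for en_tags, ja_tags in bilingual_images:
        for en in set(en_tags):
            for ja in set(ja_tags):
                pair_counts[en][ja] += 1

    # Assumed heuristic (hypothetical): normalize each English tag's row to sum to 1.
    matrix = {}
    for en, ja_counts in pair_counts.items():
        total = sum(ja_counts.values())
        matrix[en] = {ja: count / total for ja, count in ja_counts.items()}
    return matrix


def translate_scores(en_scores, matrix):
    """Project English tag scores (e.g. from an English auto-tagger) onto Japanese tags."""
    ja_scores = Counter()
    for en, score in en_scores.items():
        # English tags never seen on a bilingual image contribute nothing.
        for ja, prob in matrix.get(en, {}).items():
            ja_scores[ja] += score * prob
    return ja_scores


# Hypothetical toy data; Japanese tags are romanized for readability.
images = [
    (["dog", "park"], ["inu", "kouen"]),
    (["dog", "beach"], ["inu", "hamabe"]),
    (["cat"], ["neko"]),
]
matrix = build_translation_matrix(images)
ja_scores = translate_scores({"dog": 0.9, "cat": 0.1}, matrix)
print(ja_scores.most_common(1))  # [('inu', 0.45)]
```

Because the mapping is learned from tag co-occurrence rather than looked up in a dictionary, a tag with no dictionary entry can still receive a translation as long as it appears on bilingually tagged images, which is the property the abstract highlights. Other normalizations (for example, dropping rare pairs) would be equally plausible readings of the abstract.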