Multilingual Geo-parsing Based on Free Wiki World Map


Bibliographic Details
Main Authors: Huang, Yu-Ling, 黃郁菱
Other Authors: Huang, Chun-Ying
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/spp76q
id ndltd-TW-104NTOU5394036
record_format oai_dc
spelling ndltd-TW-104NTOU5394036 2019-05-15T23:00:45Z http://ndltd.ncl.edu.tw/handle/spp76q Multilingual Geo-parsing Based on Free Wiki World Map 基於群眾地理資料之多語言地名標記 Huang, Yu-Ling 黃郁菱 Master's, National Taiwan Ocean University (國立臺灣海洋大學), Department of Computer Science and Engineering (資訊工程學系), 104 Huang, Chun-Ying Ma, Shang-Pin 黃俊穎 馬尚彬 2016 學位論文 ; thesis 32 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan Ocean University (國立臺灣海洋大學) === Department of Computer Science and Engineering (資訊工程學系) === 104 === Retrieving a representative geographic location from text is an interesting research problem. Researchers have previously tried to geo-tag texts retrieved from sources such as blog posts and Twitter tweets. Most of this work tokenizes texts with natural language processing techniques and then applies heuristic algorithms to identify geo-locations. However, these studies face two critical challenges: the diversity of languages and the granularity of the identified geo-locations. The former requires language-specific dictionaries or phrase databases, while coarse-grained tagging does not meet users' demand for a more representative location for a given text. In this thesis, we attempt to develop a multilingual geo-tagging approach that addresses both challenges. Unlike previous work, our approach does not rely on natural language processing techniques to process inputs. Instead, we simply tokenize input texts using an N-gram approach and then recognize geo-locations based on crowd-contributed geographic map data. We further improve the granularity of our approach by considering additional geographic phrase features such as the length, the area size, and the relationships between candidate phrases. Based on these novel features, our approach is able to precisely identify representative locations for input texts in different languages without a dictionary. We evaluate our approach using texts crawled from news websites, and the experimental results show that it achieves 96% and 92% correctness in Chinese and Japanese, respectively.
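The pipeline outlined in the abstract (character N-gram tokenization, lookup against map-derived place names, then ranking by phrase length, area size, and relationships between candidates) can be illustrated with a minimal sketch. This is not the thesis's implementation: the `GAZETTEER` table, its rough area values, the function names, and the specific ranking rule (prefer longer names, then smaller areas, and drop candidates nested inside a longer match) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy gazetteer: place name -> rough area in km^2 (illustrative values).
# A real system would load names from crowd-contributed map data instead.
GAZETTEER: Dict[str, float] = {
    "台灣": 36000.0,
    "基隆": 133.0,
    "基隆市": 133.0,
    "東京": 2194.0,
    "東京都": 2194.0,
}


def char_ngrams(text: str, max_n: int) -> List[Tuple[int, str]]:
    """Return (start_index, substring) for every character n-gram up to max_n."""
    grams = []
    for start in range(len(text)):
        for n in range(1, max_n + 1):
            if start + n <= len(text):
                grams.append((start, text[start:start + n]))
    return grams


def find_candidates(text: str, gazetteer: Dict[str, float],
                    max_n: int = 4) -> List[Tuple[int, str]]:
    """Keep only the n-grams that exactly match a gazetteer name."""
    return [(s, g) for s, g in char_ngrams(text, max_n) if g in gazetteer]


def drop_nested(cands: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Discard a candidate whose span lies inside a strictly longer candidate."""
    kept = []
    for s, g in cands:
        e = s + len(g)
        nested = any(
            s2 <= s and e <= s2 + len(g2) and len(g2) > len(g)
            for s2, g2 in cands
        )
        if not nested:
            kept.append((s, g))
    return kept


def pick_location(text: str, gazetteer: Dict[str, float],
                  max_n: int = 4) -> Optional[str]:
    """Pick the longest surviving name, breaking ties by smaller area."""
    cands = drop_nested(find_candidates(text, gazetteer, max_n))
    if not cands:
        return None
    _, best = max(cands, key=lambda c: (len(c[1]), -gazetteer[c[1]]))
    return best


if __name__ == "__main__":
    print(pick_location("台灣基隆市今天下雨", GAZETTEER))
    print(pick_location("東京都で地震", GAZETTEER))
```

Because the sketch works on raw characters, the same code handles the Chinese and Japanese inputs without a word segmenter, which is the property the abstract emphasizes. The tie-breaking rule here is only one plausible reading of "length" and "area size" as features.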
author2 Huang, Chun-Ying
author_facet Huang, Chun-Ying
Huang, Yu-Ling
黃郁菱
author Huang, Yu-Ling
黃郁菱
title Multilingual Geo-parsing Based on Free Wiki World Map
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/spp76q