Multilingual Geo-parsing Based on Free Wiki World Map

Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === Retrieving representative geographic locations from text is an interesting research problem. Researchers have previously tried to geo-tag texts retrieved from sources such as blog posts and Twitter tweets. Most of these works have to tokenize tex...


Bibliographic Details
Main Authors:	Huang, Yu-Ling, 黃郁菱
Other Authors:	Huang, Chun-Ying
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/spp76q

id	ndltd-TW-104NTOU5394036
record_format	oai_dc
spelling	ndltd-TW-104NTOU5394036 2019-05-15T23:00:45Z http://ndltd.ncl.edu.tw/handle/spp76q Multilingual Geo-parsing Based on Free Wiki World Map 基於群眾地理資料之多語言地名標記 Huang, Yu-Ling 黃郁菱 Master's, National Taiwan Ocean University, Department of Computer Science and Engineering, 104 Huang, Chun-Ying Ma, Shang-Pin 黃俊穎 馬尚彬 2016 thesis 32 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === Retrieving representative geographic locations from text is an interesting research problem. Researchers have previously tried to geo-tag texts retrieved from sources such as blog posts and Twitter tweets. Most of these works tokenize texts using natural language processing techniques and then apply heuristic algorithms to identify geo-locations. However, these studies face two critical challenges: the diversity of languages and the granularity of the identified geo-locations. The former requires language-specific dictionaries or phrase databases, while coarse-grained tagging does not meet users' demand for a more representative location for a given text. In this thesis, we attempt to develop a multilingual geo-tagging approach that addresses both challenges. Unlike previous works, our approach does not rely on natural language processing techniques to process inputs. Instead, we simply tokenize input texts using an N-gram approach and then recognize geo-locations based on crowd-contributed geographic map data. We further improve the granularity of our approach by considering additional features of geographic phrases, such as their length, the area size, and the relationships between candidate phrases. Based on these novel features, our approach is able to precisely identify representative locations for input texts in different languages without a dictionary. We evaluate our approach on texts crawled from news websites, and the experimental results show that it achieves 96% and 92% correctness in Chinese and Japanese, respectively.
author2	Huang, Chun-Ying
author_facet	Huang, Chun-Ying Huang, Yu-Ling 黃郁菱
author	Huang, Yu-Ling 黃郁菱
spellingShingle	Huang, Yu-Ling 黃郁菱 Multilingual Geo-parsing Based on Free Wiki World Map
author_sort	Huang, Yu-Ling
title	Multilingual Geo-parsing Based on Free Wiki World Map
title_short	Multilingual Geo-parsing Based on Free Wiki World Map
title_full	Multilingual Geo-parsing Based on Free Wiki World Map
title_fullStr	Multilingual Geo-parsing Based on Free Wiki World Map
title_full_unstemmed	Multilingual Geo-parsing Based on Free Wiki World Map
title_sort	multilingual geo-parsing based on free wiki world map
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/spp76q
work_keys_str_mv	AT huangyuling multilingualgeoparsingbasedonfreewikiworldmap AT huángyùlíng multilingualgeoparsingbasedonfreewikiworldmap AT huangyuling jīyúqúnzhòngdelǐzīliàozhīduōyǔyándemíngbiāojì AT huángyùlíng jīyúqúnzhòngdelǐzīliàozhīduōyǔyándemíngbiāojì
_version_	1719138397579116544
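
Illustrative note: the description field above outlines a pipeline that tokenizes input with N-grams, matches the tokens against crowd-contributed map data, and ranks candidates by phrase length, area size, and relationships between candidates. The following is a minimal sketch of that idea, not the thesis implementation. The gazetteer contents, the function names, the substring rule used as the "relationship" between candidates, and the ranking order are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: place name -> area in square kilometres.
# The entries and values are illustrative stand-ins; a real system would
# load names and areas from crowd-contributed map data.
GAZETTEER: Dict[str, float] = {
    "台北": 271.8,
    "台北市": 271.8,
    "信義區": 11.2,
    "東京": 2194.0,
}


def char_ngrams(text: str, max_n: int) -> List[str]:
    """Return every character n-gram of length 1..max_n, ordered by start position."""
    return [
        text[i:i + n]
        for i in range(len(text))
        for n in range(1, max_n + 1)
        if i + n <= len(text)
    ]


def candidate_places(
    text: str, gazetteer: Dict[str, float], max_n: int = 4
) -> List[Tuple[str, float]]:
    """Match n-grams against the gazetteer and rank the surviving candidates."""
    # Step 1: keep every n-gram that is a known place name.
    found = {g: gazetteer[g] for g in char_ngrams(text, max_n) if g in gazetteer}

    # Step 2 (assumed relationship rule): drop a candidate that is a strict
    # substring of another candidate, e.g. "台北" inside "台北市".
    kept = [g for g in found if not any(g != o and g in o for o in found)]

    # Step 3 (assumed ranking): longer names first, then smaller areas first,
    # so that finer-grained locations are preferred.
    return sorted(((g, found[g]) for g in kept), key=lambda p: (-len(p[0]), p[1]))


if __name__ == "__main__":
    for name, area in candidate_places("台北市信義區發生地震", GAZETTEER):
        print(name, area)
```

Because the sketch works on raw character N-grams, it needs no word segmenter or language-specific dictionary, which is the property the description emphasizes. The trade-off is that short gazetteer names can match by accident, which is why some filtering on length and candidate relationships is needed.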


Similar Items