Multilingual Geo-parsing Based on Free Wiki World Map
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === Retrieving a representative geographic location from text is an interesting research problem. Researchers have previously tried to geo-tag texts retrieved from sources such as blog posts and Twitter tweets. Most of these works have to tokenize tex...
Main Authors: | Huang, Yu-Ling (黃郁菱) |
---|---|
Other Authors: | Huang, Chun-Ying |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/spp76q |
id |
ndltd-TW-104NTOU5394036 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-104NTOU5394036 2019-05-15T23:00:45Z http://ndltd.ncl.edu.tw/handle/spp76q Multilingual Geo-parsing Based on Free Wiki World Map 基於群眾地理資料之多語言地名標記 Huang, Yu-Ling 黃郁菱 Master's National Taiwan Ocean University Department of Computer Science and Engineering 104 Huang, Chun-Ying Ma, Shang-Pin 黃俊穎 馬尚彬 2016 thesis 32 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === Retrieving a representative geographic location from text is an interesting research problem. Researchers have previously tried to geo-tag texts retrieved from sources such as blog posts and Twitter tweets. Most of these works tokenize texts using natural language processing techniques and then apply heuristic algorithms to identify geo-locations. However, these studies face two critical challenges: the diversity of languages and the granularity of the identified geo-locations. The former requires language-specific dictionaries or phrase databases, while coarse-grained tagging does not meet users' demand for a more representative location for a given text.
In this thesis, we attempt to develop a multilingual geo-tagging approach that addresses both challenges. Unlike previous work, our approach does not rely on natural language processing techniques to process inputs. Instead, we simply tokenize input texts into N-grams and then recognize geo-locations by matching them against crowd-contributed geographic map data. We further refine the granularity of the results by considering additional features of the candidate geographic phrases, such as phrase length, area size, and the relationships between candidate phrases. Based on these features, our approach is able to precisely identify representative locations for input texts in different languages without a dictionary. We evaluate the approach on texts crawled from news websites, and the experimental results show that it achieves 96% correctness for Chinese and 92% for Japanese.
|
author2 |
Huang, Chun-Ying |
author_facet |
Huang, Chun-Ying Huang, Yu-Ling 黃郁菱 |
author |
Huang, Yu-Ling 黃郁菱 |
spellingShingle |
Huang, Yu-Ling 黃郁菱 Multilingual Geo-parsing Based on Free Wiki World Map |
author_sort |
Huang, Yu-Ling |
title |
Multilingual Geo-parsing Based on Free Wiki World Map |
title_short |
Multilingual Geo-parsing Based on Free Wiki World Map |
title_full |
Multilingual Geo-parsing Based on Free Wiki World Map |
title_fullStr |
Multilingual Geo-parsing Based on Free Wiki World Map |
title_full_unstemmed |
Multilingual Geo-parsing Based on Free Wiki World Map |
title_sort |
multilingual geo-parsing based on free wiki world map |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/spp76q |
work_keys_str_mv |
AT huangyuling multilingualgeoparsingbasedonfreewikiworldmap AT huángyùlíng multilingualgeoparsingbasedonfreewikiworldmap AT huangyuling jīyúqúnzhòngdelǐzīliàozhīduōyǔyándemíngbiāojì AT huángyùlíng jīyúqúnzhòngdelǐzīliàozhīduōyǔyándemíngbiāojì |
_version_ |
1719138397579116544 |
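
The abstract describes the pipeline only at a high level: N-gram tokenization, lookup against crowd-contributed map data, then ranking candidates by phrase length, area size, and relationships between candidates. The following is a minimal sketch of that idea, not the thesis's actual implementation. The `Place` type, the `GAZETTEER` entries (with rough, illustrative areas), and the scoring formula are all assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class Place:
    """Hypothetical gazetteer entry; field names are assumptions."""
    name: str
    area_km2: float
    parent: Optional[str] = None  # name of the enclosing place, if known


# Tiny stand-in for crowd-contributed map data (areas are rough, illustrative).
GAZETTEER: Dict[str, Place] = {
    "台灣": Place("台灣", 36000.0),
    "基隆市": Place("基隆市", 133.0, parent="台灣"),
}


def ngrams(text: str, max_n: int) -> Iterator[str]:
    """Yield every character n-gram of length 1..max_n, shortest first."""
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def candidates(text: str, gazetteer: Dict[str, Place], max_n: int = 6) -> Dict[str, Place]:
    """Return the places whose names appear as an n-gram of the text."""
    found: Dict[str, Place] = {}
    for gram in ngrams(text, max_n):
        place = gazetteer.get(gram)
        if place is not None:
            found[gram] = place
    return found


def score(place: Place, found: Dict[str, Place]) -> float:
    """Assumed scoring: longer names, smaller areas, and a co-occurring
    parent place each raise the score. The weights are arbitrary."""
    length_term = float(len(place.name))
    area_term = 1.0 / (1.0 + math.log10(1.0 + place.area_km2))
    relation_term = 1.0 if place.parent in found else 0.0
    return length_term + area_term + relation_term


def most_representative(text: str, gazetteer: Dict[str, Place], max_n: int = 6) -> Optional[Place]:
    """Pick the highest-scoring candidate place, or None if nothing matches."""
    found = candidates(text, gazetteer, max_n)
    if not found:
        return None
    return max(found.values(), key=lambda p: score(p, found))
```

Because the tokenizer works on raw character N-grams, the same code applies to Chinese, Japanese, or any other script without a word segmenter; only the gazetteer contents change. For example, `most_representative("基隆市位於台灣北部", GAZETTEER)` prefers the city over the island because its name is longer, its area is smaller, and its parent also appears in the text.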