英中詞彙知識庫建構機制之研究 (A Study on Mechanisms for Constructing an English-Chinese Lexical Knowledge Base)
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 88 === This paper proposes a sense tagger for Mandarin Chinese that uses contextual information and the mapping from WordNet synsets to Cilin sense tags to perform word sense disambiguation. The performance for tagging low (2-4), middle (5-8), and high (>8) ambiguous wor...
Main Authors: | Chi-Ching Lin 林其青 |
---|---|
Other Authors: | Hsin-Hsi Chen 陳信希 |
Format: | Others |
Language: | zh-TW |
Published: | 2000 |
Online Access: | http://ndltd.ncl.edu.tw/handle/34887029646704875197 |
id |
ndltd-TW-088NTU00392021 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-088NTU00392021 2016-01-29T04:18:37Z http://ndltd.ncl.edu.tw/handle/34887029646704875197 英中詞彙知識庫建構機制之研究 Chi-Ching Lin 林其青 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 88 This paper proposes a sense tagger for Mandarin Chinese that uses contextual information and the mapping from WordNet synsets to Cilin sense tags to perform word sense disambiguation. The average performance for tagging low (2-4), middle (5-8), and high (>8) ambiguous words is 63.36% when small categories (1,428 senses) are used and 1-3 candidates are proposed, respectively. The performance of tagging unknown words is 34.35%, which is better than that of the baseline model. This sense tagger helps us build a large-scale sense-tagged corpus from ASBC. This paper also proposes a method for constructing a Chinese-English WordNet automatically. Chinese words are mapped to WordNet synsets according to their word senses. Besides building the mapping between Chinese Cilin senses and English WordNet synsets, we also set up a Chinese lexical knowledge base. The results are applied to Chinese-English information retrieval: when the Chinese-English WordNet is applied to our CLIR experiment, it achieves 69.7% of monolingual IR effectiveness. Hsin-Hsi Chen 陳信希 2000 學位論文 ; thesis 71 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 88 === This paper proposes a sense tagger for Mandarin Chinese that uses contextual information and the mapping from WordNet synsets to Cilin sense tags to perform word sense disambiguation. The average performance for tagging low (2-4), middle (5-8), and high (>8) ambiguous words is 63.36% when small categories (1,428 senses) are used and 1-3 candidates are proposed, respectively. The performance of tagging unknown words is 34.35%, which is better than that of the baseline model. This sense tagger helps us build a large-scale sense-tagged corpus from ASBC.
This paper also proposes a method for constructing a Chinese-English WordNet automatically. Chinese words are mapped to WordNet synsets according to their word senses. Besides building the mapping between Chinese Cilin senses and English WordNet synsets, we also set up a Chinese lexical knowledge base. The results are applied to Chinese-English information retrieval: when the Chinese-English WordNet is applied to our CLIR experiment, it achieves 69.7% of monolingual IR effectiveness.
|
author2 |
Hsin-Hsi Chen |
author_facet |
Hsin-Hsi Chen Chi-Ching Lin 林其青 |
author |
Chi-Ching Lin 林其青 |
spellingShingle |
Chi-Ching Lin 林其青 英中詞彙知識庫建構機制之研究 |
author_sort |
Chi-Ching Lin |
title |
英中詞彙知識庫建構機制之研究 |
title_short |
英中詞彙知識庫建構機制之研究 |
title_full |
英中詞彙知識庫建構機制之研究 |
title_fullStr |
英中詞彙知識庫建構機制之研究 |
title_full_unstemmed |
英中詞彙知識庫建構機制之研究 |
title_sort |
英中詞彙知識庫建構機制之研究 |
publishDate |
2000 |
url |
http://ndltd.ncl.edu.tw/handle/34887029646704875197 |
work_keys_str_mv |
AT chichinglin yīngzhōngcíhuìzhīshíkùjiàngòujīzhìzhīyánjiū AT línqíqīng yīngzhōngcíhuìzhīshíkùjiàngòujīzhìzhīyánjiū |
_version_ |
1718167354533216256 |
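
The abstract describes a sense tagger that uses contextual information to propose 1-3 candidate sense tags depending on how ambiguous a word is. The sketch below illustrates that general idea only; it is not the thesis's actual method. The scoring rule (summing context-word counts), the names `SENSE_PROFILES`, `candidates_for`, and `rank_senses`, and all tags and counts are hypothetical, and the reading that low, middle, and high ambiguity correspond to 1, 2, and 3 candidates is an assumption drawn from the abstract's wording.

```python
from collections import Counter

# Hypothetical toy data: the tags and counts are illustrative only and are
# NOT taken from Cilin, WordNet, ASBC, or the thesis.
SENSE_PROFILES = {
    "bank": {
        "TAG_FINANCE": Counter({"money": 3, "loan": 2, "account": 2}),
        "TAG_RIVER": Counter({"water": 3, "river": 2, "shore": 1}),
        "TAG_STORAGE": Counter({"blood": 2, "data": 1}),
    }
}


def candidates_for(n_senses):
    """Map a word's number of senses to how many candidates to propose.

    Assumed reading of the abstract: low (2-4) -> 1, middle (5-8) -> 2,
    high (>8) -> 3.
    """
    if n_senses > 8:
        return 3
    if n_senses >= 5:
        return 2
    return 1


def rank_senses(word, context, k=1):
    """Return up to k candidate sense tags for `word`, best first.

    Each sense is scored by summing how often the context words were seen
    with that sense in the (toy) profile; unseen words contribute 0.
    """
    profiles = SENSE_PROFILES.get(word, {})
    scores = {
        tag: sum(profile[w] for w in context)
        for tag, profile in profiles.items()
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda tag: (-scores[tag], tag))
    return ranked[:k]


if __name__ == "__main__":
    print(rank_senses("bank", ["river", "water", "money"], k=2))
```

A word with no entry in the toy inventory yields an empty candidate list here; the thesis handles unknown words separately, and that handling is not modeled in this sketch.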