Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition
Doctoral === National Chiao Tung University === Institute of Computer Science and Engineering === 99 === Anaphora is a commonly observed linguistic phenomenon, used to avoid repeating expressions in discourse. Anaphora resolution is the process of identifying the antecedent of an anaphor in context. Effective anaphora resolution plays an essential...
Main Authors: | Wu, Dian-Song, 吳典松 |
---|---|
Other Authors: | Liang, Tyne |
Format: | Others |
Language: | en_US |
Published: | 2010 |
Online Access: | http://ndltd.ncl.edu.tw/handle/17517841566588913082 |
id |
ndltd-TW-099NCTU5394050 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-099NCTU5394050 2016-04-08T04:22:00Z http://ndltd.ncl.edu.tw/handle/17517841566588913082 Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition 以權重學習與知識擷取為基礎之中文指代消解研究 Wu, Dian-Song 吳典松 Doctoral National Chiao Tung University Institute of Computer Science and Engineering 99 Liang, Tyne 梁婷 2010 學位論文 ; thesis 78 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Doctoral === National Chiao Tung University === Institute of Computer Science and Engineering === 99 === Anaphora is a commonly observed linguistic phenomenon, used to avoid repeating expressions in discourse. Anaphora resolution is the process of identifying the antecedent of an anaphor in context. Effective anaphora resolution plays an essential role in many natural language processing applications, such as machine translation, summarization, and information extraction.
Earlier anaphora resolution methods relied on syntactic rules and on semantic or pragmatic clues to identify the antecedent. More recently, attention has shifted to statistical and classification-based approaches. Both have drawbacks. A rule-based approach usually selects the antecedent by a salience score computed from manually assigned weights, so errors can arise from intuitive observation and subjective bias in choosing the feature weights. A classification-based approach, on the other hand, considers the candidates for the same anaphor independently, and therefore cannot effectively capture the preference relationships between competing candidates during resolution. To overcome these problems, we propose Chinese anaphora resolution methods based on weight learning and knowledge acquisition.
This thesis addresses pronominal, zero, and definite anaphora in Chinese texts and presents a different approach for each. We use lexical knowledge acquisition and salience measurement to resolve Chinese pronominal anaphora. Lexical knowledge acquisition aims to extract additional semantic features, such as gender, number, and collocate compatibility. The salience measurement uses entropy-based weighting to select among antecedent candidates. Experimental results show that the proposed approach yields an 82.5% success rate on 1343 anaphoric instances, a 7% improvement over the general rule-based approach.
For Chinese zero anaphora, we apply case-based reasoning and pattern conceptualization to overcome the difficulty of constructing proper reasoning mechanisms and the insufficiency of lexical features. Experimental results show that the proposed approach achieves competitive resolution, yielding a 79% F-score on 1051 anaphoric instances, a 13% improvement over the general rule-based approach.
We use two strategies to resolve Chinese definite anaphors. The first is an adaptive-weight salience measurement in which the entire set of candidates is evaluated simultaneously. The second is a Web-based knowledge acquisition model that allows semantic compatibility extraction and multiple resources to be employed. Experimental results show that the proposed approach yields a 72.5% success rate on 426 anaphoric instances, a 4.7% improvement over the result obtained with a conventional classifier.
|
author2 |
Liang, Tyne |
author_facet |
Liang, Tyne Wu, Dian-Song 吳典松 |
author |
Wu, Dian-Song 吳典松 |
spellingShingle |
Wu, Dian-Song 吳典松 Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition |
author_sort |
Wu, Dian-Song |
title |
Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition |
title_short |
Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition |
title_full |
Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition |
title_fullStr |
Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition |
title_full_unstemmed |
Chinese Anaphora Resolution Based on Weight Learning and Knowledge Acquisition |
title_sort |
chinese anaphora resolution based on weight learning and knowledge acquisition |
publishDate |
2010 |
url |
http://ndltd.ncl.edu.tw/handle/17517841566588913082 |
work_keys_str_mv |
AT wudiansong chineseanaphoraresolutionbasedonweightlearningandknowledgeacquisition AT wúdiǎnsōng chineseanaphoraresolutionbasedonweightlearningandknowledgeacquisition AT wudiansong yǐquánzhòngxuéxíyǔzhīshíxiéqǔwèijīchǔzhīzhōngwénzhǐdàixiāojiěyánjiū AT wúdiǎnsōng yǐquánzhòngxuéxíyǔzhīshíxiéqǔwèijīchǔzhīzhōngwénzhǐdàixiāojiěyánjiū |
_version_ |
1718219213423771648 |