Summary: | Doctoral === National Chiao Tung University === Institute of Computer Science and Engineering === 99 === Anaphora is a commonly observed linguistic phenomenon used to avoid repeating expressions in discourse. Anaphora resolution is the process of identifying the antecedent of an anaphor in context. Effective anaphora resolution plays an essential role in many natural language processing applications, such as machine translation, summarization, and information extraction.
Previous anaphora resolution methods have relied on syntactic rules and on semantic or pragmatic clues to identify the antecedent. More recently, attention has shifted to statistical and classification-based approaches. However, rule-based approaches usually select the antecedent using a salience score with manually assigned weights, so errors may arise from intuitive observations and subjective bias in choosing feature weights. Classification-based approaches, on the other hand, consider the candidates for the same anaphor independently and therefore cannot effectively capture the preference relationships between competing candidates during resolution. To overcome these problems, we propose Chinese anaphora resolution methods based on weight learning and knowledge acquisition.
In this thesis, we address pronominal, zero, and definite anaphora in Chinese texts and present a different approach for each. We resolve Chinese pronominal anaphora using lexical knowledge acquisition and salience measurement. Lexical knowledge acquisition aims to extract additional semantic features, such as gender, number, and collocate compatibility. The salience measurement uses entropy-based weighting to select among antecedent candidates. Experimental results show that the proposed approach achieves an 82.5% success rate on 1343 anaphoric instances, a 7% improvement over the general rule-based approach.
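As an illustration of how entropy can replace manual weight assignment in a salience score, the following is a minimal sketch using the generic entropy weight method. It is not the thesis's actual formulation: the function names, the three binary features (gender agreement, number agreement, same-sentence occurrence), and the toy candidates are all assumptions made for this example.

```python
import math


def entropy_weights(rows):
    """Derive one weight per feature from how unevenly it is spread
    across candidates (generic entropy weight method, assumed here).

    rows: list of candidate feature vectors with non-negative values.
    """
    n = len(rows)
    m = len(rows[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0 or n < 2:
            # A feature that never fires carries no information.
            divergences.append(0.0)
            continue
        entropy = 0.0
        for value in column:
            if value > 0:
                p = value / total
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # normalise to the range [0, 1]
        # Low entropy means the feature discriminates between candidates.
        divergences.append(max(0.0, 1.0 - entropy))
    norm = sum(divergences)
    if norm == 0:
        return [1.0 / m] * m  # fall back to uniform weights
    return [d / norm for d in divergences]


def pick_antecedent(candidates, weights):
    """Return the candidate name with the highest weighted salience."""
    scores = {
        name: sum(w * f for w, f in zip(weights, feats))
        for name, feats in candidates.items()
    }
    return max(scores, key=scores.get)


# Hypothetical features: [gender agrees, number agrees, same sentence]
candidates = {
    "A": [1, 1, 1],
    "B": [1, 0, 0],
    "C": [1, 1, 0],
}
weights = entropy_weights(list(candidates.values()))
best = pick_antecedent(candidates, weights)
```

In this toy setting the gender feature fires for every candidate, so it receives almost no weight, while the same-sentence feature, which singles out one candidate, dominates the score.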
For Chinese zero anaphora, we apply case-based reasoning and pattern conceptualization to overcome the difficulty of constructing proper reasoning mechanisms and the insufficiency of lexical features. Experimental results show that the proposed approach is competitive, achieving a 79% F-score on 1051 anaphoric instances, a 13% improvement over the general rule-based approach.
We use two strategies to resolve Chinese definite anaphors. The first is an adaptive-weight salience measurement that estimates the entire set of candidates simultaneously. The second is a Web-based knowledge acquisition model that enables semantic compatibility extraction and the use of multiple resources. Experimental results show that the proposed approach achieves a 72.5% success rate on 426 anaphoric instances, a 4.7% improvement over the result obtained with a conventional classifier.
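To make the contrast with an independent per-candidate classifier concrete, the sketch below scores all candidates of one anaphor together and adjusts the weights only when the wrong candidate is ranked first. It uses a plain ranking perceptron as a stand-in, which is an assumption and not the adaptive weighting scheme of the thesis; the feature vectors and training pairs are invented for illustration.

```python
def score(weights, feats):
    """Weighted salience of a single candidate."""
    return sum(w * f for w, f in zip(weights, feats))


def rank(weights, candidates):
    """Index of the best candidate, chosen by comparing the whole set."""
    return max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i]))


def train(examples, n_features, epochs=10, lr=1.0):
    """Ranking-perceptron update (a stand-in technique, assumed here).

    examples: list of (candidates, gold_index) pairs, where candidates
    is a list of feature vectors for one anaphor.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for candidates, gold in examples:
            pred = rank(weights, candidates)
            if pred != gold:
                # Move the weights toward the correct antecedent and
                # away from the competitor that outranked it.
                for j in range(n_features):
                    weights[j] += lr * (candidates[gold][j]
                                        - candidates[pred][j])
    return weights


# Hypothetical data: two anaphors, each with two competing candidates.
examples = [
    ([[1, 0], [0, 1]], 1),
    ([[1, 1], [1, 0]], 0),
]
learned = train(examples, n_features=2)
```

Because each update depends on which competitor outranked the correct antecedent, the learned weights encode preferences between candidates, which is the information a per-candidate classifier discards.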
|