Summary: | Master's === National Chiao Tung University === College of Computer Science, Industrial R&D Master's Program in Information Technology (IT) === 99 === Anaphora is a common phenomenon in written texts: a term is used to refer to a previously mentioned entity. Chinese texts contain pronominal anaphora, zero anaphora, and nominal anaphora, and the referents can be abstract concepts or entities. In this thesis, we focus on definite abstract noun anaphora and propose a clause-based anaphora resolution procedure. Anaphora identification and feature selection draw on resources such as CLINE, CKIP lexical resources, and Google search results. Anaphora recognition with a finite-state machine achieves 90% precision on 1538 instances. We extract four types of features to classify candidate antecedents: position features, distance features, lexicon features, and semantic features. These features are used to build SVM classifiers and a weighted model for anaphora resolution, and the best feature set is found by a genetic algorithm. On 241 definite anaphora instances, the SVM classifier selects the correct clause position in 40.66% of cases and the correct sentence position in 68.46%. The weighted method selects the correct clause position in 42.32% of cases and the correct sentence position in 70.54%.
|