Summary: | Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a textual entailment classification system developed on a dataset that focuses on individual entailment-related linguistic phenomena. Each text pair in this dataset is annotated with its entailment relationship label and the related linguistic phenomenon.
Given a sentence pair, the system first performs the necessary linguistic preprocessing. Identical and synonymous terms in the two sentences are then aligned so that the remaining differences between the sentences can be identified. Several resources are used to define synonyms; among them, Wikipedia provides the most useful synonym sets.
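The alignment step can be illustrated with a minimal sketch. It assumes the sentences are already tokenized and uses a greedy one-to-one matching; the synonym sets, the function names `same_term` and `align`, and the greedy strategy are all hypothetical choices for illustration and are not taken from the thesis.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical synonym sets. The thesis builds these from several resources
# (Wikipedia among them); the entries here are made up for illustration.
SYNONYM_SETS: List[FrozenSet[str]] = [
    frozenset({"USA", "United States", "America"}),
    frozenset({"buy", "purchase"}),
]


def same_term(a: str, b: str) -> bool:
    """Return True if the two terms are identical or share a synonym set."""
    if a == b:
        return True
    return any(a in s and b in s for s in SYNONYM_SETS)


def align(t1: List[str], t2: List[str]) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Greedily align terms of t1 to unused terms of t2.

    Returns (aligned pairs, unaligned terms of t1, unaligned terms of t2).
    The unaligned terms are the differences that later modules would inspect.
    """
    used = set()
    pairs: List[Tuple[str, str]] = []
    only_in_t1: List[str] = []
    for a in t1:
        for j, b in enumerate(t2):
            if j not in used and same_term(a, b):
                used.add(j)
                pairs.append((a, b))
                break
        else:
            # No partner found in t2 for this term.
            only_in_t1.append(a)
    only_in_t2 = [b for j, b in enumerate(t2) if j not in used]
    return pairs, only_in_t1, only_in_t2
```

For example, aligning `["John", "buy", "car", "USA"]` with `["John", "purchase", "truck", "America"]` pairs three terms and leaves `car` and `truck` as the differences between the sentences.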
Two kinds of textual entailment classification systems are proposed: rule-based and machine-learning-based (ML-based). The rule-based system consists of several classification modules, each driven by differences in quantity, temporal, spatial, hypernymy, antonymy, negation, or syntactic information. The final decision is made by selecting among the results of these modules in the order of contradiction, independence, forward entailment, and bidirectional entailment.
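The priority-ordered final decision might look like the following sketch. It assumes each module returns either one of four label strings or `None` when it has no opinion; the label strings, the `decide` function, and the fallback label used when no module fires are assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Iterable, Optional

# Priority order stated in the abstract: contradiction first, then
# independence, then forward entailment, then bidirectional entailment.
PRIORITY = ["contradiction", "independence", "forward", "bidirectional"]


def decide(module_outputs: Iterable[Optional[str]], default: str = "bidirectional") -> str:
    """Pick the highest-priority label proposed by any module.

    `default` is a hypothetical fallback for the case where no module
    proposes a label; the abstract does not say what the system does then.
    """
    proposed = {label for label in module_outputs if label is not None}
    for label in PRIORITY:
        if label in proposed:
            return label
    return default
```

Under this scheme, a single module reporting a contradiction overrides any number of modules reporting entailment, which matches the ordering described above.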
The experimental results show that rules designed around the linguistic phenomena improve the performance of textual entailment classification. Our hybrid system achieves a macro-averaged F1-measure of 48.61% and an accuracy of 49.42%, outperforming the best systems in the NTCIR-11 RITE-VAL System Validation Chinese subtasks.
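For readers unfamiliar with the two reported metrics, the sketch below shows how macro-averaged F1 and accuracy are commonly computed from gold and predicted labels. It assumes the macro average is taken over every label that appears in either list; the official evaluation tool may differ in such details, and the function name and toy labels are illustrative only.

```python
from typing import List, Tuple


def macro_f1_and_accuracy(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Return (macro-averaged F1, accuracy) for two equal-length label lists."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro averaging weights every class equally, regardless of its frequency.
    macro_f1 = sum(f1_scores) / len(f1_scores)
    accuracy = sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)
    return macro_f1, accuracy
```

Because macro averaging gives each class equal weight, a system that performs poorly on a rare class is penalized more heavily in macro F1 than in accuracy.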
|