Summary: | Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical error, and methods for error detection, error classification, and error position determination are proposed. The experimental data were provided by the NLP-TEA2 CGED Task and contain sentences annotated with their error types, error positions, and corrections. Each sentence contains exactly one error.
Candidate corrections are generated with a different method for each error type. From the original sentence given by the user, substrings of 1 to 2 characters are removed to generate Redundant-type candidates; frequently missing words are inserted to generate Missing-type candidates; word sequences are moved to other positions to generate Disorder-type candidates; and words are replaced by synonyms or by function words of the same class to generate Selection-type candidates. With these methods, the correct sentence is among the generated candidates for 100% of Redundant-type errors, 64.67% of Missing-type errors, 88.88% of Disorder-type errors, and 31.33% of Selection-type errors.
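The four generation strategies described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names, the `frequent_words` list, and the `substitutes` table are hypothetical, and word segmentation is assumed to have been done beforehand for the Disorder and Selection cases.

```python
from typing import Dict, List, Set


def redundant_candidates(sentence: str, max_len: int = 2) -> Set[str]:
    """Delete every substring of 1..max_len characters."""
    out = set()
    for n in range(1, max_len + 1):
        for i in range(len(sentence) - n + 1):
            out.add(sentence[:i] + sentence[i + n:])
    return out


def missing_candidates(sentence: str, frequent_words: List[str]) -> Set[str]:
    """Insert each frequently missing word at every character position."""
    out = set()
    for i in range(len(sentence) + 1):
        for word in frequent_words:
            out.add(sentence[:i] + word + sentence[i:])
    return out


def disorder_candidates(words: List[str]) -> Set[str]:
    """Move each contiguous word sequence to every other position."""
    out = set()
    n = len(words)
    for start in range(n):
        for end in range(start + 1, n + 1):
            chunk = words[start:end]
            rest = words[:start] + words[end:]
            for pos in range(len(rest) + 1):
                out.add("".join(rest[:pos] + chunk + rest[pos:]))
    out.discard("".join(words))  # the unchanged sentence is not a candidate
    return out


def selection_candidates(words: List[str],
                         substitutes: Dict[str, List[str]]) -> Set[str]:
    """Replace one word at a time using a (hypothetical) substitution table
    of synonyms or same-class function words."""
    out = set()
    for i, word in enumerate(words):
        for alt in substitutes.get(word, []):
            if alt != word:
                out.add("".join(words[:i] + [alt] + words[i + 1:]))
    return out
```

For example, `redundant_candidates("我是是學生")` contains `"我是學生"`, and `missing_candidates("我學生", ["是"])` contains it as well. Each candidate can also record the edit type and position that produced it, which is what error classification and position determination would later rely on.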
Several functions for estimating a sentence generation score are also proposed. They use frequencies of words and substrings drawn from several linguistic resources, and different weightings and normalizations were tested to find the best function. The experimental results show that using substring frequencies from the Google N-gram dataset achieves the best performance, with F-measures of 67.15% for error detection, 16.81% for error classification, and 10.34% for error position determination.
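The abstract does not give the scoring formula, so the following is only one plausible shape for a frequency-based generation score. It assumes a length-weighted sum of log substring frequencies, normalized by the number of substrings; the weighting, the normalization, the names `generation_score` and `best_sentence`, and the toy frequency table are all assumptions for illustration.

```python
import math
from typing import Dict, Iterable


def generation_score(sentence: str, freq: Dict[str, int], max_n: int = 3) -> float:
    """Average length-weighted log frequency of all substrings up to max_n chars.

    Both the weight (substring length) and the normalization (substring count)
    are assumed here; the thesis compares several such choices.
    """
    total = 0.0
    count = 0
    for n in range(1, max_n + 1):
        for i in range(len(sentence) - n + 1):
            gram = sentence[i:i + n]
            total += n * math.log(1 + freq.get(gram, 0))
            count += 1
    return total / count if count else 0.0


def best_sentence(original: str, candidates: Iterable[str],
                  freq: Dict[str, int]) -> str:
    """Return the highest-scoring sentence among the original and candidates.

    If the original wins, no error is reported; otherwise the winning
    candidate's edit indicates the error type and position.
    """
    return max([original, *candidates], key=lambda s: generation_score(s, freq))


# Toy frequency table standing in for counts from an n-gram resource.
toy_freq = {"我": 100, "是": 100, "學": 50, "生": 50,
            "我是": 40, "是學": 10, "學生": 30,
            "我是學": 5, "是學生": 5}
```

With this toy table, `best_sentence("我是是學生", ["我是學生", "我是生"], toy_freq)` prefers `"我是學生"`, because the doubled "是" introduces substrings that never occur in the table.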
|