Chinese Grammatical Error Detection and Classification

Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical errors. Methods of error detection, error classification, and error position determination...

Bibliographic Details
Main Authors: Chen, Shao-Heng, 陳邵亨
Other Authors: Lin, Chuan-Jie
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/55054375170202790828
id ndltd-TW-104NTOU5394044
record_format oai_dc
spelling ndltd-TW-104NTOU53940442017-09-24T04:40:47Z http://ndltd.ncl.edu.tw/handle/55054375170202790828 Chinese Grammatical Error Detection and Classification 中文文法錯誤偵測與分類 Chen, Shao-Heng 陳邵亨 碩士 國立臺灣海洋大學 資訊工程學系 104 Lin, Chuan-Jie 林川傑 2016 學位論文 ; thesis 37 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical errors: Redundant, Missing, Disorder, and Selection. Methods are proposed for error detection, error classification, and error position determination. The experimental data come from the NLP-TEA2 CGED task and consist of sentences annotated with their error types, error positions, and corrections. Each sentence contains exactly one error. Candidate corrections are generated by a different method for each error type. Starting from the original sentence given by the user, substrings of 1 to 2 characters are removed to generate Redundant-type candidates; frequently missing words are inserted to generate Missing-type candidates; sequences of words are moved to another position to generate Disorder-type candidates; and synonyms and function words of the same class are substituted to generate Selection-type candidates. With these methods, the correct sentence can be generated for 100% of Redundant-type errors, 64.67% of Missing-type errors, 88.88% of Disorder-type errors, and 31.33% of Selection-type errors. Several functions for estimating the sentence generation score are also proposed. They use the frequencies of words and substrings provided by a number of linguistic resources. Different weightings and normalizations were tested to find the best function. The experimental results show that using the substring frequencies provided by the Google N-gram dataset achieves the best performance, with F-measures of 67.15%, 16.81%, and 10.34% for error detection, error classification, and error position determination, respectively.
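The four candidate-generation strategies and the frequency-based scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: all function names, the `max_span` limit, the log-scaled bigram average used as the score, and the toy counts in the usage note are assumptions introduced here. The thesis itself tested several weightings and normalizations over real resources such as the Google N-gram dataset.

```python
import math
from typing import Dict, List, Set


def redundant_candidates(sentence: str, max_len: int = 2) -> Set[str]:
    """Redundant type: delete every substring of 1..max_len characters."""
    out = set()
    for n in range(1, max_len + 1):
        for i in range(len(sentence) - n + 1):
            out.add(sentence[:i] + sentence[i + n:])
    return out


def missing_candidates(words: List[str], frequent: List[str]) -> Set[str]:
    """Missing type: insert each frequently missing word at every word boundary."""
    out = set()
    for i in range(len(words) + 1):
        for w in frequent:
            out.add("".join(words[:i] + [w] + words[i:]))
    return out


def disorder_candidates(words: List[str], max_span: int = 2) -> Set[str]:
    """Disorder type: move a run of 1..max_span words to another position.

    max_span is a hypothetical limit chosen for this sketch.
    """
    original = "".join(words)
    out = set()
    for n in range(1, max_span + 1):
        for i in range(len(words) - n + 1):
            chunk = words[i:i + n]
            rest = words[:i] + words[i + n:]
            for j in range(len(rest) + 1):
                cand = "".join(rest[:j] + chunk + rest[j:])
                if cand != original:
                    out.add(cand)
    return out


def selection_candidates(words: List[str],
                         substitutes: Dict[str, List[str]]) -> Set[str]:
    """Selection type: replace a word with a synonym or same-class function word."""
    out = set()
    for i, w in enumerate(words):
        for alt in substitutes.get(w, []):
            out.add("".join(words[:i] + [alt] + words[i + 1:]))
    return out


def score(sentence: str, counts: Dict[str, int]) -> float:
    """Assumed scoring function: mean log(1 + count) over character bigrams."""
    bigrams = [sentence[i:i + 2] for i in range(len(sentence) - 1)]
    if not bigrams:
        return 0.0
    return sum(math.log1p(counts.get(b, 0)) for b in bigrams) / len(bigrams)
```

As a usage example, `redundant_candidates("我是很高興")` includes "我很高興". With invented counts such as `{"我很": 50, "很高": 40, "高興": 90, "我是": 60, "是很": 1}`, `score` ranks that candidate above the original sentence, which is the signal a checker of this kind could use to flag a Redundant-type error and its position.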
author2 Lin, Chuan-Jie
author_facet Lin, Chuan-Jie
Chen, Shao-Heng
陳邵亨
author Chen, Shao-Heng
陳邵亨
spellingShingle Chen, Shao-Heng
陳邵亨
Chinese Grammatical Error Detection and Classification
author_sort Chen, Shao-Heng
title Chinese Grammatical Error Detection and Classification
title_short Chinese Grammatical Error Detection and Classification
title_full Chinese Grammatical Error Detection and Classification
title_fullStr Chinese Grammatical Error Detection and Classification
title_full_unstemmed Chinese Grammatical Error Detection and Classification
title_sort chinese grammatical error detection and classification
publishDate 2016
url http://ndltd.ncl.edu.tw/handle/55054375170202790828
work_keys_str_mv AT chenshaoheng chinesegrammaticalerrordetectionandclassification
AT chénshàohēng chinesegrammaticalerrordetectionandclassification
AT chenshaoheng zhōngwénwénfǎcuòwùzhēncèyǔfēnlèi
AT chénshàohēng zhōngwénwénfǎcuòwùzhēncèyǔfēnlèi
_version_ 1718540223260917760