Chinese Grammatical Error Detection and Classification

Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical error. Methods of error detection, error classification, and error position determination...

Bibliographic Details
Main Authors:	Chen, Shao-Heng, 陳邵亨
Other Authors:	Lin, Chuan-Jie
Format:	Others
Language:	zh-TW
Published:	2016
Online Access:	http://ndltd.ncl.edu.tw/handle/55054375170202790828

id	ndltd-TW-104NTOU5394044
record_format	oai_dc
spelling	ndltd-TW-104NTOU5394044 2017-09-24T04:40:47Z http://ndltd.ncl.edu.tw/handle/55054375170202790828 Chinese Grammatical Error Detection and Classification 中文文法錯誤偵測與分類 Chen, Shao-Heng 陳邵亨 Master's, National Taiwan Ocean University, Department of Computer Science and Engineering, 104 Lin, Chuan-Jie 林川傑 2016 學位論文 ; thesis 37 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical error: Redundant, Missing, Disorder, and Selection. Methods for error detection, error classification, and error position determination are proposed. The experimental data come from the NLP-TEA2 CGED Task and consist of sentences annotated with their error types, error positions, and corrections. Each sentence contains exactly one error. Candidate corrections are generated by a different method for each error type. From the original sentence given by the user, substrings of 1 to 2 characters are removed to generate Redundant-type candidates; frequently missing words are inserted to generate Missing-type candidates; sequences of words are moved to another position to generate Disorder-type candidates; and words are replaced by synonyms or by function words of the same class to generate Selection-type candidates. With these methods, the correct sentence can be generated for 100% of Redundant-type errors, 64.67% of Missing-type errors, 88.88% of Disorder-type errors, and 31.33% of Selection-type errors. Several functions for estimating the sentence generation score are also proposed, using word and substring frequencies drawn from several linguistic resources. Different weightings and normalizations were tested to find the best function. The experimental results show that using substring frequencies from the Google N-gram dataset achieves the best performance, with F-measures of 67.15%, 16.81%, and 10.34% for error detection, error classification, and error position determination, respectively.
author2	Lin, Chuan-Jie
author_facet	Lin, Chuan-Jie Chen, Shao-Heng 陳邵亨
author	Chen, Shao-Heng 陳邵亨
title	Chinese Grammatical Error Detection and Classification
publishDate	2016
url	http://ndltd.ncl.edu.tw/handle/55054375170202790828
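
The candidate-generation and scoring steps summarized in the description field can be illustrated with a minimal sketch. It covers only the Redundant type (deleting substrings of 1 to 2 characters) and the Disorder type (moving a word sequence to another position). The function names, the toy frequency table, and the mean-of-bigram-frequencies score are all assumptions made for illustration; the record does not state the thesis's actual scoring function, weightings, or resources beyond naming the Google N-gram dataset.

```python
from typing import Dict, List


def redundant_candidates(sentence: str, max_len: int = 2) -> List[str]:
    """Delete every substring of 1..max_len characters (Redundant-type candidates)."""
    out: List[str] = []
    for n in range(1, max_len + 1):
        for i in range(len(sentence) - n + 1):
            cand = sentence[:i] + sentence[i + n:]
            if cand and cand not in out:
                out.append(cand)
    return out


def disorder_candidates(words: List[str]) -> List[List[str]]:
    """Move each contiguous word sequence to every other position (Disorder-type candidates)."""
    out: List[List[str]] = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            chunk = words[start:end]
            rest = words[:start] + words[end:]
            for pos in range(len(rest) + 1):
                cand = rest[:pos] + chunk + rest[pos:]
                if cand != words and cand not in out:
                    out.append(cand)
    return out


def score(sentence: str, counts: Dict[str, int], n: int = 2) -> float:
    """Hypothetical score: mean frequency of the sentence's length-n substrings."""
    total = len(sentence) - n + 1
    if total <= 0:
        return 0.0
    return sum(counts.get(sentence[i:i + n], 0) for i in range(total)) / total


# Toy substring frequencies (invented values, not taken from any real dataset).
toy_counts = {"我是": 9, "是學": 4, "學生": 5}

original = "我是是學生"  # contains a duplicated character
candidates = [original] + redundant_candidates(original)
best = max(candidates, key=lambda s: score(s, toy_counts))
print(best)
```

Dividing by the number of substrings is one simple normalization that keeps longer sentences from scoring higher merely because they contain more substrings; it stands in for the weightings and normalizations mentioned in the abstract, which are not specified here.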
