Chinese Grammatical Error Detection and Classification
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical error. Methods for error detection, error classification, and error position determination...
Main Authors: | Chen, Shao-Heng, 陳邵亨 |
---|---|
Other Authors: | Lin, Chuan-Jie |
Format: | Others |
Language: | zh-TW |
Published: | 2016 |
Online Access: | http://ndltd.ncl.edu.tw/handle/55054375170202790828 |
id |
ndltd-TW-104NTOU5394044 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-104NTOU53940442017-09-24T04:40:47Z http://ndltd.ncl.edu.tw/handle/55054375170202790828 Chinese Grammatical Error Detection and Classification 中文文法錯誤偵測與分類 Chen, Shao-Heng 陳邵亨 Master's National Taiwan Ocean University Department of Computer Science and Engineering 104 This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical error. Methods for error detection, error classification, and error position determination are proposed. The experimental data were provided by the NLP-TEA2 CGED Task and consist of sentences annotated with their error types, error positions, and corrections. Each sentence contains only one error. Candidate corrections are generated by a different method for each error type. Starting from the original sentence given by the user, substrings of 1 to 2 characters are removed to generate Redundant-type candidates; frequently missing words are inserted to generate Missing-type candidates; sequences of words are moved to another position to generate Disorder-type candidates; and words are replaced with synonyms or with function words of the same class to generate Selection-type candidates. With these methods, the correct sentence can be generated for 100% of Redundant-type errors, 64.67% of Missing-type errors, 88.88% of Disorder-type errors, and 31.33% of Selection-type errors. Several functions for estimating the sentence generation score are also proposed. They use the frequencies of words and substrings provided by several linguistic resources. Different weightings and normalizations were tested to find the best function. The experimental results show that using substring frequencies from the Google N-gram dataset achieves the best performance, with F-measures of 67.15%, 16.81%, and 10.34% in error detection, error classification, and error position determination, respectively. Lin, Chuan-Jie 林川傑 2016 學位論文 ; thesis 37 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 104 === This thesis proposes a Chinese grammar checker for sentences written by non-native learners of Chinese. The system focuses on four major types of grammatical error. Methods for error detection, error classification, and error position determination are proposed. The experimental data were provided by the NLP-TEA2 CGED Task and consist of sentences annotated with their error types, error positions, and corrections. Each sentence contains only one error.
Candidate corrections are generated by a different method for each error type. Starting from the original sentence given by the user, substrings of 1 to 2 characters are removed to generate Redundant-type candidates; frequently missing words are inserted to generate Missing-type candidates; sequences of words are moved to another position to generate Disorder-type candidates; and words are replaced with synonyms or with function words of the same class to generate Selection-type candidates. With these methods, the correct sentence can be generated for 100% of Redundant-type errors, 64.67% of Missing-type errors, 88.88% of Disorder-type errors, and 31.33% of Selection-type errors.
Several functions for estimating the sentence generation score are also proposed. They use the frequencies of words and substrings provided by several linguistic resources. Different weightings and normalizations were tested to find the best function. The experimental results show that using substring frequencies from the Google N-gram dataset achieves the best performance, with F-measures of 67.15%, 16.81%, and 10.34% in error detection, error classification, and error position determination, respectively.
|
author2 |
Lin, Chuan-Jie |
author_facet |
Lin, Chuan-Jie Chen, Shao-Heng 陳邵亨 |
author |
Chen, Shao-Heng 陳邵亨 |
spellingShingle |
Chen, Shao-Heng 陳邵亨 Chinese Grammatical Error Detection and Classification |
author_sort |
Chen, Shao-Heng |
title |
Chinese Grammatical Error Detection and Classification |
title_short |
Chinese Grammatical Error Detection and Classification |
title_full |
Chinese Grammatical Error Detection and Classification |
title_fullStr |
Chinese Grammatical Error Detection and Classification |
title_full_unstemmed |
Chinese Grammatical Error Detection and Classification |
title_sort |
chinese grammatical error detection and classification |
publishDate |
2016 |
url |
http://ndltd.ncl.edu.tw/handle/55054375170202790828 |
work_keys_str_mv |
AT chenshaoheng chinesegrammaticalerrordetectionandclassification AT chénshàohēng chinesegrammaticalerrordetectionandclassification AT chenshaoheng zhōngwénwénfǎcuòwùzhēncèyǔfēnlèi AT chénshàohēng zhōngwénwénfǎcuòwùzhēncèyǔfēnlèi |
_version_ |
1718540223260917760 |
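
The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the `max_span` limit, the choice to insert missing words at every character boundary, and the tiny word lists in the usage note are all assumptions made for illustration. The sketch also assumes the sentence has already been segmented into words for the Disorder and Selection types.

```python
from typing import Dict, Iterable, List, Set


def redundant_candidates(sentence: str, max_len: int = 2) -> Set[str]:
    """Remove every substring of 1..max_len characters (Redundant type)."""
    out = set()
    for n in range(1, max_len + 1):
        for i in range(len(sentence) - n + 1):
            out.add(sentence[:i] + sentence[i + n:])
    return out


def missing_candidates(sentence: str, frequent_words: Iterable[str]) -> Set[str]:
    """Insert each frequently missing word at every character boundary
    (Missing type). The insertion positions are an assumption."""
    out = set()
    for word in frequent_words:
        for i in range(len(sentence) + 1):
            out.add(sentence[:i] + word + sentence[i:])
    return out


def disorder_candidates(words: List[str], max_span: int = 2) -> Set[str]:
    """Move a run of 1..max_span words to another position (Disorder type)."""
    original = "".join(words)
    out = set()
    for n in range(1, max_span + 1):
        for i in range(len(words) - n + 1):
            span = words[i:i + n]
            rest = words[:i] + words[i + n:]
            for j in range(len(rest) + 1):
                candidate = "".join(rest[:j] + span + rest[j:])
                if candidate != original:
                    out.add(candidate)
    return out


def selection_candidates(words: List[str],
                         substitutes: Dict[str, List[str]]) -> Set[str]:
    """Replace one word with a synonym or same-class function word
    (Selection type). `substitutes` is a hypothetical lookup table."""
    out = set()
    for i, word in enumerate(words):
        for alt in substitutes.get(word, []):
            out.add("".join(words[:i] + [alt] + words[i + 1:]))
    return out
```

For example, `redundant_candidates("我很喜歡貓")` includes `"我喜歡貓"`, and `disorder_candidates(["我", "飯", "吃"])` includes `"我吃飯"`. In a full system, each candidate would then be ranked by a sentence generation score such as the frequency-based functions the abstract mentions; that scoring step is not sketched here.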