Chinese Spell Checking Based on Noisy Channel Model

Master's === National Tsing Hua University === Institute of Information Systems and Applications === 102 === Chinese spell checking is an important component of many Chinese NLP applications, including word processors, search engines, and automatic essay rating. Compared to English, Chinese has no word boundaries, and there are various Chinese input methods that cau...

Bibliographic Details
Main Authors: Chiu, Hsun-Wen, 邱絢紋
Other Authors: Chang, Jason S.
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/48295259219011665221
id ndltd-TW-102NTHU5394037
record_format oai_dc
spelling ndltd-TW-102NTHU5394037 2017-10-25T04:36:01Z http://ndltd.ncl.edu.tw/handle/48295259219011665221 Chinese Spell Checking Based on Noisy Channel Model Chiu, Hsun-Wen 邱絢紋 Master's National Tsing Hua University Institute of Information Systems and Applications 102 Chang, Jason S. 張俊盛 2014 學位論文 ; thesis 41 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Tsing Hua University === Institute of Information Systems and Applications === 102 === Chinese spell checking is an important component of many Chinese NLP applications, including word processors, search engines, and automatic essay rating. Unlike English, written Chinese has no explicit word boundaries, and the various Chinese input methods cause different kinds of typos. It is therefore more difficult to develop a spell checker for Chinese. In this paper, we introduce a novel method for correcting Chinese errors based on sound or shape similarity. In our approach, potential typos in a given sentence are corrected using a channel model and a character-based language model within the noisy channel framework. In the training phase, we estimate the channel probabilities for each character based on n-grams in a Web corpus. At run-time, the system generates correction candidates for each character in the given sentence and selects the appropriate correction using the channel model and the language model. The experimental results show that the proposed method achieves significantly better accuracy and recall than more complicated methods in previous work.
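The run-time step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the confusion sets, bigram counts, smoothing constant, and the fixed keep probability below are all invented toy values, whereas the thesis estimates channel probabilities from Web n-grams. The sketch picks the character sequence C that maximizes P(observed | C) * P(C), using a character-bigram language model and a Viterbi search over per-character candidates.

```python
import math

# ASSUMPTION: toy confusion sets of characters similar in sound or shape.
CONFUSION_SETS = {"在": {"再"}, "再": {"在"}}

# ASSUMPTION: toy character-bigram counts standing in for a corpus-trained model.
BIGRAM_COUNTS = {
    ("<s>", "我"): 10,
    ("我", "現"): 6,
    ("現", "在"): 20,
    ("在", "很"): 8,
    ("很", "忙"): 9,
}

P_KEEP = 0.9   # ASSUMPTION: probability that a character was typed correctly
ALPHA = 0.1    # additive smoothing constant for the bigram model

VOCAB = {c for pair in BIGRAM_COUNTS for c in pair} | set(CONFUSION_SETS)
CONTEXT_TOTALS = {}
for (prev, _), n in BIGRAM_COUNTS.items():
    CONTEXT_TOTALS[prev] = CONTEXT_TOTALS.get(prev, 0) + n


def lm_logprob(prev, char):
    """Smoothed log P(char | prev) under the character-bigram model."""
    count = BIGRAM_COUNTS.get((prev, char), 0)
    total = CONTEXT_TOTALS.get(prev, 0)
    return math.log((count + ALPHA) / (total + ALPHA * len(VOCAB)))


def channel_logprob(observed, intended):
    """Log P(observed | intended): keep the character or confuse it."""
    if observed == intended:
        return math.log(P_KEEP)
    confusable = CONFUSION_SETS.get(intended, set())
    if observed in confusable:
        return math.log((1.0 - P_KEEP) / len(confusable))
    return float("-inf")


def candidates(observed):
    """The observed character itself plus its confusable characters."""
    return [observed] + sorted(CONFUSION_SETS.get(observed, ()))


def correct(sentence):
    """Viterbi search for the most probable intended character sequence."""
    best = {"<s>": (0.0, "")}  # last character -> (log score, best path)
    for observed in sentence:
        new_best = {}
        for cand in candidates(observed):
            ch = channel_logprob(observed, cand)
            new_best[cand] = max(
                ((score + lm_logprob(prev, cand) + ch, path + cand)
                 for prev, (score, path) in best.items()),
                key=lambda t: t[0],
            )
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


print(correct("我現再很忙"))
```

With these toy numbers the typo 再 is replaced by 在, because the language model strongly prefers 現在 and that gain outweighs the channel penalty for changing a character; a sentence that is already correct is left unchanged.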
author2 Chang, Jason S.
author Chiu, Hsun-Wen
邱絢紋
title Chinese Spell Checking Based on Noisy Channel Model
publishDate 2014
url http://ndltd.ncl.edu.tw/handle/48295259219011665221