Identifying Common Erroneous Patterns for Auto Editing


Bibliographic Details
Main Authors: An-Ta Huang, 黃安達
Other Authors: Shou-De Lin
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/39561598991412577425
id ndltd-TW-098NTU05392085
record_format oai_dc
spelling ndltd-TW-098NTU05392085 2015-10-13T18:49:40Z http://ndltd.ncl.edu.tw/handle/39561598991412577425 Identifying Common Erroneous Patterns for Auto Editing 英語常見錯誤辨識與自動校正 (Identification and Automatic Correction of Common English Errors) An-Ta Huang 黃安達 Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 Shou-De Lin 林守德 2010 thesis 28 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 === This paper describes a framework for extracting effective correction rules from a sentence-aligned corpus and demonstrates a practical application: auto-editing with the discovered rules. The framework uses the Levenshtein distance between sentences to identify the key parts of the rules, then uses the editing corpus to filter, condense, and refine them. Rule candidates take the form A => B, where A is the erroneous pattern and B is the correct pattern. We also focus on the generality of the rules: we use part-of-speech (POS) information so that a rule can be applied to different sentences that share the same POS form. The framework is language independent and can therefore be applied to other languages easily. An evaluation of the discovered rules shows that 67.2% of the top 1,500 ranked rules were annotated as correct or mostly correct by experts. Based on the rules, we built an online auto-editing demo system at http://mslab.csie.ntu.edu.tw/~kw/new_demo.html.
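The A => B rule extraction described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes whitespace tokenization, a word-level Levenshtein alignment, one word of context on each side of every edit, and plain frequency counting as a stand-in for the ranking step. The function names edit_ops, extract_rules, and count_rules are hypothetical, and the POS-based generalization is not covered.

```python
from collections import Counter


def edit_ops(src, tgt):
    """Word-level Levenshtein alignment: returns (distance, ops).

    Each op is (kind, i, j) with kind in {"match", "sub", "del", "ins"};
    i indexes src and j indexes tgt.
    """
    n, m = len(src), len(tgt)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete src word
                             dist[i][j - 1] + 1,         # insert tgt word
                             dist[i - 1][j - 1] + cost)  # match or substitute
    # Walk back from the bottom-right cell to recover one optimal alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and src[i - 1] == tgt[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            ops.append(("match" if same else "sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, j))
            i -= 1
        else:
            ops.append(("ins", i, j - 1))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


def extract_rules(wrong, right):
    """Turn each run of edits into an (A, B) pair with one word of context."""
    src, tgt = wrong.split(), right.split()
    _, ops = edit_ops(src, tgt)
    rules, k = [], 0
    while k < len(ops):
        if ops[k][0] == "match":
            k += 1
            continue
        start = k
        while k < len(ops) and ops[k][0] != "match":
            k += 1
        run = ops[start:k]
        a = [src[i] for kind, i, _ in run if kind in ("sub", "del")]
        b = [tgt[j] for kind, _, j in run if kind in ("sub", "ins")]
        # The neighbours of an edit run are always matches, so they exist in src.
        left = [src[ops[start - 1][1]]] if start > 0 else []
        after = [src[ops[k][1]]] if k < len(ops) else []
        rules.append((" ".join(left + a + after), " ".join(left + b + after)))
    return rules


def count_rules(pairs):
    """Count how often each candidate rule occurs in (wrong, right) pairs."""
    counts = Counter()
    for wrong, right in pairs:
        counts.update(extract_rules(wrong, right))
    return counts
```

For example, extract_rules("He go to school yesterday", "He went to school yesterday") yields the single candidate ("He go to", "He went to"), read as "He go to" => "He went to". Keeping a word of context on each side is one simple way to make a candidate specific enough to apply; the filtering and refinement against the editing corpus mentioned in the abstract would come after this step.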
author2 Shou-De Lin
author_facet Shou-De Lin
An-Ta Huang
黃安達
author An-Ta Huang
黃安達
title Identifying Common Erroneous Patterns for Auto Editing
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/39561598991412577425