Identifying Common Erroneous Patterns for Auto Editing
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 === This paper describes a framework for extracting effective correction rules from a sentence-aligned corpus and shows a practical application: auto-editing using the discovered rules. The framework exploits the methodology of finding Levenshtein distance betwe...
Main Authors: | An-Ta Huang, 黃安達 |
---|---|
Other Authors: | Shou-De Lin |
Format: | Others |
Language: | en_US |
Published: | 2010 |
Online Access: | http://ndltd.ncl.edu.tw/handle/39561598991412577425 |
id |
ndltd-TW-098NTU05392085 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-098NTU05392085 2015-10-13T18:49:40Z http://ndltd.ncl.edu.tw/handle/39561598991412577425 Identifying Common Erroneous Patterns for Auto Editing 英語常見錯誤辨識與自動校正 An-Ta Huang 黃安達 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 98 This paper describes a framework for extracting effective correction rules from a sentence-aligned corpus and shows a practical application: auto-editing using the discovered rules. The framework uses the Levenshtein distance between sentences to identify the key parts of the rules, and then uses the editing corpus to filter, condense, and refine them. We produce rule candidates of the form A => B, where A is the erroneous pattern and B is the correct pattern. We also aim to make the rules as general as possible. Finally, we use POS (part-of-speech) information to generalize the rules so that they can be applied to different sentences that share a similar POS form. Our framework is language independent and can therefore be applied to other languages easily. The evaluation of the discovered rules shows that 67.2% of the top 1500 ranked rules are annotated as correct or mostly correct by experts. Based on the rules, we built an online auto-editing demo system at http://mslab.csie.ntu.edu.tw/~kw/new_demo.html. Shou-De Lin 林守德 2010 thesis 28 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 === This paper describes a framework for extracting effective correction rules from a
sentence-aligned corpus and shows a practical application: auto-editing using the discovered
rules. The framework uses the Levenshtein distance between sentences to identify the
key parts of the rules, and then uses the editing corpus to filter, condense, and refine
them. We produce rule candidates of the form A => B, where A is the erroneous pattern
and B is the correct pattern. We also aim to make the rules as general as possible.
Finally, we use POS (part-of-speech) information to generalize the rules so that they
can be applied to different sentences that share a similar POS form.
Our framework is language independent and can therefore be applied to other
languages easily. The evaluation of the discovered rules shows that 67.2% of the top
1500 ranked rules are annotated as correct or mostly correct by experts. Based on the
rules, we built an online auto-editing demo system at
http://mslab.csie.ntu.edu.tw/~kw/new_demo.html.
|
author2 |
Shou-De Lin |
author_facet |
Shou-De Lin An-Ta Huang 黃安達 |
author |
An-Ta Huang 黃安達 |
spellingShingle |
An-Ta Huang 黃安達 Identifying Common Erroneous Patterns for Auto Editing |
author_sort |
An-Ta Huang |
title |
Identifying Common Erroneous Patterns for Auto Editing |
title_short |
Identifying Common Erroneous Patterns for Auto Editing |
title_full |
Identifying Common Erroneous Patterns for Auto Editing |
title_fullStr |
Identifying Common Erroneous Patterns for Auto Editing |
title_full_unstemmed |
Identifying Common Erroneous Patterns for Auto Editing |
title_sort |
identifying common erroneous patterns for auto editing |
publishDate |
2010 |
url |
http://ndltd.ncl.edu.tw/handle/39561598991412577425 |
work_keys_str_mv |
AT antahuang identifyingcommonerroneouspatternsforautoediting AT huángāndá identifyingcommonerroneouspatternsforautoediting AT antahuang yīngyǔchángjiàncuòwùbiànshíyǔzìdòngxiàozhèng AT huángāndá yīngyǔchángjiàncuòwùbiànshíyǔzìdòngxiàozhèng |
_version_ |
1718038342870761472 |
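
The abstract describes deriving rule candidates of the form A => B from sentence pairs by way of Levenshtein distance. The record does not give the thesis's actual procedure, so the following is only a minimal sketch of one way this could work, under several assumptions: the alignment is done at the word level, each rule keeps one unchanged word of context on either side of an edit, and rules are ranked by raw frequency. The function names `align` and `extract_rules` and the example sentences are hypothetical and do not come from the thesis.

```python
from collections import Counter


def align(src, tgt):
    """Word-level Levenshtein alignment; returns a list of (op, i, j) steps."""
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete src word
                             dist[i][j - 1] + 1,         # insert tgt word
                             dist[i - 1][j - 1] + cost)  # keep or substitute
    # Walk back from the bottom-right corner to recover one optimal edit path.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == tgt[j - 1]:
            ops.append(("keep", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, j))
            i -= 1
        else:
            ops.append(("ins", i, j - 1))
            j -= 1
    ops.reverse()
    return ops


def extract_rules(wrong, right):
    """Return (A, B) rule candidates: erroneous pattern A, corrected pattern B."""
    src, tgt = wrong.split(), right.split()
    ops = align(src, tgt)
    rules, k = [], 0
    while k < len(ops):
        if ops[k][0] == "keep":
            k += 1
            continue
        start = k
        while k < len(ops) and ops[k][0] != "keep":
            k += 1
        # ops[start:k] is one maximal run of edits between unchanged words.
        a = [src[i] for op, i, _ in ops[start:k] if op in ("sub", "del")]
        b = [tgt[j] for op, _, j in ops[start:k] if op in ("sub", "ins")]
        # Assumption: one unchanged word of context on each side, if present.
        left = [src[ops[start - 1][1]]] if start > 0 else []
        right_ctx = [src[ops[k][1]]] if k < len(ops) else []
        rules.append((" ".join(left + a + right_ctx),
                      " ".join(left + b + right_ctx)))
    return rules


# Hypothetical sentence pairs standing in for a sentence-aligned corpus.
corpus = [
    ("I am agree with you", "I agree with you"),
    ("I am agree with him", "I agree with him"),
    ("He go to school", "He goes to school"),
]
counts = Counter(rule for w, r in corpus for rule in extract_rules(w, r))
for (a, b), freq in counts.most_common():
    print(f"{a} => {b}  (seen {freq}x)")
```

Ranking by frequency is only a stand-in for the filtering, condensing, and refining steps mentioned in the abstract, and the POS-based generalization is not covered here since it would need a part-of-speech tagger.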