Summary: | Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 98 === This paper describes a framework for extracting effective correction rules from a
sentence-aligned corpus and demonstrates a practical application: auto-editing with the discovered
rules. The framework computes the Levenshtein distance
between aligned sentences to identify the key parts of each rule, and then uses the editing corpus
to filter, condense, and refine the rules. We produce rule candidates of the form A
=> B, where A is the erroneous pattern and B is the correct pattern. We also
aim to make the rules general. To that end, we
use part-of-speech (POS) information so that a rule can be
applied to different sentences that share a similar POS form.
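
As an illustration of how an edit-distance alignment can expose rule candidates of the form A => B, the following is a minimal sketch, not the implementation used in this work. It assumes whitespace tokenization and word-level Levenshtein distance with unit costs; the function names `edit_table` and `extract_rules` are hypothetical, and the filtering, condensing, and POS generalization steps are not shown.

```python
from typing import List, Tuple


def edit_table(a: List[str], b: List[str]) -> List[List[int]]:
    """d[i][j] is the word-level Levenshtein distance between a[:i] and b[:j]."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i words
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j words
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d


def extract_rules(wrong: str, right: str) -> List[Tuple[str, str]]:
    """Return (A, B) pairs: each contiguous edited span A in `wrong` and its
    replacement B in `right`. An empty string marks a pure insertion or deletion."""
    a, b = wrong.split(), right.split()
    d = edit_table(a, b)
    rules: List[Tuple[str, str]] = []
    cur_a: List[str] = []
    cur_b: List[str] = []

    def flush() -> None:
        # Close the current edited span (collected right to left) as one rule.
        if cur_a or cur_b:
            rules.append((" ".join(reversed(cur_a)), " ".join(reversed(cur_b))))
            cur_a.clear()
            cur_b.clear()

    # Trace back through the table from the bottom-right corner.
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and d[i][j] == d[i - 1][j - 1]:
            flush()  # a matching word ends the current edited span
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            cur_a.append(a[i - 1])  # substitution
            cur_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            cur_a.append(a[i - 1])  # word deleted from the erroneous sentence
            i -= 1
        else:
            cur_b.append(b[j - 1])  # word inserted in the corrected sentence
            j -= 1
    flush()
    rules.reverse()  # restore left-to-right order
    return rules


print(extract_rules("He go to school yesterday", "He went to school yesterday"))
print(extract_rules("I am agree with you", "I agree with you"))
```

In this sketch each contiguous run of edits becomes one candidate rule; a real system would still need to rank and filter such candidates against the corpus.
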
Our framework is language independent and can therefore be applied to other
languages easily. An evaluation of the discovered rules shows that 67.2% of the top
1500 ranked rules were annotated as correct or mostly correct by experts. Based on the
rules, we built an online auto-editing demo system, available at
http://mslab.csie.ntu.edu.tw/~kw/new_demo.html.
|