A Study on the Sentence Alignment Problem in Chinese-English Corpora

Bibliographic Details
Main Authors: Yeong-Yui Wu, 吳詠裕
Other Authors: Hsing-Hsi Chen
Format: Others
Language: zh-TW
Published: 1994
Online Access: http://ndltd.ncl.edu.tw/handle/17981541212850195479
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 82 === Parallel texts carry rich linguistic information and can be used for word-sense disambiguation, extraction of translation templates, automatic translation of noun compounds, construction of bilingual dictionaries, terminology applications, and so on. For all of these applications, the most important task is to align the bilingual texts. To align a text means to show which parts of the first language correspond to which parts of the second language. This thesis presents an approach to sentence alignment in a Chinese-English corpus. Previous work on sentence alignment has seldom addressed texts from different language families, such as Chinese and English. The NTU Bilingual Corpus, which consists of twenty texts, serves as the test corpus. The thesis discusses why and how sentence alignment across different language families is difficult. Four language models are proposed for sentence alignment. Each generated sentence bead is assigned a score according to different criteria. Dynamic programming is then used to find the alignment that maximizes the sum of the bead scores. Both precision and recall are used to evaluate alignment performance. With the final language model, the recall is 0.936 and the precision is 0.921.
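
The dynamic-programming search and the precision/recall evaluation mentioned in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The bead shapes in `BEAD_SHAPES`, the length-based `length_score` function, the skip penalty of -5.0, and all function names are assumptions made for illustration only; the summary does not specify the four language models or their scoring criteria.

```python
from typing import Callable, List, Tuple

# A bead pairs a half-open span of source sentences with a span of target sentences.
Bead = Tuple[Tuple[int, int], Tuple[int, int]]

# Assumed bead shapes (source count, target count); not taken from the thesis.
BEAD_SHAPES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]


def length_score(src: List[str], tgt: List[str]) -> float:
    """Hypothetical score: 0 for equal total lengths, negative otherwise."""
    if not src or not tgt:
        return -5.0  # assumed fixed penalty for leaving a sentence unaligned
    s_len = sum(len(s) for s in src)
    t_len = sum(len(t) for t in tgt)
    return -abs(s_len - t_len) / max(s_len, t_len, 1)


def align(src: List[str], tgt: List[str],
          score: Callable[[List[str], List[str]], float] = length_score) -> List[Bead]:
    """Return the bead sequence whose summed score is largest."""
    n, m = len(src), len(tgt)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            # Try to extend the best partial alignment ending at (i, j) by one bead.
            for di, dj in BEAD_SHAPES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cand = best[i][j] + score(src[i:ni], tgt[j:nj])
                if cand > best[ni][nj]:
                    best[ni][nj] = cand
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end of both texts to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads


def precision_recall(predicted: List[Bead], gold: List[Bead]) -> Tuple[float, float]:
    """Precision = correct / predicted beads; recall = correct / gold beads."""
    pred_set, gold_set = set(predicted), set(gold)
    correct = len(pred_set & gold_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall
```

Calling `align(["abc", "def"], ["ABCDEF"])` on this toy input yields a single 2-to-1 bead, because merging the two source sentences matches the target length exactly. A real Chinese-English scorer would need criteria better suited to the two languages than raw character length.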