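Summary: | Master's thesis, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, academic year 82 (ROC calendar).
Parallel texts carry rich linguistic information, so they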
can be used for word-sense disambiguation, extraction of
translation templates, automatic translation of noun
compounds, construction of bilingual dictionaries, terminology
applications, and so on. For all such applications, the most
important task is aligning the bilingual texts, that is,
identifying which parts of the text in the first language
correspond to which parts of the text in the second language.
This thesis presents an approach to sentence alignment for a
Chinese-English corpus. Previous work on sentence alignment has
seldom addressed texts from different language families, such
as Chinese and English. Our test corpus is the NTU Bilingual
Corpus, which consists of twenty texts. We discuss why, and in
what respects, sentence alignment across different language
families is difficult. Four language models are proposed for
sentence alignment. Each generated sentence bead (a minimal
group of sentences in one language paired with its counterpart
in the other) is assigned a score according to different
criteria, and dynamic programming is used to find the alignment
that maximizes the total score. Alignment performance is
evaluated with both precision and recall. With the final
language model, recall is 0.936 and precision is 0.921.
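As a minimal sketch of the dynamic-programming step described above, the Python code below finds the sequence of sentence beads whose total score is largest. The summary does not give the scoring criteria of the four language models, so the bead types in `BEAD_TYPES`, the penalties in `TYPE_PRIOR`, and the length-match heuristic in `bead_score` are hypothetical placeholders; each sentence is represented only by its length.

```python
from typing import List, Optional, Tuple

# Bead types as (number of source sentences, number of target sentences).
# This particular set is an assumption; the summary does not list one.
BEAD_TYPES: List[Tuple[int, int]] = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2)]

# Hypothetical per-type adjustment that favours 1-1 beads.
TYPE_PRIOR = {(1, 1): 0.0, (1, 0): -3.0, (0, 1): -3.0,
              (2, 1): -1.0, (1, 2): -1.0, (2, 2): -2.0}

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)


def bead_score(src_lens: List[int], tgt_lens: List[int], bead_type: Tuple[int, int]) -> float:
    """Hypothetical bead score: highest when the summed lengths match."""
    s, t = sum(src_lens), sum(tgt_lens)
    length_term = -abs(s - t) / max(s + t, 1)  # lies in [-1, 0]
    return length_term + TYPE_PRIOR[bead_type]


def align(src: List[int], tgt: List[int]) -> Tuple[float, List[Bead]]:
    """Return the largest total bead score and the beads that achieve it.

    src and tgt hold the sentence lengths of the two texts.
    """
    n, m = len(src), len(tgt)
    neg_inf = float("-inf")
    # best[i][j]: best total score for aligning src[:i] with tgt[:j]
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    # Row-major order guarantees every predecessor of (i, j) is final before (i, j) is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            for di, dj in BEAD_TYPES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                cand = best[i][j] + bead_score(src[i:ni], tgt[j:nj], (di, dj))
                if cand > best[ni][nj]:
                    best[ni][nj] = cand
                    back[ni][nj] = (di, dj)
    # Follow back-pointers from (n, m) to (0, 0) to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        step = back[i][j]
        assert step is not None
        di, dj = step
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return best[n][m], beads


if __name__ == "__main__":
    print(align([10, 10], [20]))  # prints (-1.0, [([0, 1], [0])])
```

In the example, the two source sentences are merged into one 2-1 bead against the single target sentence, because every alternative must leave a sentence unmatched and pay the 1-0 penalty. Since each bead type advances by at most two sentences per side, every cell of the (n+1) by (m+1) table is expanded once, so the cost grows with the product of the two text lengths.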
|
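The summary reports precision and recall without saying how they are counted. A common convention, assumed here rather than taken from the thesis, represents each bead as a pair of sentence-index sets and counts a proposed bead as correct only if it exactly matches a bead in a hand-aligned reference; precision is then correct / proposed and recall is correct / reference. The helpers `make_bead` and `precision_recall` are hypothetical names used only for illustration.

```python
from typing import FrozenSet, Iterable, Tuple

# A bead as (source sentence indices, target sentence indices); frozensets make it hashable.
Bead = Tuple[FrozenSet[int], FrozenSet[int]]


def make_bead(src: Iterable[int], tgt: Iterable[int]) -> Bead:
    return (frozenset(src), frozenset(tgt))


def precision_recall(proposed: Iterable[Bead], reference: Iterable[Bead]) -> Tuple[float, float]:
    """Exact-match precision and recall over beads (an assumed counting scheme)."""
    proposed_set = set(proposed)
    reference_set = set(reference)
    correct = len(proposed_set & reference_set)  # proposed beads that also appear in the reference
    precision = correct / len(proposed_set) if proposed_set else 0.0
    recall = correct / len(reference_set) if reference_set else 0.0
    return precision, recall


if __name__ == "__main__":
    reference = [make_bead([0], [0]), make_bead([1], [1]), make_bead([2, 3], [2])]
    proposed = [make_bead([0], [0]), make_bead([1], [1]),
                make_bead([2], [2]), make_bead([3], [])]
    print(precision_recall(proposed, reference))  # 2 of 4 proposed correct, 2 of 3 reference found
```

In the example, splitting the reference 2-1 bead into a 1-1 bead and a 1-0 bead lowers both scores: only 2 of the 4 proposed beads are correct (precision 0.5), and only 2 of the 3 reference beads are recovered (recall about 0.667).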