Extraction of Bilingual Multiword Expressions with Application to Bilingual Concordancer

Bibliographic Details
Main Authors: Bai, Ming-Hong, 白明弘
Other Authors: Chang, Jason S.
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/63001866540723370915
Description
Summary: Doctoral thesis === National Tsing Hua University (國立清華大學) === Department of Computer Science (資訊工程學系) === 101 === A bilingual concordancer is a computer-assisted translation tool that uses a parallel corpus as its knowledge base. Given a word or phrase, the bilingual concordancer retrieves from the parallel corpus the aligned sentence pairs whose source sentences contain that word or phrase. It then identifies the translation equivalents in the target sentences and reorders the sentence pairs according to the correlation between the query string and the translation equivalents. It helps not only in finding translation equivalents of the query but also in presenting the various contexts in which they occur. As a result, it is extremely useful for bilingual lexicographers, human translators, and second language learners. Extraction of bilingual multi-word expressions is the most important part of a bilingual concordancer. For example, highlighting translation equivalents in the target sentence and generating a list of translation equivalents both depend heavily on a high-quality extraction model. However, the existing models for extracting translation equivalents still have many problems and leave room for improvement. In this thesis, we discuss some problems of the existing models for extracting bilingual multi-word expressions, including the over-alignment problem and the under-alignment problem. We then propose a novel model that addresses these problems in order to improve the quality of the extracted translation equivalents. Further, we implement a bilingual concordancer that employs the proposed translation extraction model. To measure the performance of the bilingual concordancer, we use three types of multi-word expressions as our test targets. The results are compared with those of existing statistical machine translation models.
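
The retrieve, identify, and reorder steps described in the summary can be illustrated with a minimal Python sketch. It is not the model proposed in the thesis: the toy corpus, the function names `contains_phrase` and `concordance`, the whitespace tokenization, the restriction to single-word target candidates, and the use of the Dice coefficient as the correlation measure are all assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical toy parallel corpus: (source sentence, target sentence) pairs.
CORPUS = [
    ("the bank raised rates", "la banque a relevé les taux"),
    ("the bank closed early", "la banque a fermé tôt"),
    ("he closed the door", "il a fermé la porte"),
    ("she opened the window", "elle a ouvert la fenêtre"),
    ("the river bank flooded", "la rive a été inondée"),
]


def contains_phrase(tokens, phrase):
    """Return True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def concordance(query, corpus):
    """Retrieve sentence pairs for `query`, pick the best-correlated target
    word in each pair, and rank the pairs by that correlation score."""
    phrase = query.split()

    # Step 1: retrieve pairs whose source sentence contains the query.
    hits = [(s, t) for s, t in corpus if contains_phrase(s.split(), phrase)]
    if not hits:
        return []

    # Sentence-level frequency of each target word in the whole corpus.
    target_df = Counter()
    for _, t in corpus:
        target_df.update(set(t.split()))

    # Sentence-level co-occurrence of each target word with the query.
    cooc = Counter()
    for _, t in hits:
        cooc.update(set(t.split()))

    # Assumed correlation measure: Dice coefficient between query and word.
    dice = {w: 2 * cooc[w] / (len(hits) + target_df[w]) for w in cooc}

    # Step 2: identify the highest-scoring target word in each hit.
    results = []
    for s, t in hits:
        best = max(t.split(), key=dice.get)
        results.append((s, t, best, dice[best]))

    # Step 3: reorder the pairs by correlation, strongest first.
    results.sort(key=lambda r: r[3], reverse=True)
    return results


for source, target, equivalent, score in concordance("bank", CORPUS):
    print(f"{score:.2f}  {equivalent:<8} {source}  |  {target}")
```

On this toy corpus the two "banque" sentences rank above the "river bank" sentence, for which the sketch falls back on the function word "la". A word-level association score of this kind cannot return a multi-word equivalent at all and is easily misled by frequent words, which gives a rough sense of why the quality of the extraction model matters so much for a concordancer.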