Summary: | Master's === Huafan University === Master's Program, Department of Information Management === 97 === In this paper, we propose using a term extraction tool to extract multi-word patterns before the word alignment step of a statistical machine translation system. Each identified pattern is then treated as a single word during alignment and translation. We designed an English-Japanese machine translation system that combines this term extraction technique with word alignment, part-of-speech tagging, translation probabilities, and several translation models, and we evaluated its performance.
The bilingual corpus of the NTCIR-7 Patent Translation Task is used for our experiments. In the training stage, 100,000 aligned sentences are selected from the parallel corpus. Common patterns of two to six words are extracted and processed as single words. Another 1,380 sentences are selected for testing and evaluation.
The NIST and BLEU evaluations show that the n-gram precision scores of both metrics are better with term extraction than without it.
|
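The preprocessing step described in the summary, treating each extracted multi-word pattern as a single word before alignment, can be illustrated with a minimal sketch. This is not the tool used in the thesis: the function name `merge_terms`, the underscore joiner, and the greedy longest-match strategy are assumptions made for illustration only. The two-to-six word length limits follow the summary.

```python
from typing import List, Set, Tuple


def merge_terms(
    tokens: List[str],
    terms: Set[Tuple[str, ...]],
    min_len: int = 2,   # shortest pattern length stated in the summary
    max_len: int = 6,   # longest pattern length stated in the summary
    joiner: str = "_",  # hypothetical choice of joining character
) -> List[str]:
    """Replace each known multi-word pattern with one joined token.

    Greedy longest-match, left to right (an assumed strategy, not
    necessarily the one used in the thesis).
    """
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), min_len - 1, -1):
            if tuple(tokens[i:i + n]) in terms:
                match_len = n
                break
        if match_len:
            merged.append(joiner.join(tokens[i:i + match_len]))
            i += match_len
        else:
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    # Hypothetical extracted patterns and input sentence.
    terms = {
        ("machine", "translation"),
        ("statistical", "machine", "translation", "system"),
    }
    sentence = "the statistical machine translation system uses machine translation models"
    print(merge_terms(sentence.split(), terms))
```

Under these assumptions, the merged tokens would then be passed to the word aligner in place of the original words, so that each pattern is aligned as a unit.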