Summary: | Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 102 === In traditional phrase-based statistical machine translation (PB SMT), word alignment is performed before the Phrase Translation Table is generated. Heuristics are then applied to the word alignments to extract candidate phrase pairs, which make up the Phrase Translation Table. Because the phrases and phrase pairs are induced from word alignment without any phrase segmentation criteria, this process can produce a large number of noisy phrases.
In this paper, we propose a “Phrase Pair Coverage Rate” measure to help reduce the size of the Phrase Translation Table. We first use an EM algorithm to find the best phrase segmentation of the source and target sentences. Then, in the Translation Model training (TM training) step of Moses, this phrase segmentation information is used, together with the “Phrase Pair Coverage Rate”, to prune the Phrase Translation Table. The reduced Phrase Translation Table is then used with the Language Model (LM) to decode (i.e., translate) the source sentences. Finally, the BLEU score is computed to evaluate translation quality, and the results are compared against the Moses SMT system.
The experimental results show that the size of the phrase table can be reduced by about 60% to 70%, while the BLEU score remains close to that of Moses.
|