Reducing Phrase-based SMT Translation Tables by Phrase Pair Coverage Rate

Master's thesis (碩士) === National Chi Nan University (國立暨南國際大學) === Department of Computer Science and Information Engineering (資訊工程學系) === academic year 102 === In traditional phrase-based statistical machine translation (PB SMT), word alignment is performed before the phrase translation table is generated. Heuristics are then applied to find the possible phrase pairs, thus producing the Phrase Translation Table...

Bibliographic Details
Main Authors: Peng, Wei-Gang, 彭維剛
Other Authors: Chang, Jing-Shin
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/72467599609554966608
id ndltd-TW-101NCNU0392049
record_format oai_dc
spelling ndltd-TW-101NCNU0392049 2016-03-16T04:14:50Z http://ndltd.ncl.edu.tw/handle/72467599609554966608 Reducing Phrase-based SMT Translation Tables by Phrase Pair Coverage Rate 以詞組對涵蓋率縮減詞組為本之統計式機器翻譯雙語對照表 Peng, Wei-Gang 彭維剛 碩士 國立暨南國際大學 資訊工程學系 102 In traditional phrase-based statistical machine translation (PB SMT), word alignment is performed before the phrase translation table is generated. Heuristics are then applied to the word alignments to extract candidate phrase pairs, which make up the phrase translation table. Because the phrases and phrase pairs are induced from word alignments without any phrase segmentation criteria, a large number of messy phrases can be produced. In this thesis, we propose a “Phrase Pair Coverage Rate” measure to help reduce the phrase translation table. We first use an EM algorithm to find the best phrase segmentation of the source and target sentences. Then, in the translation model (TM) training step of Moses, this segmentation information is used, on the basis of the Phrase Pair Coverage Rate, to reduce the size of the phrase translation table. The reduced table is then used together with the language model (LM) to decode (i.e., translate) the source sentences. Finally, BLEU scores are computed to evaluate translation quality and are compared against those of the Moses SMT system. The experimental results show that the phrase table size can be reduced by about 60% to 70%, while the BLEU score remains close to that of Moses. Chang, Jing-Shin 張景新 2014 學位論文 ; thesis 33 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis (碩士) === National Chi Nan University (國立暨南國際大學) === Department of Computer Science and Information Engineering (資訊工程學系) === academic year 102 === In traditional phrase-based statistical machine translation (PB SMT), word alignment is performed before the phrase translation table is generated. Heuristics are then applied to the word alignments to extract candidate phrase pairs, which make up the phrase translation table. Because the phrases and phrase pairs are induced from word alignments without any phrase segmentation criteria, a large number of messy phrases can be produced. In this thesis, we propose a “Phrase Pair Coverage Rate” measure to help reduce the phrase translation table. We first use an EM algorithm to find the best phrase segmentation of the source and target sentences. Then, in the translation model (TM) training step of Moses, this segmentation information is used, on the basis of the Phrase Pair Coverage Rate, to reduce the size of the phrase translation table. The reduced table is then used together with the language model (LM) to decode (i.e., translate) the source sentences. Finally, BLEU scores are computed to evaluate translation quality and are compared against those of the Moses SMT system. The experimental results show that the phrase table size can be reduced by about 60% to 70%, while the BLEU score remains close to that of Moses.
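The abstract does not give the formula for the Phrase Pair Coverage Rate, so the following is only an illustrative sketch of threshold-based phrase table pruning under an assumed definition: the rate of a phrase pair is taken to be the number of times the pair appears in the phrase-segmented corpus divided by the number of times the heuristic extraction produced it. The function names, the threshold value, and the toy data are all hypothetical and are not taken from the thesis or from Moses.

```python
from collections import Counter


def coverage_rates(extracted, segmented):
    """Return an ASSUMED coverage rate for every extracted phrase pair.

    extracted: iterable of (source_phrase, target_phrase) occurrences
               produced by heuristic extraction from word alignments.
    segmented: iterable of (source_phrase, target_phrase) occurrences
               that agree with the phrase segmentation of the corpus.
    """
    extracted_counts = Counter(extracted)
    segmented_counts = Counter(segmented)
    # A pair never seen in the segmentation gets a rate of 0.
    return {
        pair: segmented_counts[pair] / count
        for pair, count in extracted_counts.items()
    }


def prune_phrase_table(extracted, segmented, threshold=0.5):
    """Keep only phrase pairs whose assumed coverage rate meets the threshold."""
    rates = coverage_rates(extracted, segmented)
    return {pair for pair, rate in rates.items() if rate >= threshold}


# Toy data (hypothetical): four extracted occurrences, three distinct pairs.
extracted = [
    ("the house", "das Haus"),
    ("the house", "das Haus"),
    ("the", "das"),
    ("house is", "Haus ist"),
]
# Only "the house" / "das Haus" is supported by the phrase segmentation.
segmented = [
    ("the house", "das Haus"),
    ("the house", "das Haus"),
]

kept = prune_phrase_table(extracted, segmented)
# Fraction of distinct phrase pairs removed from the table.
reduction = 1 - len(kept) / len(set(extracted))
```

In this toy example two of the three distinct pairs are dropped. The 60% to 70% reduction reported in the abstract comes from the thesis experiments and is not reproduced by this sketch.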