Phrase-Based Phrase Alignment Models

Master's thesis, National Chi Nan University, Department of Computer Science and Information Engineering, academic year 101. The phrase translation table is the core model component of state-of-the-art phrase-based statistical machine translation (SMT) systems. Most phrases are induced from word alignment results by using heuristics to find phrase pairs that are “consistent”...
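In the standard phrase-extraction heuristic, a phrase pair is “consistent” with a word alignment when it contains at least one alignment link and no link connects a word inside the pair to a word outside it. The Python sketch below enumerates such pairs by brute force, under stated assumptions. The function names (`is_consistent`, `extract_phrase_pairs`), the `max_len` limit and the toy alignment are illustrative only and are not taken from the thesis. The sketch also shows how a single unaligned word multiplies the extracted pairs, which is one source of the "consistent yet noisy" phrase pairs the abstract describes.

```python
from typing import List, Set, Tuple

# A word alignment is a set of (source_index, target_index) links.
Alignment = Set[Tuple[int, int]]


def is_consistent(alignment: Alignment, s1: int, s2: int, t1: int, t2: int) -> bool:
    """True if the span pair [s1, s2] x [t1, t2] contains at least one link
    and no link joins a word inside the pair to a word outside it."""
    has_link = False
    for s, t in alignment:
        inside_src = s1 <= s <= s2
        inside_tgt = t1 <= t <= t2
        if inside_src and inside_tgt:
            has_link = True
        elif inside_src != inside_tgt:
            return False  # this link crosses the phrase-pair boundary
    return has_link


def extract_phrase_pairs(src: List[str], tgt: List[str], alignment: Alignment,
                         max_len: int = 3) -> Set[Tuple[str, str]]:
    """Brute-force enumeration of every consistent phrase pair with at most
    max_len words on each side (illustrative, not optimised)."""
    pairs: Set[Tuple[str, str]] = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(t1 + max_len, len(tgt))):
                    if is_consistent(alignment, s1, s2, t1, t2):
                        pairs.add((" ".join(src[s1:s2 + 1]),
                                   " ".join(tgt[t1:t2 + 1])))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: target word "y" has no alignment link.
    pairs = extract_phrase_pairs(["a", "b"], ["x", "y", "z"], {(0, 0), (1, 2)})
    for pair in sorted(pairs):
        print(pair)
```

Run as a script, this prints five pairs: ('a', 'x'), ('a', 'x y'), ('a b', 'x y z'), ('b', 'y z') and ('b', 'z'). Because the unaligned "y" can attach to either neighbour, it produces two extra variants, and in longer sentences such variants grow quickly.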

Bibliographic Details
Main Author: Yao-Cheng Hsiao (蕭燿晟)
Other Authors: Jing-Shin Chang (張景新)
Format: Others
Language: zh-TW
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/35700651078170294870
ID: ndltd-TW-100NCNU0392025
Chinese Title: 以詞組為單位的詞組對應模式
Type: thesis
Description: Master's thesis, National Chi Nan University, Department of Computer Science and Information Engineering, academic year 101.

The phrase translation table is the core model component of state-of-the-art phrase-based statistical machine translation (SMT) systems. Most phrases are induced from word alignment results by using heuristics to find phrase pairs that are “consistent” with the word alignment. The phrase translation table therefore depends both on word alignment accuracy and on the heuristics used to find consistent phrase pairs. Because there is no objective optimization criterion for phrase segmentation, however, a large number of consistent yet noisy phrase pairs may be generated. Furthermore, these phrases are defined jointly in terms of the two languages and may not respect the structure of either individual language, so some overly specific phrases and phrase pairs may be induced. Such a huge, noisy phrase translation table is likely to introduce estimation errors in the phrase translation probabilities, as well as search (decoding) errors during the training and decoding phases. The large search space may also slow down decoding.

To improve current phrase-based SMT, it is therefore necessary to optimize both the phrase segmentation and the phrase alignment models by jointly considering word alignment results and a non-heuristic model for phrase segmentation. Doing so may significantly improve the quality and speed of decoding, and thus translation fluency. In particular, an EM algorithm is proposed to perform phrase segmentation on the source-language and target-language corpora independently of each other. A phrase alignment algorithm is then applied to the resulting segmented phrases, with phrase translation probabilities estimated from word alignment statistics. This makes it possible to combine the word alignment and phrase segmentation results quantitatively, rather than heuristically, to produce a high-quality phrase translation table together with its translation probabilities.
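The record describes the EM-based monolingual phrase segmentation only at a high level. As a rough illustration, the Python sketch below assumes a simple unigram phrase model, in which each sentence is a sequence of phrases drawn independently from p(phrase). It trains that model with EM on one corpus, using forward-backward over all segmentations for the E-step and renormalised expected counts for the M-step, and then segments a sentence with the Viterbi algorithm. The model form, the function names (`candidate_phrases`, `em_step`, `viterbi_segment`), the `max_len` limit and the toy corpus are all hypothetical, and the thesis's actual segmentation model may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical model: a sentence is generated as a sequence of phrases drawn
# independently from a unigram phrase distribution p(phrase).
Phrase = Tuple[str, ...]
Model = Dict[Phrase, float]


def candidate_phrases(corpus: List[List[str]], max_len: int) -> Model:
    """Initial model: every n-gram (n <= max_len) in the corpus, weighted by its count."""
    counts: Model = defaultdict(float)
    for sent in corpus:
        for i in range(len(sent)):
            for j in range(i + 1, min(i + max_len, len(sent)) + 1):
                counts[tuple(sent[i:j])] += 1.0
    total = sum(counts.values())
    return {ph: c / total for ph, c in counts.items()}


def em_step(corpus: List[List[str]], probs: Model, max_len: int) -> Tuple[Model, float]:
    """One EM iteration. E-step: forward-backward over all segmentations;
    M-step: renormalise expected phrase counts.
    Returns the new model and the corpus log-likelihood under the old model."""
    expected: Model = defaultdict(float)
    log_lik = 0.0
    for sent in corpus:
        n = len(sent)
        # alpha[j]: total probability of all segmentations of sent[:j]
        alpha = [1.0] + [0.0] * n
        for j in range(1, n + 1):
            for i in range(max(0, j - max_len), j):
                alpha[j] += alpha[i] * probs.get(tuple(sent[i:j]), 0.0)
        # beta[i]: total probability of all segmentations of sent[i:]
        beta = [0.0] * n + [1.0]
        for i in range(n - 1, -1, -1):
            for j in range(i + 1, min(i + max_len, n) + 1):
                beta[i] += probs.get(tuple(sent[i:j]), 0.0) * beta[j]
        z = alpha[n]  # marginal probability of the whole sentence
        log_lik += math.log(z)
        # Posterior-weighted count of each phrase occurrence sent[i:j]
        for i in range(n):
            for j in range(i + 1, min(i + max_len, n) + 1):
                ph = tuple(sent[i:j])
                expected[ph] += alpha[i] * probs.get(ph, 0.0) * beta[j] / z
    total = sum(expected.values())
    return {ph: c / total for ph, c in expected.items() if c > 0.0}, log_lik


def viterbi_segment(sent: List[str], probs: Model, max_len: int) -> List[str]:
    """Most probable segmentation of one sentence under the trained model."""
    n = len(sent)
    best = [0.0] + [-math.inf] * n  # best log-probability of sent[:j]
    back = [0] * (n + 1)            # start index of the last phrase in sent[:j]
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            p = probs.get(tuple(sent[i:j]), 0.0)
            if p > 0.0 and best[i] + math.log(p) > best[j]:
                best[j] = best[i] + math.log(p)
                back[j] = i
    segments: List[str] = []
    j = n
    while j > 0:
        segments.append(" ".join(sent[back[j]:j]))
        j = back[j]
    return segments[::-1]


if __name__ == "__main__":
    # Hypothetical toy monolingual corpus.
    corpus = [s.split() for s in ["the phrase table", "the phrase table is large", "a phrase table"]]
    model = candidate_phrases(corpus, max_len=3)
    for _ in range(10):
        model, ll = em_step(corpus, model, max_len=3)
    print(viterbi_segment(corpus[1], model, max_len=3))
```

EM guarantees that the returned log-likelihood never decreases from one iteration to the next. Plain maximum-likelihood training of a unigram phrase model also tends to favour long phrases, so a practical segmenter would probably need a prior or length penalty, which this sketch deliberately omits.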