Phrase-Based Phrase Alignment Models

Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 101 === The phrase translation table is the core model component of state-of-the-art phrase-based statistical machine translation (SMT) systems. Most phrases are induced from word alignment results by using some heuristics to find phrase pairs that are “consistent”...

Bibliographic Details
Main Authors:	Yao-Cheng Hsiao, 蕭燿晟
Other Authors:	Jing-Shin Chang
Format:	Others
Language:	zh-TW
Published:	2013
Online Access:	http://ndltd.ncl.edu.tw/handle/35700651078170294870

id	ndltd-TW-100NCNU0392025
record_format	oai_dc
spelling	ndltd-TW-100NCNU0392025 2016-03-21T04:27:32Z http://ndltd.ncl.edu.tw/handle/35700651078170294870 Phrase-Based Phrase Alignment Models 以詞組為單位的詞組對應模式 Yao-Cheng Hsiao 蕭燿晟 Master's National Chi Nan University Department of Computer Science and Information Engineering 101 Jing-Shin Chang 張景新 2013 thesis 33 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Chi Nan University === Department of Computer Science and Information Engineering === 101 === The phrase translation table is the core model component of state-of-the-art phrase-based statistical machine translation (SMT) systems. Most phrases are induced from word alignment results, using heuristics to find phrase pairs that are “consistent” with those alignments. The quality of the phrase translation table therefore depends on both the word alignment accuracy and the heuristics used to find consistent phrase pairs. Without an objective optimization criterion for phrase segmentation, however, a large number of consistent yet noisy phrase pairs may be generated. Furthermore, the phrases are essentially defined in terms of two languages jointly, so they might not respect either individual language very well. Some highly specific phrases and phrase pairs might then be induced. Such a huge and noisy phrase translation table is likely to introduce errors when estimating phrase translation probabilities during training, as well as search errors during decoding. The large search space might also slow down the decoding process. To improve the performance of current phrase-based SMT, it is thus necessary to optimize the phrase segmentation and phrase alignment models by jointly considering the word alignment results and a non-heuristic model for phrase segmentation. Doing so might significantly improve the quality and speed of decoding, and thus the translation fluency. In particular, an EM algorithm is proposed to segment the source and target language corpora into phrases independently of each other. The phrase alignment algorithm is then applied to these well-segmented phrases, with good estimates of the phrase translation probabilities based on word alignment statistics. It thus becomes possible to use the word alignment and phrase segmentation results jointly and quantitatively, rather than heuristically, to produce a high-quality phrase translation table and its translation probabilities.
author2	Jing-Shin Chang
author_facet	Jing-Shin Chang Yao-Cheng Hsiao 蕭燿晟
author	Yao-Cheng Hsiao 蕭燿晟
spellingShingle	Yao-Cheng Hsiao 蕭燿晟 Phrase-Based Phrase Alignment Models
author_sort	Yao-Cheng Hsiao
title	Phrase-Based Phrase Alignment Models
title_short	Phrase-Based Phrase Alignment Models
title_full	Phrase-Based Phrase Alignment Models
title_fullStr	Phrase-Based Phrase Alignment Models
title_full_unstemmed	Phrase-Based Phrase Alignment Models
title_sort	phrase-based phrase alignment models
publishDate	2013
url	http://ndltd.ncl.edu.tw/handle/35700651078170294870
work_keys_str_mv	AT yaochenghsiao phrasebasedphrasealignmentmodels AT xiāoyàochéng phrasebasedphrasealignmentmodels AT yaochenghsiao yǐcízǔwèidānwèidecízǔduìyīngmóshì AT xiāoyàochéng yǐcízǔwèidānwèidecízǔduìyīngmóshì
_version_	1718208595641761792
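
The abstract in this record refers to heuristics that find phrase pairs “consistent” with a word alignment. As a rough illustration only, the sketch below shows the commonly used textbook consistency check for phrase-pair extraction: a source span and a target span form a consistent pair if at least one alignment link lies inside the pair and no link connects a word inside one span to a word outside the other. This is the baseline heuristic the abstract argues against, not the thesis's proposed EM-based method. The function names, the `max_len` parameter, and the toy sentence pair are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# A word alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def is_consistent(alignment: Alignment, s_start: int, s_end: int,
                  t_start: int, t_end: int) -> bool:
    """Return True if the two spans (inclusive bounds) are consistent.

    Consistent means: no link crosses the boundary of the phrase pair,
    and at least one link lies inside it.
    """
    has_link_inside = False
    for s, t in alignment:
        in_src = s_start <= s <= s_end
        in_tgt = t_start <= t <= t_end
        if in_src != in_tgt:
            # One end of the link is inside, the other outside.
            return False
        if in_src and in_tgt:
            has_link_inside = True
    return has_link_inside


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Alignment,
                         max_len: int = 3) -> List[Tuple[str, str]]:
    """Enumerate all consistent phrase pairs up to max_len words per side."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            for t_start in range(len(tgt)):
                for t_end in range(t_start, min(t_start + max_len, len(tgt))):
                    if is_consistent(alignment, s_start, s_end, t_start, t_end):
                        pairs.append((" ".join(src[s_start:s_end + 1]),
                                      " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Toy example (hypothetical data): a two-word sentence pair aligned word by word.
pairs = extract_phrase_pairs(["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)})
for source_phrase, target_phrase in pairs:
    print(source_phrase, "|||", target_phrase)
```

Even this two-word example yields every consistent sub-span as a phrase pair. On longer sentences the number of extracted pairs grows quickly, which is the kind of large, noisy phrase table the abstract describes.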

Similar Items