A Study on Example-Based Mandarin-Taiwanese Machine Translation

Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 103 === Although many translation systems have been developed on the internet, Taiwanese translation systems are still not mature. In this work we use Taiwanese-language dramas with Mandarin subtitles as bilingual data which use personally typing-by-l...

Bibliographic Details
Main Authors: Huang, Chih-Chao, 黃志超
Other Authors: Lin, Chuan-Jie
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/20975873361730308829
id ndltd-TW-103NTOU5394042
record_format oai_dc
spelling ndltd-TW-103NTOU5394042 2016-11-06T04:19:41Z http://ndltd.ncl.edu.tw/handle/20975873361730308829 A Study on Example-Based Mandarin-Taiwanese Machine Translation 範例為本的國語─台語翻譯之研究 Huang, Chih-Chao 黃志超 Master's National Taiwan Ocean University Department of Computer Science and Engineering 103 Lin, Chuan-Jie 林川傑 2015 thesis 47 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 103 === Although many translation systems have been developed on the internet, Taiwanese translation systems are still not mature. In this work we use Taiwanese-language dramas with Mandarin subtitles as bilingual data, with the Taiwanese speech typed in by hand while listening, in order to improve the “Taiwan Local Language Machine Translation”. To speed up data building, we use a program developed by the NLP lab of National Taiwan Ocean University, which includes a Taiwanese-Mandarin dictionary provided by Professor Zheng Liang-Wei. The work is divided into three parts:
I. Mandarin-Taiwanese translation: we propose Longest Common Subsequence (LCS) matching and a self-defined model to match each Mandarin subtitle with its corresponding Taiwanese pronunciation.
II. Non-example word processing: in addition to using the Taiwanese-Mandarin dictionary to find the Taiwanese pronunciation of a non-example word, we split the Taiwanese words in the Taiwanese oral data into single Taiwanese characters to generate a pronunciation dictionary that addresses the non-example word problem.
III. Insertion and deletion processing: we take the words that are inserted or deleted in the example sentences, use their context as features, and decide whether a new word is an insertion or a deletion using a rule-based method and two machine learning methods.
In the Mandarin-Taiwanese translation experiments, the best system accuracy with LCS was 68.79%, and it improved to 73.31% with our self-defined model. We also compared our systems with the word-based unigram language model proposed by Lin et al. in 2008. For out-of-example words, the best F-measure was 35.35%, obtained with the Taiwanese-Mandarin dictionary; for out-of-vocabulary words, the best F-measure was 21.86%, obtained with the Taiwanese Word Pronunciation Dictionary. For insertion and deletion, the best deletion result came from CRF, with an F-measure of 39.96%, and the best insertion result came from the rule-based TWID, with an F-measure of 12.24%.
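The abstract names Longest Common Subsequence (LCS) matching but does not say how it is applied to the subtitle and pronunciation data. The following is only a generic sketch of LCS alignment between two token sequences, not the thesis's implementation. The function name `lcs_align` and the placeholder English tokens are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def lcs_align(src: Sequence[str], tgt: Sequence[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with src[i] == tgt[j] that form one LCS."""
    n, m = len(src), len(tgt)
    # table[i][j] holds the LCS length of the suffixes src[i:] and tgt[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if src[i] == tgt[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one set of matched positions.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if src[i] == tgt[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # src[i] is left unmatched
        else:
            j += 1  # tgt[j] is left unmatched
    return pairs


# Hypothetical placeholder tokens, not data from the thesis.
source = ["I", "not", "go"]
target = ["I", "no", "want", "go"]
print(lcs_align(source, target))  # [(0, 0), (2, 3)]
```

In a sketch like this, positions left out of the returned pairs are the ones where the two sequences differ. That is one plausible way to locate substituted, inserted, or deleted words, though the abstract does not confirm that the thesis does it this way.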
author2 Lin, Chuan-Jie
author_facet Lin, Chuan-Jie
Huang, Chih-Chao
黃志超
author Huang, Chih-Chao
黃志超
spellingShingle Huang, Chih-Chao
黃志超
A Study on Example-Based Mandarin-Taiwanese Machine Translation
author_sort Huang, Chih-Chao
title A Study on Example-Based Mandarin-Taiwanese Machine Translation
title_sort study on example-based mandarin-taiwanese machine translation
publishDate 2015
url http://ndltd.ncl.edu.tw/handle/20975873361730308829