A Study on Example-Based Mandarin-Taiwanese Machine Translation
Master's thesis === National Taiwan Ocean University === Department of Computer Science and Engineering === 103 === Although many translation systems have been developed on the internet, Taiwanese translation systems are still immature. In this thesis we use Taiwanese-language dramas with Mandarin subtitles as bilingual data which use personally typing-by-l...
Main Authors: | Huang, Chih-Chao, 黃志超 |
---|---|
Other Authors: | Lin, Chuan-Jie |
Format: | Others |
Language: | zh-TW |
Published: | 2015 |
Online Access: | http://ndltd.ncl.edu.tw/handle/20975873361730308829 |
id |
ndltd-TW-103NTOU5394042 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-103NTOU5394042 2016-11-06T04:19:41Z http://ndltd.ncl.edu.tw/handle/20975873361730308829 A Study on Example-Based Mandarin-Taiwanese Machine Translation 範例為本的國語─台語翻譯之研究 Huang, Chih-Chao 黃志超 Master's thesis National Taiwan Ocean University Department of Computer Science and Engineering 103 Although many translation systems have been developed on the internet, Taiwanese translation systems are still immature. In this thesis we use Taiwanese-language dramas with Mandarin subtitles as bilingual data, which we transcribed ourselves by listening and typing, with the further aim of improving the “Taiwan Local Language Machine Translation”. To speed up data construction, we use a program developed by the NLP lab of National Taiwan Ocean University, which includes a Taiwanese-Mandarin dictionary provided by Professor Zheng Liang-Wei. We divide this work into three parts: Ⅰ. Mandarin-Taiwanese translation: we propose a Longest Common Subsequence (LCS) method and a self-defined model to match each Mandarin subtitle with its corresponding Taiwanese pronunciation. Ⅱ. Non-example word processing: in addition to using the Taiwanese-Mandarin dictionary to find the Taiwanese pronunciation of non-example words, we split the Taiwanese words in the Taiwanese oral data into single Taiwanese characters to generate a pronunciation dictionary that addresses the non-example word problem. Ⅲ. Insertion & Deletion processing: we take the words that are inserted or deleted in the example sentences and use their contexts as features to determine whether a new incoming word is an Insertion/Deletion, using a rule-based method and two machine learning methods. In the Mandarin-Taiwanese translation experiments, the best system accuracy with LCS was 68.79%, and accuracy improved to 73.31% with our self-defined model. We also compared our systems with the word-based unigram language model proposed by Lin et al. in 2008. For out-of-example words, the best result was an F-measure of 35.35% using the Taiwanese-Mandarin dictionary; for out-of-vocabulary words, the best result was an F-measure of 21.86% using the Taiwanese Word Pronunciation Dictionary. For Insertion & Deletion, the best Deletion result came from CRF, with an F-measure of 39.96%, and the best Insertion result came from Rule-based TWID, with an F-measure of 12.24%. Lin, Chuan-Jie 林川傑 2015 學位論文 ; thesis 47 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Taiwan Ocean University === Department of Computer Science and Engineering === 103 === Although many translation systems have been developed on the internet, Taiwanese translation systems are still immature.
In this thesis we use Taiwanese-language dramas with Mandarin subtitles as bilingual data, which we transcribed ourselves by listening and typing, with the further aim of improving the “Taiwan Local Language Machine Translation”.
To speed up data construction, we use a program developed by the NLP lab of National Taiwan Ocean University, which includes a Taiwanese-Mandarin dictionary provided by Professor Zheng Liang-Wei.
We divide this work into three parts:
Ⅰ. Mandarin-Taiwanese translation: we propose a Longest Common Subsequence (LCS) method and a self-defined model to match each Mandarin subtitle with its corresponding Taiwanese pronunciation.
Ⅱ. Non-example word processing: in addition to using the Taiwanese-Mandarin dictionary to find the Taiwanese pronunciation of non-example words, we split the Taiwanese words in the Taiwanese oral data into single Taiwanese characters to generate a pronunciation dictionary that addresses the non-example word problem.
Ⅲ. Insertion & Deletion processing: we take the words that are inserted or deleted in the example sentences and use their contexts as features to determine whether a new incoming word is an Insertion/Deletion, using a rule-based method and two machine learning methods.
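The record does not say how the thesis applies LCS to subtitles and pronunciations, so the following is only a minimal sketch of the standard dynamic-programming LCS over two token sequences. The function name `lcs` and the sample tokens are hypothetical and are not taken from the thesis.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover the matched tokens.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


# Hypothetical word-segmented sentences (illustrative only).
new_sentence = ["我", "今天", "很", "忙"]
example_sentence = ["我", "明天", "很", "忙"]
shared = lcs(new_sentence, example_sentence)
```

In an example-based setting, tokens outside the shared subsequence are the ones that would need separate handling, which is one plausible reading of the non-example word and Insertion/Deletion steps listed above.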
In the Mandarin-Taiwanese translation experiments, the best system accuracy with LCS was 68.79%, and accuracy improved to 73.31% with our self-defined model. We also compared our systems with the word-based unigram language model proposed by Lin et al. in 2008. For out-of-example words, the best result was an F-measure of 35.35% using the Taiwanese-Mandarin dictionary; for out-of-vocabulary words, the best result was an F-measure of 21.86% using the Taiwanese Word Pronunciation Dictionary.
For Insertion & Deletion, the best Deletion result came from CRF, with an F-measure of 39.96%, and the best Insertion result came from Rule-based TWID, with an F-measure of 12.24%.
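The results above are reported as F-measures. As a reminder of how that score is usually computed, here is a small sketch assuming the common balanced F1 definition, F = 2PR / (P + R); the record does not state which variant the thesis uses, and the counts below are made up for illustration.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure (F1) from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # No correct predictions: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the thesis.
score = f_measure(tp=3, fp=1, fn=2)
```

Because the F-measure is the harmonic mean of precision and recall, a low score such as those reported for Insertion can result from weakness in either one.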
|
author2 |
Lin, Chuan-Jie |
author_facet |
Lin, Chuan-Jie Huang, Chih-Chao 黃志超 |
author |
Huang, Chih-Chao 黃志超 |
spellingShingle |
Huang, Chih-Chao 黃志超 A Study on Example-Based Mandarin-Taiwanese Machine Translation |
author_sort |
Huang, Chih-Chao |
title |
A Study on Example-Based Mandarin-Taiwanese Machine Translation |
title_short |
A Study on Example-Based Mandarin-Taiwanese Machine Translation |
title_full |
A Study on Example-Based Mandarin-Taiwanese Machine Translation |
title_fullStr |
A Study on Example-Based Mandarin-Taiwanese Machine Translation |
title_full_unstemmed |
A Study on Example-Based Mandarin-Taiwanese Machine Translation |
title_sort |
study on example-based mandarin-taiwanese machine translation |
publishDate |
2015 |
url |
http://ndltd.ncl.edu.tw/handle/20975873361730308829 |
work_keys_str_mv |
AT huangchihchao astudyonexamplebasedmandarintaiwanesemachinetranslation AT huángzhìchāo astudyonexamplebasedmandarintaiwanesemachinetranslation AT huangchihchao fànlìwèiběndeguóyǔtáiyǔfānyìzhīyánjiū AT huángzhìchāo fànlìwèiběndeguóyǔtáiyǔfānyìzhīyánjiū AT huangchihchao studyonexamplebasedmandarintaiwanesemachinetranslation AT huángzhìchāo studyonexamplebasedmandarintaiwanesemachinetranslation |
_version_ |
1718391435789598720 |