A Study on Example-Based Mandarin-Taiwanese Machine Translation


Bibliographic Details
Main Authors: Huang, Chih-Chao, 黃志超
Other Authors: Lin, Chuan-Jie
Format: Others
Language: zh-TW
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/20975873361730308829
Description
Summary: Master's thesis === National Taiwan Ocean University === Department of Computer Science and Engineering === 103 === Although many translation systems have been developed on the internet, mature Taiwanese translation systems are still lacking. In this thesis we use Taiwanese-language dramas with Mandarin subtitles as bilingual data, transcribed manually by listening, in order to improve the "Taiwan Local Language Machine Translation" system. To accelerate data construction, we use a program developed by the NLP lab of National Taiwan Ocean University, which includes a Taiwanese-Mandarin dictionary provided by Professor Zheng Liang-Wei. The work is divided into three parts. I. Mandarin-Taiwanese translation: we propose a Longest Common Subsequence (LCS) method and a self-defined model to match Mandarin subtitles with their corresponding Taiwanese pronunciations. II. Non-example word processing: in addition to using the Taiwanese-Mandarin dictionary to find the Taiwanese pronunciations of words not covered by the examples, we also split Taiwanese words from the spoken data into single Taiwanese characters to build a pronunciation dictionary that handles the non-example word problem. III. Insertion and deletion processing: we take the words that are inserted or deleted in the example sentences and use their context as features to determine whether a newly encountered word is an insertion or a deletion, using a rule-based method and two machine learning methods. In the Mandarin-Taiwanese translation experiments, the best system accuracy with LCS reached 68.79%, which improved to 73.31% after applying our self-defined model. We also compared our systems with the word-based unigram language model proposed by Lin et al. in 2008. The best result for out-of-example words was an F-measure of 35.35% using the Taiwanese-Mandarin dictionary, and the best result for out-of-vocabulary words was an F-measure of 21.86% using the Taiwanese word pronunciation dictionary. For insertion and deletion, the best deletion result came from CRF with an F-measure of 39.96%, and the best insertion result came from rule-based TWID with an F-measure of 12.24%.
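
The thesis itself is not reproduced here, so the following is only a minimal sketch of the kind of LCS-based example retrieval described in part I: given an input Mandarin sentence, find the example pair in the bilingual corpus whose Mandarin side shares the longest common subsequence with it, and return its Taiwanese pronunciation. All function names, corpus entries, and romanizations below are illustrative assumptions, not taken from the thesis.

```python
# Sketch of LCS-based example retrieval (illustrative only; not the thesis code).
# Assumes the bilingual corpus is a list of (mandarin_sentence, taiwanese_pronunciation) pairs.

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two character strings."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            if ca == cb:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def best_example(query: str, corpus: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the example pair whose Mandarin side best matches the query by LCS length."""
    return max(corpus, key=lambda pair: lcs_length(query, pair[0]))

if __name__ == "__main__":
    # Hypothetical corpus entries; the romanizations are only placeholders.
    corpus = [
        ("你要去哪裡", "li2 beh4 khi3 to2-ui7"),
        ("我想吃飯", "gua2 siunn7-beh4 tsiah8-png7"),
    ]
    mandarin, taiwanese = best_example("你想去哪裡", corpus)
    print(mandarin, "->", taiwanese)
```

In the actual system, the retrieved example would then be adjusted by the self-defined model and by the insertion/deletion handling described in part III; this sketch only covers the matching step.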