An Initial Study on Hakka Speech Synthesis

Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 90 === This thesis studies speech synthesis for Hai-Lu, a sub-dialect of Hakka. With 7 lexical tones and 736 base syllables, Hai-Lu Hakka is more complex than other Hakka sub-dialects. Signal waveform synthesis and prosody parameter generation are the...

Bibliographic Details
Main Author: Shie-Jen Lee (李雪貞)
Other Authors: Hung-yan Gu
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/45818970175515086055
id ndltd-TW-090NTUST392001
record_format oai_dc
spelling ndltd-TW-090NTUST392001 2015-10-13T14:41:23Z http://ndltd.ncl.edu.tw/handle/45818970175515086055 An Initial Study on Hakka Speech Synthesis 客語語音合成之初步研究 Shie-Jen Lee 李雪貞 Master's thesis National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 90 Hung-yan Gu 古鴻炎 2002 學位論文 ; thesis 55 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 90 === This thesis studies speech synthesis for Hai-Lu, a sub-dialect of Hakka. With 7 lexical tones and 736 base syllables, Hai-Lu Hakka is more complex than other Hakka sub-dialects. The thesis focuses on signal waveform synthesis and prosody parameter generation. For signal waveform synthesis, the Time-Proportioned Interpolation of Pitch Waveform (TIPW) method is adopted and modified to handle Hakka syllables pronounced with an abrupt tone, which is not found in Mandarin. For prosody parameter generation, we suppose that the sentence pitch-contour hidden Markov model trained for Mandarin speech synthesis can be borrowed directly to generate pitch-contour parameters for a Hakka sentence, provided that an appropriate tone mapping rule is used. We therefore studied this problem and propose a Hakka-to-Mandarin tone mapping rule. The results of the perception experiments show that this supposition is workable. For syllable duration and amplitude, generation rules proposed in previous studies are adopted and modified to give better prosodic expression. Using these strategies, we built a prototype system for Hakka speech synthesis, which also implements some simple text-analysis functions. With this system, two sets of sentences were synthesized for evaluation experiments. The set of Hakka sentences was synthesized earlier than the set of Min-Nan sentences, at a time when the generation rules for prosodic parameters had not yet been settled. Because of this timing difference, and because a notebook computer's poor speaker was used for playback, the comprehension and naturalness scores of the Hakka synthetic speech reach only 91.87% and 79.5, respectively. For the Min-Nan synthetic speech, the scores are 97.1% and 85.5 for comprehension and naturalness, respectively.
author2 Hung-yan Gu
author_facet Hung-yan Gu
Shie-Jen Lee
李雪貞
author Shie-Jen Lee
李雪貞
author_sort Shie-Jen Lee
title An Initial Study on Hakka Speech Synthesis
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/45818970175515086055