An Implementation of HMM-based English Speech Synthesis
Main Authors: | Liu, Kuan-Yi, 劉冠驛 |
---|---|
Other Authors: | Chen, Sin-Horng |
Format: | Others |
Language: | zh-TW |
Published: | 2011 |
Online Access: | http://ndltd.ncl.edu.tw/handle/96185639096441948381 |
id | ndltd-TW-100NCTU5435004 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-100NCTU5435004; 2015-10-13T20:37:27Z; http://ndltd.ncl.edu.tw/handle/96185639096441948381; An Implementation of HMM-based English Speech Synthesis; 基於隱藏式馬可夫模型之英文語音合成系統實作; Liu, Kuan-Yi 劉冠驛; Master's thesis, National Chiao Tung University, Institute of Communications Engineering, academic year 100 (ROC calendar); Chen, Sin-Horng 陳信宏; 2011; thesis; 52; zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description | Master's thesis, National Chiao Tung University, Institute of Communications Engineering, academic year 100 (ROC calendar). This thesis builds an online English text-to-speech system. The speech database consists of TOEFL articles read by a female speaker whose native language is Chinese. The database is first segmented with a well-trained triphone model. The CMU Pronouncing Dictionary and the Stanford POS Tagger are then used to label the relative positions and prosodic information of a five-level structure (phone, syllable, word, phrase, and sentence). From these labels, vocal-tract, fundamental-frequency, and duration models are built, with the aim of producing richer prosody and rhythm.
Experimental results show that the synthesized prosody is still not natural enough. Compared with speech synthesized by a system on a foreign website, our prosody fluctuates more, but it is also more blurred and shows odd rises and falls. We suspect that the rule-based method used to estimate the various prosodic labels is not yet accurate enough. As a result, the synthesized prosody is broadly correct but shows strange fluctuations in detail. |
author2 | Chen, Sin-Horng |
author | Liu, Kuan-Yi 劉冠驛 |
title | An Implementation of HMM-based English Speech Synthesis |
publishDate | 2011 |
url | http://ndltd.ncl.edu.tw/handle/96185639096441948381 |
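The abstract above describes segmenting the database with a triphone model and then labeling every phone with its relative position in a five-level structure (phone, syllable, word, phrase, sentence). The Python sketch below illustrates two generic pieces of such a front end. It is not the thesis's implementation: the nested-list utterance layout, the function names `triphones` and `position_features`, and the feature keys are all hypothetical. The triphone names follow the common HTK `left-centre+right` convention, which the thesis may or may not have used.

```python
"""Hypothetical sketch of triphone naming and five-level position labels.

Assumption: an utterance is a nested list, sentence -> phrases -> words ->
syllables -> phones. This layout is illustrative only and is not the label
format used in the thesis.
"""
from typing import Dict, List, Tuple

# sentence = list of phrases; phrase = list of words;
# word = list of syllables; syllable = list of phone strings (assumed layout)
Utterance = List[List[List[List[str]]]]


def triphones(phones: List[str]) -> List[str]:
    """Expand monophones into HTK-style names: 'l-c+r' inside, biphones at the edges."""
    names = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        names.append(f"{left}{p}{right}")
    return names


def position_features(utt: Utterance) -> List[Dict[str, object]]:
    """Give every phone a 1-based (position, count) pair at each level boundary.

    The four pairs relate the five levels: phone in syllable, syllable in
    word, word in phrase, and phrase in sentence.
    """
    feats: List[Dict[str, object]] = []
    for pi, phrase in enumerate(utt):
        for wi, word in enumerate(phrase):
            for si, syl in enumerate(word):
                for ci, phone in enumerate(syl):
                    pos: Dict[str, Tuple[int, int]] = {
                        "phone_in_syllable": (ci + 1, len(syl)),
                        "syllable_in_word": (si + 1, len(word)),
                        "word_in_phrase": (wi + 1, len(phrase)),
                        "phrase_in_sentence": (pi + 1, len(utt)),
                    }
                    feats.append({"phone": phone, **pos})
    return feats


if __name__ == "__main__":
    # "hello world" as one phrase; phones are CMUdict-style with stress digits removed.
    utt: Utterance = [[[["hh", "ah"], ["l", "ow"]], [["w", "er", "l", "d"]]]]
    print(triphones(["sil", "hh", "ah", "l", "ow", "sil"]))
    for f in position_features(utt):
        print(f)
```

In a real HMM-based system these positions would be only part of a much richer full-context label (stress, part of speech, prosodic break type, and so on), and the phrase level would come from a prosody predictor rather than being given by hand.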