HMM-Based Chinese Text-To-Speech System with Support Speakers

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 100 === Nowadays people can use the speech technology to make their life better. Among the speech technology, speech synthesis is regarded as an important part recently. There are two speech synthesis techniques commonly used. One is the unit selection technique and th...


Bibliographic Details
Main Authors: Jie Li, 李杰
Other Authors: Lin-shan Lee
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/69884613145394916907
id ndltd-TW-100NTU05392073
record_format oai_dc
spelling ndltd-TW-100NTU05392073 2015-10-13T21:50:17Z http://ndltd.ncl.edu.tw/handle/69884613145394916907 HMM-Based Chinese Text-To-Speech System with Support Speakers 基於支援語者聲學模型之中文語音合成系統 Jie Li 李杰 Master's, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 100 Lin-shan Lee 李琳山 2012 學位論文 ; thesis 88 zh-TW
collection NDLTD
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 100 === Speech technology now improves many aspects of everyday life, and speech synthesis has recently become one of its most important components. Two synthesis techniques are in common use: unit selection and HMM-based synthesis. In unit selection, the recorded speech in the corpus is divided into small units, which are concatenated to produce the synthesized voice. In HMM-based synthesis, acoustic models are estimated from acoustic features, and the synthesized voice is generated from those models. In this thesis, I used the HMM-based technique to implement a Chinese text-to-speech (TTS) system. The system extracts spectral features, frequency features, and context-dependent labels to train the models. After training, it analyzes the input text and uses the corresponding models to generate the voice. Training a high-quality acoustic model requires a large amount of training data, which is difficult to obtain, so the conventional approach combines an average acoustic model with speaker adaptation to make training with less data possible. However, it is difficult to obtain models close to the target speaker's when starting from an average acoustic model, so speaker adaptation does not perform well. In this thesis, I proposed several methods for finding acoustically similar speakers to serve as support speakers for the target speaker, and used their training data to train support speaker models. I conducted both objective and subjective experiments. The experiments showed that the support speaker model technique outperforms the average acoustic model technique and yields better synthesis quality.
title HMM-Based Chinese Text-To-Speech System with Support Speakers