A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
Master's thesis === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === Academic year 101 === In this thesis, a customizable speaker conversion system is implemented using linguistic classification-and-regression-tree (CART)-based spectrum and pitch conversion together with the HMM-based speech synthesis system (HTS). There are three major acoustic feature...
Main Authors: Yu-Wei Bai (白育瑋)
Other Authors: Jhing-Fa Wang (王駿發)
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/85239144130615577973
id
ndltd-TW-101NCKU5442185
record_format
oai_dc
spelling
ndltd-TW-101NCKU5442185 2015-10-13T22:51:44Z http://ndltd.ncl.edu.tw/handle/85239144130615577973 A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech 使用語言與聲學資訊之高斯混合模型語音轉換應用於可自訂文字轉語音系統 Yu-Wei Bai 白育瑋. Master's thesis, National Cheng Kung University, Department of Electrical Engineering (Master's and Doctoral Program), academic year 101. Jhing-Fa Wang 王駿發. 2013. Degree thesis, 71 pages. en_US.
collection
NDLTD
language
en_US
format
Others
sources
NDLTD
description
Master's thesis === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === Academic year 101 === In this thesis, a customizable speaker conversion system is implemented using linguistic classification-and-regression-tree (CART)-based spectrum and pitch conversion together with the HMM-based speech synthesis system (HTS). HTS uses three major acoustic features in the synthesis phase: spectrum, pitch, and duration; the first two are transformed by the proposed methods to synthesize the target speaker's speech. In the training phase, parallel corpora are required for CART training; to keep corpus collection efficient and phonetically balanced, a pre-designed phonetically balanced text corpus is established and a phonetically balanced sentence selection algorithm is proposed. The linguistic CART and the acoustic clusters of spectrum and pitch are then constructed through the proposed mechanisms. In the synthesis phase, according to the label sequence generated by the text analyzer, the spectrum and pitch conversion functions are determined from the linguistic CART and the acoustic clusters, respectively. Next, frame-based spectrum and pitch features are generated by the parameter generation process and converted by the linguistic and acoustic conversion functions of spectrum and pitch; using both linguistic and acoustic conversion yields a complementary effect. Finally, the target speaker's speech is synthesized by an MLSA vocoder from the converted features.
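The frame-level conversion step is not spelled out in this record; the title implies the standard joint-density GMM mapping, so the sketch below assumes that formulation. The function name, array shapes, and NumPy implementation are illustrative, not the thesis' actual code.

```python
# Minimal sketch of joint-density GMM spectral conversion (assumed, not from the thesis).
import numpy as np


def gmm_convert_frame(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map one source spectral frame x (D,) to the target speaker's space.

    weights : (M,)      mixture weights
    mu_x    : (M, D)    source-speaker component means
    mu_y    : (M, D)    target-speaker component means
    cov_xx  : (M, D, D) source covariances
    cov_yx  : (M, D, D) cross covariances (target given source)
    """
    M, D = mu_x.shape

    # Posterior responsibility p(m | x) of each Gaussian component.
    log_post = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = diff @ np.linalg.solve(cov_xx[m], diff)
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + maha)
    log_post -= log_post.max()           # stabilize before exponentiation
    post = np.exp(log_post)
    post /= post.sum()

    # Posterior-weighted sum of per-component linear regressions:
    #   E[y | x, m] = mu_y[m] + cov_yx[m] cov_xx[m]^-1 (x - mu_x[m])
    y = np.zeros(D)
    for m in range(M):
        y += post[m] * (mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m]))
    return y
```

In this reading, the linguistic CART and the acoustic clusters would select which set of GMM parameters (conversion function) is applied to each frame before the converted features are passed to the MLSA vocoder.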
In the experiments, objective and subjective evaluation tests are designed to compare speaker conversion results. An objective evaluation of the converted spectrum is carried out. In the subjective evaluation, three types of MOS are used to assess the conversion results: fluency, intelligibility, and voice quality, with scores of 4.03, 4.12, and 4.09, respectively. In summary, the proposed speaker conversion system improves conversion performance.
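The record does not name the objective spectral metric; mel-cepstral distortion (MCD) over time-aligned mel-cepstra is the conventional choice for this kind of system, so the sketch below assumes it (the function and its inputs are hypothetical).

```python
# Assumed objective metric: mel-cepstral distortion between aligned frame sequences.
import numpy as np


def mel_cepstral_distortion(mc_converted, mc_target):
    """Mean MCD in dB between two time-aligned mel-cepstrum sequences of shape (T, D).

    The 0th coefficient (frame energy) is excluded, as is customary.
    """
    diff = mc_converted[:, 1:] - mc_target[:, 1:]
    frame_mcd = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(frame_mcd))
```

A lower MCD between converted and target utterances would indicate better spectral conversion, complementing the fluency, intelligibility, and voice-quality MOS reported above.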
author2
Jhing-Fa Wang
author_facet
Jhing-Fa Wang Yu-WeiBai 白育瑋
author
Yu-WeiBai 白育瑋
spellingShingle
Yu-WeiBai 白育瑋 A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
author_sort
Yu-WeiBai
title
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_short
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_full
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_fullStr
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_full_unstemmed
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_sort
gmm-based voice conversion system using linguistic and acoustic information for customizable text-to-speech
publishDate
2013
url
http://ndltd.ncl.edu.tw/handle/85239144130615577973
work_keys_str_mv
AT yuweibai agmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech AT báiyùwěi agmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech AT yuweibai shǐyòngyǔyányǔshēngxuézīxùnzhīgāosīhùnhémóxíngyǔyīnzhuǎnhuànyīngyòngyúkězìdìngwénzìzhuǎnyǔyīnxìtǒng AT báiyùwěi shǐyòngyǔyányǔshēngxuézīxùnzhīgāosīhùnhémóxíngyǔyīnzhuǎnhuànyīngyòngyúkězìdìngwénzìzhuǎnyǔyīnxìtǒng AT yuweibai gmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech AT báiyùwěi gmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech
_version_
1718081412050976768 |