A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
Master's thesis === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === Academic year 101 === In this thesis, a customizable speaker conversion system is implemented using linguistic classification-and-regression-tree (CART)-based spectrum and pitch conversion together with the HMM-based speech synthesis system (HTS). There are three major acoustic feature...
Main Authors: Yu-Wei Bai (白育瑋)
Other Authors: Jhing-Fa Wang (王駿發)
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/85239144130615577973
id
ndltd-TW-101NCKU5442185
record_format
oai_dc
spelling
ndltd-TW-101NCKU5442185 2015-10-13T22:51:44Z http://ndltd.ncl.edu.tw/handle/85239144130615577973 A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech 使用語言與聲學資訊之高斯混合模型語音轉換應用於可自訂文字轉語音系統 Yu-Wei Bai 白育瑋. Master's thesis, National Cheng Kung University, Department of Electrical Engineering (Master's and Doctoral Program), academic year 101. Jhing-Fa Wang 王駿發. 2013. Degree thesis, 71 pages. en_US.
collection
NDLTD
language
en_US
format
Others
sources
NDLTD
description
Master's thesis === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === Academic year 101 === In this thesis, a customizable speaker conversion system is implemented using linguistic classification-and-regression-tree (CART)-based spectrum and pitch conversion together with the HMM-based speech synthesis system (HTS). HTS uses three major acoustic features in the synthesis phase: spectrum, pitch, and duration; the first two are transformed by the proposed methods to synthesize the target speaker's speech. In the training phase, parallel corpora are required for CART training; to keep corpus collection efficient and phonetically balanced, a pre-designed phonetically balanced text corpus is established and a phonetically balanced sentence selection algorithm is proposed. The linguistic CART and the acoustic clusters of spectrum and pitch are then constructed through the proposed mechanisms. In the synthesis phase, according to the label sequence generated by the text analyzer, the spectrum and pitch conversion functions are determined from the linguistic CART and the acoustic clusters, respectively. Next, frame-based spectrum and pitch features are generated by the parameter generation process and converted by the linguistic and acoustic conversion functions of spectrum and pitch; using both linguistic and acoustic conversion yields a complementary effect. Finally, the target speaker's speech is synthesized by an MLSA vocoder from the converted features.
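The frame-level conversion step is not spelled out in this record; the title implies the standard joint-density GMM mapping, so the sketch below assumes that formulation. The function name, array shapes, and NumPy implementation are illustrative, not the thesis' actual code.

```python
# Minimal sketch of joint-density GMM spectral conversion (assumed, not from the thesis).
import numpy as np


def gmm_convert_frame(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map one source spectral frame x (D,) to the target speaker's space.

    weights : (M,)      mixture weights
    mu_x    : (M, D)    source-speaker component means
    mu_y    : (M, D)    target-speaker component means
    cov_xx  : (M, D, D) source covariances
    cov_yx  : (M, D, D) cross covariances (target given source)
    """
    M, D = mu_x.shape

    # Posterior responsibility p(m | x) of each Gaussian component.
    log_post = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        _, logdet = np.linalg.slogdet(cov_xx[m])
        maha = diff @ np.linalg.solve(cov_xx[m], diff)
        log_post[m] = np.log(weights[m]) - 0.5 * (logdet + maha)
    log_post -= log_post.max()           # stabilize before exponentiation
    post = np.exp(log_post)
    post /= post.sum()

    # Posterior-weighted sum of per-component linear regressions:
    #   E[y | x, m] = mu_y[m] + cov_yx[m] cov_xx[m]^-1 (x - mu_x[m])
    y = np.zeros(D)
    for m in range(M):
        y += post[m] * (mu_y[m] + cov_yx[m] @ np.linalg.solve(cov_xx[m], x - mu_x[m]))
    return y
```

In this reading, the linguistic CART and the acoustic clusters would select which set of GMM parameters (conversion function) is applied to each frame before the converted features are passed to the MLSA vocoder.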
In the experiments, objective and subjective evaluation tests are designed to compare speaker conversion results. An objective evaluation of the converted spectrum is carried out. In the subjective evaluation, three types of MOS are used to assess the conversion results: fluency, intelligibility, and voice quality, with scores of 4.03, 4.12, and 4.09, respectively. In summary, the proposed speaker conversion system improves conversion performance.
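The record does not name the objective spectral metric; mel-cepstral distortion (MCD) over time-aligned mel-cepstra is the conventional choice for this kind of system, so the sketch below assumes it (the function and its inputs are hypothetical).

```python
# Assumed objective metric: mel-cepstral distortion between aligned frame sequences.
import numpy as np


def mel_cepstral_distortion(mc_converted, mc_target):
    """Mean MCD in dB between two time-aligned mel-cepstrum sequences of shape (T, D).

    The 0th coefficient (frame energy) is excluded, as is customary.
    """
    diff = mc_converted[:, 1:] - mc_target[:, 1:]
    frame_mcd = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(frame_mcd))
```

A lower MCD between converted and target utterances would indicate better spectral conversion, complementing the fluency, intelligibility, and voice-quality MOS reported above.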
author2
Jhing-Fa Wang
author_facet
Jhing-Fa Wang Yu-WeiBai 白育瑋
author
Yu-WeiBai 白育瑋
spellingShingle
Yu-WeiBai 白育瑋 A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
author_sort
Yu-WeiBai
title
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_short
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_full
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_fullStr
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_full_unstemmed
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_sort
gmm-based voice conversion system using linguistic and acoustic information for customizable text-to-speech
publishDate
2013
url
http://ndltd.ncl.edu.tw/handle/85239144130615577973
work_keys_str_mv
AT yuweibai agmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech AT báiyùwěi agmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech AT yuweibai shǐyòngyǔyányǔshēngxuézīxùnzhīgāosīhùnhémóxíngyǔyīnzhuǎnhuànyīngyòngyúkězìdìngwénzìzhuǎnyǔyīnxìtǒng AT báiyùwěi shǐyòngyǔyányǔshēngxuézīxùnzhīgāosīhùnhémóxíngyǔyīnzhuǎnhuànyīngyòngyúkězìdìngwénzìzhuǎnyǔyīnxìtǒng AT yuweibai gmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech AT báiyùwěi gmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech
_version_
1718081412050976768 |