A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech

Bibliographic Details
Main Authors: Yu-Wei Bai, 白育瑋
Other Authors: Jhing-Fa Wang
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/85239144130615577973
id ndltd-TW-101NCKU5442185
record_format oai_dc
spelling ndltd-TW-101NCKU54421852015-10-13T22:51:44Z http://ndltd.ncl.edu.tw/handle/85239144130615577973 A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech 使用語言與聲學資訊之高斯混合模型語音轉換應用於可自訂文字轉語音系統 Yu-Wei Bai 白育瑋 Master's thesis, National Cheng Kung University, Department of Electrical Engineering (Master's and Doctoral Program), academic year 101. Advisor: Jhing-Fa Wang 王駿發. Published 2013; thesis, 71 pp.; en_US.
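The phonetically balanced sentence selection described in the abstract is, in most corpus-design work, a greedy coverage procedure: repeatedly pick the sentence that covers the most not-yet-seen phone units. The sketch below illustrates that general idea only; the thesis's actual selection criterion is not reproduced here, and the `phonemize` function and per-phone gain measure are hypothetical.

```python
def select_balanced(sentences, phonemize, target_size):
    """Greedy phonetically balanced sentence selection (illustrative sketch).

    Repeatedly picks the sentence whose phones add the most not-yet-covered
    units, normalized by sentence length so short, dense sentences win.
    `phonemize` maps a sentence to its list of phone units (an assumption).
    """
    covered = set()
    chosen = []
    pool = list(sentences)
    while pool and len(chosen) < target_size:
        def gain(s):
            phones = phonemize(s)
            new = len(set(phones) - covered)
            return new / max(len(phones), 1)  # coverage gain per phone
        best = max(pool, key=gain)
        if not set(phonemize(best)) - covered:
            break  # remaining sentences cover nothing new
        covered |= set(phonemize(best))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

With a toy `phonemize` that treats each character as a phone, selecting 2 sentences from `["ab", "abc", "cd"]` yields `["ab", "cd"]`, covering all four units without redundancy.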
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's thesis === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === Academic year 101 === In this thesis, a customizable speaker conversion system is implemented using linguistic classification-and-regression-tree (CART)-based spectrum and pitch conversion together with the HMM-based speech synthesis system (HTS, short for "H Triple S"). HTS produces three major acoustic features in the synthesis phase: spectrum, pitch, and duration. The first two are transformed by the proposed methods to synthesize the target speaker's speech. In the training phase, parallel corpora are required for CART training; to improve corpus-collection efficiency and ensure phonetic balance, a pre-designed phonetically balanced text corpus is established and a phonetically balanced sentence-selection algorithm is proposed. The linguistic CART and the acoustic clusters of spectrum and pitch are then constructed through the proposed mechanisms. In the synthesis phase, according to the label sequence generated by the text analyzer, the conversion functions of spectrum and pitch are determined from the linguistic CART and the acoustic clusters, respectively. Next, the frame-based spectrum and pitch features produced by the parameter-generation process are converted by the linguistic and acoustic conversion functions; using both linguistic and acoustic conversion yields a complementary effect. Finally, the target speaker's speech is synthesized by the MLSA vocoder from the converted features. In the experiments, objective and subjective evaluations are designed to compare the speaker conversion results. An objective evaluation of the spectrum is carried out. In the subjective evaluation, three types of MOS are used to rate the conversion results — fluency, intelligibility, and voice quality — achieving scores of 4.03, 4.12, and 4.09, respectively. In summary, the proposed speaker conversion system improves conversion performance.
author2 Jhing-Fa Wang
author_facet Jhing-Fa Wang
Yu-Wei Bai
白育瑋
author Yu-Wei Bai
白育瑋
spellingShingle Yu-Wei Bai
白育瑋
A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
author_sort Yu-Wei Bai
title A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_short A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_full A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_fullStr A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_full_unstemmed A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech
title_sort gmm-based voice conversion system using linguistic and acoustic information for customizable text-to-speech
publishDate 2013
url http://ndltd.ncl.edu.tw/handle/85239144130615577973
work_keys_str_mv AT yuweibai agmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech
AT báiyùwěi agmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech
AT yuweibai shǐyòngyǔyányǔshēngxuézīxùnzhīgāosīhùnhémóxíngyǔyīnzhuǎnhuànyīngyòngyúkězìdìngwénzìzhuǎnyǔyīnxìtǒng
AT báiyùwěi shǐyòngyǔyányǔshēngxuézīxùnzhīgāosīhùnhémóxíngyǔyīnzhuǎnhuànyīngyòngyúkězìdìngwénzìzhuǎnyǔyīnxìtǒng
AT yuweibai gmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech
AT báiyùwěi gmmbasedvoiceconversionsystemusinglinguisticandacousticinformationforcustomizabletexttospeech
_version_ 1718081412050976768