Study and Analysis of Speech Model-Based Voice Conversion

Master's === National Chung Cheng University (國立中正大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程研究所) === 107 === The goal of speech model-based voice conversion is to transform a source speaker's voice into a target speaker's voice. A training process produces the parameters of the conversion function. The source speech is then made to sound like the target voice through acoustic feature an...

Bibliographic Details
Main Authors: Chang, Wen-Han, 張文瀚
Other Authors: Lin, Tay-Jyi
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/kerybc
id ndltd-TW-107CCU00392019
record_format oai_dc
spelling ndltd-TW-107CCU00392019 2019-11-01T05:28:07Z http://ndltd.ncl.edu.tw/handle/kerybc Study and Analysis of Speech Model-Based Voice Conversion 基於語音模型之語音轉換探討與分析 Chang, Wen-Han 張文瀚 Master's, National Chung Cheng University (國立中正大學), Graduate Institute of Computer Science and Information Engineering (資訊工程研究所), 107 Lin, Tay-Jyi 林泰吉 2019 thesis (學位論文) 50 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Chung Cheng University (國立中正大學) === Graduate Institute of Computer Science and Information Engineering (資訊工程研究所) === 107 === The goal of speech model-based voice conversion is to transform a source speaker's voice into a target speaker's voice. A training process produces the parameters of the conversion function; the source speech is then made to sound like the target voice through acoustic feature analysis, mapping, and synthesis, in that order. This thesis analyzes how the speech model-based method achieves voice conversion by examining Sprocket, the baseline system of the Voice Conversion Challenge 2018. Sprocket's voice conversion can be divided into three stages. First, acoustic feature analysis extracts the main factors that make one person's voice differ from another's. Second, the source speaker's acoustic features are processed so that they are converted into the target speaker's acoustic features. Finally, synthesis generates the target speaker's speech from the converted features. The thesis focuses on the main algorithms, which show how the acoustic features and other important information in speech are obtained, and it analyzes the execution time of the different models. After studying the Sprocket architecture and its related algorithms and functions, we design experiments on analysis, conversion, and synthesis, choosing alternative algorithms that reduce execution time without degrading speech quality. Starting from the original Python implementation of Sprocket, we reimplement the system in C and then replace and optimize algorithms and functions in the C implementation. After these optimizations and adjustments, the overall execution time of Sprocket improves by a factor of 2.785. Voice conversion is thus sped up with no significant difference in conversion quality as judged by subjective listening.
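The three stages named in the abstract (analysis, mapping, synthesis) can be illustrated with a deliberately small sketch. It is not Sprocket's API and not the thesis's implementation: it assumes a single acoustic feature (a per-frame F0 contour in Hz, with 0 marking unvoiced frames) and uses a plain mean-variance transform in the log-F0 domain as the mapping step. All function names (`analyze`, `convert`, `synthesize`, `convert_f0`, `log_stats`) are hypothetical, and the "synthesis" stage only returns a converted F0 contour, not a waveform.

```python
import math
from statistics import mean, pstdev


def log_stats(f0_hz):
    """Mean and standard deviation of log-F0 over voiced frames (f0 > 0)."""
    voiced = [math.log(f) for f in f0_hz if f > 0]
    return mean(voiced), pstdev(voiced)


def analyze(f0_hz):
    """Stage 1 (analysis): express voiced frames as log-F0; None marks unvoiced."""
    return [math.log(f) if f > 0 else None for f in f0_hz]


def convert(log_f0, src_stats, tgt_stats):
    """Stage 2 (mapping): remove source statistics, apply target statistics."""
    (mu_s, sd_s), (mu_t, sd_t) = src_stats, tgt_stats
    # Guard against a constant source contour (zero standard deviation).
    scale = sd_t / sd_s if sd_s > 0 else 1.0
    return [None if v is None else (v - mu_s) * scale + mu_t for v in log_f0]


def synthesize(log_f0):
    """Stage 3 (stand-in for synthesis): back to Hz, 0.0 for unvoiced frames."""
    return [0.0 if v is None else math.exp(v) for v in log_f0]


def convert_f0(src_f0, src_train, tgt_train):
    """Run the three stages in sequence on one F0 contour."""
    features = analyze(src_f0)
    mapped = convert(features, log_stats(src_train), log_stats(tgt_train))
    return synthesize(mapped)


if __name__ == "__main__":
    # Made-up contours for illustration only; not data from the thesis.
    src = [100.0, 0.0, 110.0, 120.0, 0.0, 130.0]
    tgt = [200.0, 220.0, 0.0, 240.0, 260.0]
    print(convert_f0(src, src, tgt))
```

Keeping each stage as a separate function mirrors the abstract's description of optimizing analysis, conversion, and synthesis individually: any one stage can be timed or swapped for a faster implementation without touching the others.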
author2 Lin, Tay-Jyi
author Chang, Wen-Han
張文瀚
title Study and Analysis of Speech Model-Based Voice Conversion
publishDate 2019
url http://ndltd.ncl.edu.tw/handle/kerybc