Study and Analysis of Speech Model-Based Voice Conversion
Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 107 === The goal of speech model-based voice conversion is to transform a source speaker's voice into a target speaker's voice. A training process produces the parameters of the conversion function. The speech is then made to sound like the target voice through acoustic feature an...
Main Authors: | Chang, Wen-Han, 張文瀚 |
---|---|
Other Authors: | Lin, Tay-Jyi |
Format: | Others |
Language: | zh-TW |
Published: | 2019 |
Online Access: | http://ndltd.ncl.edu.tw/handle/kerybc |
id |
ndltd-TW-107CCU00392019 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-107CCU00392019 2019-11-01T05:28:07Z http://ndltd.ncl.edu.tw/handle/kerybc Study and Analysis of Speech Model-Based Voice Conversion 基於語音模型之語音轉換探討與分析 Chang, Wen-Han 張文瀚 Master's, National Chung Cheng University, Graduate Institute of Computer Science and Information Engineering, 107 The goal of speech model-based voice conversion is to transform a source speaker's voice into a target speaker's voice. A training process produces the parameters of the conversion function; the speech is then made to sound like the target voice through acoustic feature analysis, mapping, and synthesis, in that order. This thesis analyzes how the speech model-based method achieves voice conversion by examining Sprocket, the baseline system of the Voice Conversion Challenge 2018. The voice conversion part of Sprocket can be divided into three stages. First, acoustic feature analysis extracts the main factors that distinguish one person's voice from another's. Second, the source acoustic features are processed so that they are converted into the target's acoustic features. Finally, the converted acoustic features are used to synthesize the target speaker's speech. The thesis focuses on the main algorithms behind these stages, which show how the acoustic features and other important information in speech are obtained, and it analyzes the execution times of the different models. After carefully studying the Sprocket architecture and its related algorithms and functions, we design experiments on analysis, conversion, and synthesis. We choose alternative algorithms that reduce execution time without degrading speech quality. Whereas the original Sprocket is written in Python, the experiments develop a C implementation and then replace and optimize algorithms and functions on top of it. After many optimizations and adjustments, the overall execution time of Sprocket improves by a factor of 2.785.
We thus speed up voice conversion without a significant difference in conversion quality as judged by subjective listening. Lin, Tay-Jyi 林泰吉 2019 學位論文 ; thesis 50 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Chung Cheng University === Graduate Institute of Computer Science and Information Engineering === 107 === The goal of speech model-based voice conversion is to transform a source speaker's voice into a target speaker's voice. A training process produces the parameters of the conversion function; the speech is then made to sound like the target voice through acoustic feature analysis, mapping, and synthesis, in that order.
This thesis analyzes how the speech model-based method achieves voice conversion by examining Sprocket, the baseline system of the Voice Conversion Challenge 2018. The voice conversion part of Sprocket can be divided into three stages. First, acoustic feature analysis extracts the main factors that distinguish one person's voice from another's. Second, the source acoustic features are processed so that they are converted into the target's acoustic features. Finally, the converted acoustic features are used to synthesize the target speaker's speech. The thesis focuses on the main algorithms behind these stages, which show how the acoustic features and other important information in speech are obtained, and it analyzes the execution times of the different models.
After carefully studying the Sprocket architecture and its related algorithms and functions, we design experiments on analysis, conversion, and synthesis. We choose alternative algorithms that reduce execution time without degrading speech quality. Whereas the original Sprocket is written in Python, the experiments develop a C implementation and then replace and optimize algorithms and functions on top of it. After many optimizations and adjustments, the overall execution time of Sprocket improves by a factor of 2.785. We thus speed up voice conversion without a significant difference in conversion quality as judged by subjective listening.
|
author2 |
Lin, Tay-Jyi |
author_facet |
Lin, Tay-Jyi Chang, Wen-Han 張文瀚 |
author |
Chang, Wen-Han 張文瀚 |
spellingShingle |
Chang, Wen-Han 張文瀚 Study and Analysis of Speech Model-Based Voice Conversion |
author_sort |
Chang, Wen-Han |
title |
Study and Analysis of Speech Model-Based Voice Conversion |
publishDate |
2019 |
url |
http://ndltd.ncl.edu.tw/handle/kerybc |
work_keys_str_mv |
AT changwenhan studyandanalysisofspeechmodelbasedvoiceconversion AT zhāngwénhàn studyandanalysisofspeechmodelbasedvoiceconversion AT changwenhan jīyúyǔyīnmóxíngzhīyǔyīnzhuǎnhuàntàntǎoyǔfēnxī AT zhāngwénhàn jīyúyǔyīnmóxíngzhīyǔyīnzhuǎnhuàntàntǎoyǔfēnxī |
_version_ |
1719285055514214400 |
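
The three stages named in the abstract (analysis, conversion, synthesis) and the way a speedup factor such as 2.785 is computed can be illustrated with a toy sketch. This is not Sprocket's code or API, and the abstract does not say which mapping the thesis uses. As a stand-in, the conversion stage below applies a log-F0 mean/variance transform, a common simple technique in voice conversion. All function names and the F0 values are hypothetical.

```python
import math
import time
from statistics import mean, pstdev


def analyze(f0_track):
    """Analysis (toy): keep voiced frames (F0 > 0) and move to the log domain."""
    return [math.log(f0) for f0 in f0_track if f0 > 0]


def fit_stats(log_f0):
    """Training (toy): mean and standard deviation of a speaker's log-F0."""
    return mean(log_f0), pstdev(log_f0)


def convert(log_f0, src_stats, tgt_stats):
    """Conversion (toy): map the source log-F0 statistics onto the target's."""
    (mu_s, sd_s), (mu_t, sd_t) = src_stats, tgt_stats
    return [(x - mu_s) / sd_s * sd_t + mu_t for x in log_f0]


def synthesize(log_f0):
    """Synthesis (toy): back to Hz; a real system would render a waveform."""
    return [math.exp(x) for x in log_f0]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def speedup(baseline_seconds, optimized_seconds):
    """Speedup factor: baseline time divided by optimized time."""
    return baseline_seconds / optimized_seconds


# Hypothetical per-frame F0 values in Hz; 0.0 marks an unvoiced frame.
source_f0 = [0.0, 110.0, 120.0, 130.0, 0.0, 125.0]
target_f0 = [210.0, 0.0, 220.0, 240.0, 230.0]

src_log = analyze(source_f0)
tgt_log = analyze(target_f0)
src_stats = fit_stats(src_log)
tgt_stats = fit_stats(tgt_log)

converted_log, convert_seconds = timed(convert, src_log, src_stats, tgt_stats)
converted_f0 = synthesize(converted_log)
```

Timing each stage separately with a helper like `timed` shows where the execution time goes, and `speedup` expresses the comparison between two implementations of the same stage as a single factor, which is how a figure like 2.785 would be read.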