A BLSTM and WaveNet-Based Voice Conversion Method With Waveform Collapse Suppression by Post-Processing
In recent years, neural network-based voice conversion methods have been rapidly developed, and many different models and neural networks have been applied in parallel voice conversion. However, the over-smoothing of parametric methods [e.g., bidirectional long short-term memory (BLSTM)] and the wav...
Main Authors: | Xiaokong Miao, Xiongwei Zhang, Meng Sun, Changyan Zheng, Tieyong Cao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Voice conversion, speech synthesis, BLSTM, WaveNet |
Online Access: | https://ieeexplore.ieee.org/document/8695725/ |
id |
doaj-45b145412b6d4c1c832e1127d8f80b2a |
---|---|
record_format |
Article |
spelling |
doaj-45b145412b6d4c1c832e1127d8f80b2a; 2021-03-29T21:59:51Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7, pp. 54321-54329; DOI 10.1109/ACCESS.2019.2912926; article no. 8695725
A BLSTM and WaveNet-Based Voice Conversion Method With Waveform Collapse Suppression by Post-Processing
Xiaokong Miao (https://orcid.org/0000-0002-7335-8500), Xiongwei Zhang, Meng Sun (https://orcid.org/0000-0002-7435-3752), Changyan Zheng (https://orcid.org/0000-0002-2088-9308), Tieyong Cao
All authors: Laboratory of Intelligent Information Processing, Army Engineering University, Nanjing, China
https://ieeexplore.ieee.org/document/8695725/
Voice conversion; speech synthesis; BLSTM; WaveNet |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiaokong Miao Xiongwei Zhang Meng Sun Changyan Zheng Tieyong Cao |
title |
A BLSTM and WaveNet-Based Voice Conversion Method With Waveform Collapse Suppression by Post-Processing |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
In recent years, neural network-based voice conversion methods have developed rapidly, and many different models and neural networks have been applied to parallel voice conversion. However, the over-smoothing of parametric methods [e.g., bidirectional long short-term memory (BLSTM)] and the waveform collapse of neural vocoders (e.g., WaveNet) still degrade the quality of the converted voices. To overcome this problem, we propose a BLSTM and WaveNet-based voice conversion method combined with waveform collapse suppression by post-processing. The method first uses BLSTM to convert the acoustic features between parallel speakers and then synthesizes a pre-converted voice with WaveNet. Subsequently, several alternating iterations of BLSTM post-processing are performed, and the final converted voice is generated by WaveNet. The proposed method can directly generate converted audio waveforms and effectively avoid the waveform-collapsed speech caused by a single WaveNet generation pass. The experimental results indicate that acoustic features trained with the BLSTM network achieve better results than conventional baselines. In our experiments on VCC2018, the use of WaveNet alleviated the problem of over-smoothing, which contributes to improving the similarity and naturalness of the final voice conversion results. |
topic |
Voice conversion speech synthesis BLSTM WaveNet |
url |
https://ieeexplore.ieee.org/document/8695725/ |
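The processing order given in the description field above (BLSTM feature conversion, WaveNet synthesis of a pre-converted voice, repeated BLSTM post-processing, final WaveNet generation) can be sketched as control flow. This is a minimal illustration of one possible reading of the abstract, not the authors' implementation: the function name `convert_voice`, its parameters, the default iteration count, and the re-analysis step between rounds are all assumptions, and the models are passed in as plain callables.

```python
from typing import Callable, List

Features = List[List[float]]  # frames x feature dimensions (assumed layout)
Waveform = List[float]        # audio samples


def convert_voice(
    source_features: Features,
    convert: Callable[[Features], Features],       # stands in for the BLSTM conversion model
    vocode: Callable[[Features], Waveform],        # stands in for the WaveNet vocoder
    analyze: Callable[[Waveform], Features],       # hypothetical feature re-extraction step
    post_process: Callable[[Features], Features],  # stands in for BLSTM post-processing
    n_iterations: int = 2,                         # assumed; the record gives no count
) -> Waveform:
    """Run conversion, then alternate post-processing and re-synthesis."""
    # Step 1: map source-speaker features to target-speaker features.
    features = convert(source_features)
    # Step 2: synthesize the pre-converted voice.
    waveform = vocode(features)
    # Step 3: alternate post-processing with re-synthesis (assumed loop shape).
    for _ in range(n_iterations):
        features = post_process(analyze(waveform))
        waveform = vocode(features)
    # The waveform from the last vocoder pass is the final converted voice.
    return waveform
```

Passing the models as callables keeps the sketch independent of any particular deep learning framework; trained BLSTM and WaveNet models would be wrapped to match these signatures.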