TTS-Guided Training for Accent Conversion Without Parallel Data
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Institute of Electrical and Electronics Engineers Inc., 2023 |
Subjects: | |
Summary: | Accent Conversion (AC) seeks to change the accent of speech from one (source) to another (target) while preserving the speech content and speaker identity. However, many existing AC approaches rely on source-target parallel speech data during training or reference speech at run-time. We propose a novel accent conversion framework without the need for either parallel data or reference speech. Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data. This TTS model and its hidden representations are expected to be associated only with the target accent. Then, a speech encoder is trained to convert the accent of the speech under the supervision of the pretrained TTS model. In doing so, the source-accented speech and its corresponding transcription are forwarded to the speech encoder and the pretrained TTS, respectively. The output of the speech encoder is optimized to be the same as the text embedding in the TTS system. At run-time, the speech encoder is combined with the pretrained speech decoder to convert the source-accented speech toward the target. In the experiments, we converted English with two source accents (Chinese/Indian) to the target accent (American/British/Canadian). Both objective metrics and subjective listening tests successfully validate that the proposed approach generates speech samples that are close to the target accent with high speech quality. |
---|---|
Physical Description: | 5 |
ISSN: | 1070-9908 |
DOI: | 10.1109/LSP.2023.3270079 |
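
The training scheme outlined in the Summary can be illustrated with a toy sketch. It is not the authors' implementation: the linear encoder, the mean-squared-error loss, the one-to-one alignment between speech frames and text tokens, and every name below (`TEXT_EMBEDDINGS`, `LinearSpeechEncoder`, `frozen_tts_decoder`) are assumptions made purely for illustration. The record states only that the speech encoder output is optimized to match the text embedding of a TTS model pretrained on target-accented speech, and that the encoder is paired with the pretrained decoder at run-time.

```python
import random

# Hypothetical stand-in for the text encoder of a TTS model pretrained on
# target-accented speech. It is frozen: nothing below ever updates it.
TEXT_EMBEDDINGS = {
    "h": [0.9, -0.2, 0.1],
    "e": [-0.3, 0.8, 0.4],
    "l": [0.2, 0.1, -0.7],
    "o": [-0.5, -0.6, 0.3],
}


def frozen_text_embedding(tokens):
    """Return one (copied) embedding vector per transcription token."""
    return [list(TEXT_EMBEDDINGS[t]) for t in tokens]


def frozen_tts_decoder(hidden):
    """Placeholder decoder: a real one would emit acoustic features."""
    return [list(h) for h in hidden]


def mse(pred, target):
    """Mean squared error over all elements of two lists of vectors."""
    n = sum(len(p) for p in pred)
    return sum((a - b) ** 2
               for p, t in zip(pred, target)
               for a, b in zip(p, t)) / n


class LinearSpeechEncoder:
    """Toy trainable speech encoder: a single linear map applied per frame."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def forward(self, frames):
        return [[sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
                for x in frames]

    def gradient_step(self, frames, target, lr):
        """One full-batch gradient step on the MSE; returns the pre-step loss."""
        pred = self.forward(frames)
        n = len(frames) * len(self.w)
        for x, p, t in zip(frames, pred, target):
            for j, row in enumerate(self.w):
                g = 2.0 * (p[j] - t[j]) / n
                for i in range(len(row)):
                    row[i] -= lr * g * x[i]
        return mse(pred, target)


# Toy source-accented "speech": one random 4-dim frame per token.
# Assumption: frames and tokens are already aligned one-to-one.
rng = random.Random(1)
tokens = ["h", "e", "l", "l", "o"]
frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in tokens]

# Training: speech goes to the encoder, the transcription goes to the frozen
# TTS text embedding, and the encoder is pulled toward that embedding.
target = frozen_text_embedding(tokens)
encoder = LinearSpeechEncoder(in_dim=4, out_dim=3)
losses = [encoder.gradient_step(frames, target, lr=0.1) for _ in range(200)]

# Run-time: no transcription and no reference speech, only the trained
# encoder followed by the frozen decoder.
converted = frozen_tts_decoder(encoder.forward(frames))
```

Only the speech encoder receives gradient updates here. The TTS side acts purely as a fixed target, which is what removes the need for parallel source-target recordings in this sketch.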