Speaker Anonymization for Personal Information Protection Using Voice Conversion Techniques

As speech-based user interfaces integrated in the devices such as AI speakers become ubiquitous, a large amount of user voice data is being collected to enhance the accuracy of speech recognition systems. Since such voice data contain personal information that can endanger the privacy of users, the...

Full description

Bibliographic Details
Main Authors: In-Chul Yoo, Keonnyeong Lee, Seonggyun Leem, Hyunwoo Oh, Bonggu Ko, Dongsuk Yook
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9247219/
Description
Summary:As speech-based user interfaces integrated in the devices such as AI speakers become ubiquitous, a large amount of user voice data is being collected to enhance the accuracy of speech recognition systems. Since such voice data contain personal information that can endanger the privacy of users, the issue of privacy protection in the speech data has garnered increasing attention after the introduction of the General Data Protection Regulation in the EU, which implies that restrictions and safety measures for the use of speech data become essential. This study aims to filter the speaker-related voice biometrics present in speech data such as voice fingerprint without altering the linguistic content to preserve the usefulness of the data while protecting the privacy of users. To achieve this, we propose an algorithm that produces anonymized speeches by adopting many-to-many voice conversion techniques based on variational autoencoders (VAEs) and modifying the speaker identity vectors of the VAE input to anonymize the speech data. We validated the effectiveness of the proposed method by measuring the speaker-related information and the original linguistic information retained in the resultant speech, using an open source speaker recognizer and a deep neural network-based automatic speech recognizer, respectively. Using the proposed method, the speaker identification accuracy of the speech data was reduced to 0.1-9.2%, indicating successful anonymization, while the speech recognition accuracy was maintained as 78.2-81.3%.
ISSN:2169-3536