Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data.
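The two-stage recipe described in the abstract (pre-train a shared model on Japanese non-dysarthric speech and non-Japanese dysarthric speech, then fine-tune on the target Japanese dysarthric speech, routing each language through its own decoder) can be sketched schematically. The following is not the paper's network: it is a toy linear "encoder" with per-language linear "decoders" trained by gradient descent on synthetic data, purely to illustrate the parameter-sharing pattern; every name, dimension, and dataset here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and learning rate (assumptions, not from the paper).
D_IN, D_H, D_OUT, LR = 8, 4, 3, 0.05

enc = rng.normal(scale=0.1, size=(D_IN, D_H))           # shared encoder
dec = {lang: rng.normal(scale=0.1, size=(D_H, D_OUT))   # language-specific decoders
       for lang in ("ja", "en")}

def step(x, y, lang):
    """One gradient step on 0.5*||pred - y||^2, routed through one decoder."""
    global enc
    h = x @ enc                                  # shared representation, (N, D_H)
    err = h @ dec[lang] - y                      # prediction error, (N, D_OUT)
    dec_grad = h.T @ err / len(x)                # gradient w.r.t. this decoder
    enc_grad = x.T @ (err @ dec[lang].T) / len(x)  # gradient w.r.t. shared encoder
    dec[lang] -= LR * dec_grad
    enc -= LR * enc_grad
    return float(0.5 * np.mean(np.sum(err * err, axis=1)))

def make_set(n, seed):
    """Synthetic (input, target) pairs standing in for a speech dataset."""
    r = np.random.default_rng(seed)
    x = r.normal(size=(n, D_IN))
    w = r.normal(size=(D_IN, D_OUT))
    return x, 0.1 * (x @ w)

ja_clean = make_set(200, 1)   # stands in for Japanese non-dysarthric speech
en_dys   = make_set(200, 2)   # stands in for non-Japanese dysarthric speech
ja_dys   = make_set(20, 3)    # small target set: Japanese dysarthric speech

# Stage 1: pre-train the shared encoder using both source datasets,
# each flowing through its own language-specific decoder.
for _ in range(200):
    step(*ja_clean, "ja")
    step(*en_dys, "en")

# Stage 2: fine-tune on the small target dataset via the "ja" decoder.
before = step(*ja_dys, "ja")
for _ in range(200):
    last = step(*ja_dys, "ja")
print(f"target loss before fine-tune: {before:.4f}, after: {last:.4f}")
```

The point of the sketch is the routing: both stages update the shared encoder, while each dataset only ever touches the decoder for its own language, so the small target set never has to relearn what the larger source sets already encoded.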
Main Authors: | Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Assistive technology; deep learning; dysarthria; end-to-end model; knowledge transfer; multilingual |
Online Access: | https://ieeexplore.ieee.org/document/8892556/ |
id |
doaj-57867bac0f8249a8abebb5aa8cf8cd56 |
record_format |
Article |
spelling |
Yuki Takashima (https://orcid.org/0000-0001-8489-9487), Ryoichi Takashima (https://orcid.org/0000-0002-9808-0250), Tetsuya Takiguchi (https://orcid.org/0000-0001-5005-7679), Yasuo Ariki (https://orcid.org/0000-0003-3473-2026); all authors: Graduate School of System Informatics, Kobe University, Kobe, Japan. "Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition." IEEE Access, vol. 7, pp. 164320-164326, 2019. ISSN 2169-3536. DOI: 10.1109/ACCESS.2019.2951856. Article no. 8892556. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki |
title |
Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data. |
topic |
Assistive technology, deep learning, dysarthria, end-to-end model, knowledge transfer, multilingual |
url |
https://ieeexplore.ieee.org/document/8892556/ |