Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition

In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance.

Full description

Bibliographic Details
Main Authors: Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/8892556/
id doaj-57867bac0f8249a8abebb5aa8cf8cd56
record_format Article
spelling doaj-57867bac0f8249a8abebb5aa8cf8cd56 2021-03-30T00:54:03Z eng IEEE IEEE Access 2169-3536 2019-01-01 Vol. 7, pp. 164320-164326 10.1109/ACCESS.2019.2951856 8892556 Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
Yuki Takashima (0) https://orcid.org/0000-0001-8489-9487
Ryoichi Takashima (1) https://orcid.org/0000-0002-9808-0250
Tetsuya Takiguchi (2) https://orcid.org/0000-0001-5005-7679
Yasuo Ariki (3) https://orcid.org/0000-0003-3473-2026
Affiliation (all authors): Graduate School of System Informatics, Kobe University, Kobe, Japan
In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data.
URL: https://ieeexplore.ieee.org/document/8892556/
Keywords: Assistive technology, deep learning, dysarthria, end-to-end model, knowledge transfer, multilingual
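The training scheme in the abstract — a shared model with language-specific decoders, pre-trained on two auxiliary corpora and then fine-tuned on the small target corpus — can be illustrated with a minimal control-flow sketch. This is not the authors' implementation; all names (`Module`, `train_step`, the corpus labels) are hypothetical stand-ins, and the `updates` counter merely traces which modules each stage touches.

```python
class Module:
    """Minimal stand-in for a trainable network module (encoder or decoder)."""
    def __init__(self, name):
        self.name = name
        self.updates = 0  # counts how many batches have trained this module

    def train_step(self, batch):
        # A real system would compute a loss and update parameters here.
        self.updates += 1


def run_training():
    encoder = Module("shared_encoder")
    # One decoder per language, as in the paper's language-specific decoders.
    decoders = {"ja": Module("decoder_ja"), "en": Module("decoder_en")}

    # Stage 1: pre-train on the two auxiliary corpora. Each batch is routed
    # to the decoder of its language; the shared encoder sees every batch,
    # so it can absorb both the Japanese phonetic/linguistic knowledge and
    # the language-independent dysarthric characteristics.
    pretrain_batches = [("ja", "ja_non_dysarthric"), ("en", "en_dysarthric")] * 3
    for lang, batch in pretrain_batches:
        encoder.train_step(batch)
        decoders[lang].train_step(batch)

    # Stage 2: fine-tune on the (small) target corpus of Japanese
    # dysarthric speech, using only the Japanese decoder.
    for batch in ["ja_dysarthric"] * 2:
        encoder.train_step(batch)
        decoders["ja"].train_step(batch)

    return encoder, decoders
```

Running the sketch shows the asymmetry the method relies on: the shared encoder is trained on all batches, while the English decoder is only touched during pre-training and the Japanese decoder carries through to fine-tuning.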
collection DOAJ
language English
format Article
sources DOAJ
author Yuki Takashima
Ryoichi Takashima
Tetsuya Takiguchi
Yasuo Ariki
spellingShingle Yuki Takashima
Ryoichi Takashima
Tetsuya Takiguchi
Yasuo Ariki
Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
IEEE Access
Assistive technology
deep learning
dysarthria
end-to-end model
knowledge transfer
multilingual
author_facet Yuki Takashima
Ryoichi Takashima
Tetsuya Takiguchi
Yasuo Ariki
author_sort Yuki Takashima
title Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
title_short Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
title_full Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
title_fullStr Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
title_full_unstemmed Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
title_sort knowledge transferability between the speech data of persons with dysarthria speaking different languages for dysarthric speech recognition
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data.
topic Assistive technology
deep learning
dysarthria
end-to-end model
knowledge transfer
multilingual
url https://ieeexplore.ieee.org/document/8892556/
work_keys_str_mv AT yukitakashima knowledgetransferabilitybetweenthespeechdataofpersonswithdysarthriaspeakingdifferentlanguagesfordysarthricspeechrecognition
AT ryoichitakashima knowledgetransferabilitybetweenthespeechdataofpersonswithdysarthriaspeakingdifferentlanguagesfordysarthricspeechrecognition
AT tetsuyatakiguchi knowledgetransferabilitybetweenthespeechdataofpersonswithdysarthriaspeakingdifferentlanguagesfordysarthricspeechrecognition
AT yasuoariki knowledgetransferabilitybetweenthespeechdataofpersonswithdysarthriaspeakingdifferentlanguagesfordysarthricspeechrecognition
_version_ 1724187693102923776