Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data.
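The two-stage recipe described in the abstract (pre-train a shared model on Japanese non-dysarthric speech and non-Japanese dysarthric speech, then fine-tune on the target Japanese dysarthric speech, routing each language through its own decoder) can be sketched schematically. The following is not the paper's network: it is a toy linear "encoder" with per-language linear "decoders" trained by gradient descent on synthetic data, purely to illustrate the parameter-sharing pattern; every name, dimension, and dataset here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions and learning rate (assumptions, not from the paper).
D_IN, D_H, D_OUT, LR = 8, 4, 3, 0.05

enc = rng.normal(scale=0.1, size=(D_IN, D_H))           # shared encoder
dec = {lang: rng.normal(scale=0.1, size=(D_H, D_OUT))   # language-specific decoders
       for lang in ("ja", "en")}

def step(x, y, lang):
    """One gradient step on 0.5*||pred - y||^2, routed through one decoder."""
    global enc
    h = x @ enc                                  # shared representation, (N, D_H)
    err = h @ dec[lang] - y                      # prediction error, (N, D_OUT)
    dec_grad = h.T @ err / len(x)                # gradient w.r.t. this decoder
    enc_grad = x.T @ (err @ dec[lang].T) / len(x)  # gradient w.r.t. shared encoder
    dec[lang] -= LR * dec_grad
    enc -= LR * enc_grad
    return float(0.5 * np.mean(np.sum(err * err, axis=1)))

def make_set(n, seed):
    """Synthetic (input, target) pairs standing in for a speech dataset."""
    r = np.random.default_rng(seed)
    x = r.normal(size=(n, D_IN))
    w = r.normal(size=(D_IN, D_OUT))
    return x, 0.1 * (x @ w)

ja_clean = make_set(200, 1)   # stands in for Japanese non-dysarthric speech
en_dys   = make_set(200, 2)   # stands in for non-Japanese dysarthric speech
ja_dys   = make_set(20, 3)    # small target set: Japanese dysarthric speech

# Stage 1: pre-train the shared encoder using both source datasets,
# each flowing through its own language-specific decoder.
for _ in range(200):
    step(*ja_clean, "ja")
    step(*en_dys, "en")

# Stage 2: fine-tune on the small target dataset via the "ja" decoder.
before = step(*ja_dys, "ja")
for _ in range(200):
    last = step(*ja_dys, "ja")
print(f"target loss before fine-tune: {before:.4f}, after: {last:.4f}")
```

The point of the sketch is the routing: both stages update the shared encoder, while each dataset only ever touches the decoder for its own language, so the small target set never has to relearn what the larger source sets already encoded.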
Main Authors: | Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Assistive technology; deep learning; dysarthria; end-to-end model; knowledge transfer; multilingual |
Online Access: | https://ieeexplore.ieee.org/document/8892556/ |
id |
doaj-57867bac0f8249a8abebb5aa8cf8cd56 |
record_format |
Article |
spelling |
Yuki Takashima (https://orcid.org/0000-0001-8489-9487), Ryoichi Takashima (https://orcid.org/0000-0002-9808-0250), Tetsuya Takiguchi (https://orcid.org/0000-0001-5005-7679), Yasuo Ariki (https://orcid.org/0000-0003-3473-2026); all authors: Graduate School of System Informatics, Kobe University, Kobe, Japan. "Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition." IEEE Access, vol. 7, pp. 164320-164326, 2019. ISSN 2169-3536. DOI: 10.1109/ACCESS.2019.2951856. Article no. 8892556. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yuki Takashima, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki |
title |
Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data. |
topic |
Assistive technology, deep learning, dysarthria, end-to-end model, knowledge transfer, multilingual |
url |
https://ieeexplore.ieee.org/document/8892556/ |