Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures
The performance of the ASR system is unsatisfactory in a low-resource environment. In this paper, we investigated the effectiveness of three approaches to improve the performance of the acoustic models in low-resource environments. They are Mono-and-triphone Learning, Soft One-hot Label and Feature...
Main Authors: | Xiusong Sun, Qun Yang, Shaohan Liu, Xin Yuan |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2020-01-01 |
Series: | IEEE Access |
Subjects: | Low-resource; speech recognition; multitask learning; acoustic modeling; feature combinations |
Online Access: | https://ieeexplore.ieee.org/document/9069188/ |
id |
doaj-087680b08d5f43cdb66d29671a251eef |
---|---|
record_format |
Article |
spelling |
doaj-087680b08d5f43cdb66d29671a251eef; 2021-03-30T01:45:31Z; eng; IEEE; IEEE Access; 2169-3536; 2020-01-01; vol. 8, pp. 73005-73014; 10.1109/ACCESS.2020.2988365; 9069188; Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures; Xiusong Sun (https://orcid.org/0000-0003-0232-7069), Qun Yang (https://orcid.org/0000-0001-6824-8473), Shaohan Liu (https://orcid.org/0000-0001-6967-0262), Xin Yuan (https://orcid.org/0000-0003-2261-1809); all four authors: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; The performance of ASR systems is unsatisfactory in low-resource environments. In this paper, we investigated the effectiveness of three approaches to improving acoustic models in low-resource environments: Mono-and-triphone Learning, Soft One-hot Label, and Feature Combinations. We applied these three methods to the network architecture and compared the results with baselines. Our proposal achieved a marked improvement in Mandarin speech recognition with the hybrid neural network-hidden Markov model (NN-HMM) approach at the phoneme level. To verify the generalization ability of the proposed methods, we conducted comparative experiments on DNN, RNN, LSTM, and other network structures. The experimental results show that our methods are applicable to almost all widely used network structures. Compared to the baselines, our proposals achieved an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the training data amounted to about 10 hours, and we used no data augmentation or transfer learning, which means that we did not use any additional data.; https://ieeexplore.ieee.org/document/9069188/; Low-resource; speech recognition; multitask learning; acoustic modeling; feature combinations |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiusong Sun, Qun Yang, Shaohan Liu, Xin Yuan |
title |
Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
The performance of ASR systems is unsatisfactory in low-resource environments. In this paper, we investigated the effectiveness of three approaches to improving acoustic models in low-resource environments: Mono-and-triphone Learning, Soft One-hot Label, and Feature Combinations. We applied these three methods to the network architecture and compared the results with baselines. Our proposal achieved a marked improvement in Mandarin speech recognition with the hybrid neural network-hidden Markov model (NN-HMM) approach at the phoneme level. To verify the generalization ability of the proposed methods, we conducted comparative experiments on DNN, RNN, LSTM, and other network structures. The experimental results show that our methods are applicable to almost all widely used network structures. Compared to the baselines, our proposals achieved an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the training data amounted to about 10 hours, and we used no data augmentation or transfer learning, which means that we did not use any additional data. |
topic |
Low-resource; speech recognition; multitask learning; acoustic modeling; feature combinations |
url |
https://ieeexplore.ieee.org/document/9069188/ |