Improving Low-Resource Speech Recognition Based on Improved NN-HMM Structures

Bibliographic Details
Main Authors: Xiusong Sun, Qun Yang, Shaohan Liu, Xin Yuan
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9069188/
Description
Summary: The performance of ASR systems is unsatisfactory in low-resource environments. In this paper, we investigate the effectiveness of three approaches to improving acoustic models in low-resource environments: Mono-and-triphone Learning, Soft One-hot Label, and Feature Combinations. We applied these three methods to the network architecture and compared their results with baselines. Our proposals achieve notable improvements on a Mandarin speech recognition task using a hybrid hidden Markov model-neural network (NN-HMM) acoustic model with phoneme-level targets. To verify the generalization ability of the proposed methods, we conducted comparative experiments on DNN, RNN, LSTM, and other network structures. The experimental results show that our methods are applicable to almost all widely used network structures. Compared to the baselines, our proposals achieved an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the training data amounts to approximately 10 hours, and we used no data augmentation or transfer learning, i.e., no additional data.
ISSN:2169-3536
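
Note: The summary names three techniques but does not spell out their implementation. As a rough, non-authoritative illustration only, the minimal PyTorch sketch below shows how two of them are commonly realized: a shared acoustic encoder with separate monophone and triphone output heads trained jointly (mono-and-triphone learning), and "soft one-hot" targets produced by label smoothing. The layer sizes, class counts, smoothing factor, and loss weight alpha are assumptions for illustration, not values from the paper.

# Hypothetical sketch (not the authors' released code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonoTriphoneNet(nn.Module):
    def __init__(self, feat_dim=120, hidden=512, n_mono=218, n_tri=3000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=3, batch_first=True)
        self.mono_head = nn.Linear(hidden, n_mono)   # auxiliary monophone targets
        self.tri_head = nn.Linear(hidden, n_tri)     # main tied-triphone targets

    def forward(self, feats):
        h, _ = self.encoder(feats)                    # (batch, time, hidden)
        return self.mono_head(h), self.tri_head(h)

def soft_one_hot(labels, n_classes, smoothing=0.1):
    """Replace hard one-hot frame labels with a smoothed distribution."""
    target = torch.full(labels.shape + (n_classes,), smoothing / (n_classes - 1))
    target.scatter_(-1, labels.unsqueeze(-1), 1.0 - smoothing)
    return target

def multitask_loss(mono_logits, tri_logits, mono_lab, tri_lab, alpha=0.3):
    """Weighted sum of the triphone loss and the auxiliary monophone loss."""
    tri_t = soft_one_hot(tri_lab, tri_logits.size(-1))
    mono_t = soft_one_hot(mono_lab, mono_logits.size(-1))
    tri_loss = -(tri_t * F.log_softmax(tri_logits, dim=-1)).sum(-1).mean()
    mono_loss = -(mono_t * F.log_softmax(mono_logits, dim=-1)).sum(-1).mean()
    return tri_loss + alpha * mono_loss

# Usage with dummy data: 4 utterances of 200 frames, 120-dim features.
feats = torch.randn(4, 200, 120)
mono_lab = torch.randint(0, 218, (4, 200))
tri_lab = torch.randint(0, 3000, (4, 200))
loss = multitask_loss(*MonoTriphoneNet()(feats), mono_lab, tri_lab)

The feature-combination idea mentioned in the abstract would, in such a setup, typically amount to concatenating complementary frame-level features (e.g., filterbank and pitch features) along the feature dimension before the encoder; the paper should be consulted for the exact combination used.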