Summary: | 碩士 === 國立成功大學 === 電機工程學系碩博士班 === 96 === Technology always comes from human nature. The growing popularity of speech recognition applicants in living still has great room for improvement. How to improve the speech recognition technology and bring more convenient for people is our continuously effort target.
Currently, most living speech recognition applications need either a speaker independent model or a dependent model with user training at first, such as a voice-activated toy and a mobile phone with voice dialing. We think that the speech recognition with speaker independent model is not convenient enough for home environment with fixed family members. Therefore, this thesis proposes an on-line multi-speaker adaptation based on SVM and MLLR for ubiquitous speech recognition system. This system can not only speech recognizes with the appropriate model for every family member to improve accuracy, but also on-line adapts the acoustic model to be near to the speakers’ characteristic when they use the system.
The presented novel architecture can improve the adaptation modeling accuracy of the conventional maximum likelihood linear regression (MLLR) technique. The proposed system contains three phases: the training phase, the recognition phase, and the adaptation phase. First, in the training phase, we generate MLLR regression matrix sets of the Gaussian mixture model (GMM) parameters to construct the eigen-space, and build SVM speaker classification model by a few training data. Next, in the recognition phase, we integrate the speaker independent model with the MLLR transformation matrix set generated from the MLLR transformation matrix set database by the speaker class result of SVM classifier. Then we recognize the test speech by this adapted model. In the adaptation phase, we replace the present MLLR transformation matrix set by the adapted MLLR transformation matrix set merged by weighting from three transformation matrix sets: 1) the present MLLR transformation matrix set; 2) the MLLR regression transformation matrix set estimated from by maximum likelihood from eigenspace and recognition result; and 3) the MLLR regression transformation matrix set adapted by the speech recognition results which are judged by confidence measure to decrease the error training because the noise or the wrong speech recognition.
The experimental results show that the proposed method can averagely improve speech recognition accuracy about 3% ~8% with speaker adaptation.
|