Summary: | This paper deals with the analysis and optimization of a speech command recognition system (SCRS) trained on the Czech telephone database Speechdat(E) for use in a selected noisy environment. The SCRS is based on hidden Markov models of context-dependent phones (triphones) and mel-frequency cepstral coefficient (MFCC) analysis of speech. The main aim is to analyze and search for the optimal settings of the SCRS with respect to additive noise robustness, without the use of additional techniques for additive noise reduction. The analysis focuses on the appropriate setting of the MFCC computation, the silence model adjustment, and the grammar selection possibilities. It is shown that the correct performance of the SCRS depends strictly on an appropriate adjustment of the silence model. The feasibility of adapting the silence model is confirmed. When the SNR is higher than 15 dB, suitable performance of the SCRS can be guaranteed without any modification of the triphone speech models by: 1. the optimal setting of the MFCC computation, 2. the proper silence model adaptation. The assumption that a speech command recognition system is used in an environment where the SNR is higher than 15 dB is fulfilled in many applications.
|