Summary: | Master's === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === 99 === Environmental mismatch caused by additive noise and/or channel distortion often seriously degrades the performance of a speech recognition system. Various robustness methods have therefore been proposed, and one prevalent school of thought aims to refine the modulation spectra of speech feature sequences. In this thesis, we propose two novel methods to normalize the modulation spectra of speech feature sequences. First, we leverage nonnegative matrix factorization (NMF) to extract a common set of basis spectral vectors that capture the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The new modulation spectra of the speech features, constructed by mapping the original modulation spectra into the space spanned by these basis vectors, are shown to be robust to noise. Second, to view the modulation spectra of speech feature sequences from a probabilistic perspective, we employ probabilistic latent semantic analysis (PLSA) with a latent set of topic distributions to explore the relationship between each modulation frequency and the magnitude modulation spectrum as a whole. All experiments were carried out on the Aurora-2 database and task. Experimental results show that the features updated via NMF and PLSA maintain high recognition accuracy under both matched and mismatched noisy conditions, which is quite competitive with the accuracy obtained by other existing methods.
|
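The NMF step described in the summary can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that the modulation spectrum of a feature sequence is the magnitude of its discrete Fourier transform along time, that NMF is fitted with the standard multiplicative updates for the squared-error objective, and that the original phase is reused when the updated sequence is resynthesized. The function names (`nmf`, `encode`, `normalize_sequence`) and the parameters (`rank`, `n_iter`, `n_fft`) are hypothetical choices made for illustration.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (freq x sequences) as W @ H.

    Uses multiplicative updates for the squared-error objective
    (an assumed choice; the summary does not name the objective).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_seq = V.shape
    W = rng.random((n_freq, rank)) + eps   # basis spectral vectors
    H = rng.random((rank, n_seq)) + eps    # per-sequence activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def encode(W, v, n_iter=200, eps=1e-9):
    """Find nonnegative weights h so that W @ h approximates v, with W fixed."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return h

def normalize_sequence(x, W, n_fft):
    """Map the magnitude modulation spectrum of x onto the span of W.

    x is one feature dimension over time; n_fft must be >= len(x)
    and W must have n_fft // 2 + 1 rows.
    """
    spec = np.fft.rfft(x, n=n_fft)
    mag, phase = np.abs(spec), np.angle(spec)
    new_mag = W @ encode(W, mag)           # updated magnitude spectrum
    # Assumption: keep the original phase when returning to the time domain.
    y = np.fft.irfft(new_mag * np.exp(1j * phase), n=n_fft)
    return y[:len(x)]
```

In use, the columns of `V` would be the magnitude modulation spectra of clean training feature sequences (one matrix per feature dimension is one plausible arrangement), `W` would be learned once with `nmf`, and `normalize_sequence` would then be applied to both training and test sequences.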