Summary: | Doctoral === National Tsing Hua University === Department of Electrical Engineering === 94 === In this dissertation, we present two new techniques for single-channel speech enhancement. Both techniques reduce the noise in each subband of a critical-band wavelet-packet decomposition. A noise masking threshold (NMT) is used to adjust the wavelet-coefficient threshold or the gain function of each subband.
The first approach converts a noisy signal into wavelet coefficients (WCs) and subtracts a threshold from the noisy WCs in each subband. The threshold of each subband is adapted according to the segmental SNR and the NMT, so that the residual noise is efficiently suppressed. In noise-dominated frames, the background noise is almost entirely removed. In speech-dominated subbands, speech distortion is reduced by lowering the wavelet-coefficient threshold. Experimental results show that the background noise is reduced and that the residual noise is less structured than in a system that does not use masking properties, while the level of speech distortion remains acceptable.
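The threshold subtraction described for the first approach can be sketched as follows. This is a minimal illustration, not the dissertation's method: the abstract does not give the adaptation rule, so `adapt_threshold`, its linear SNR mapping, and the parameters `snr_low_db` and `snr_high_db` are hypothetical. The sketch only reproduces the stated behavior: a full threshold in noise-dominated conditions and a smaller threshold when speech dominates or when the noise is largely masked.

```python
import numpy as np

def soft_threshold(coeffs, threshold):
    """Subtract `threshold` from the coefficient magnitudes, clamping at zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def adapt_threshold(base_threshold, seg_snr_db, nmt, noise_power,
                    snr_low_db=-5.0, snr_high_db=20.0):
    """Hypothetical adaptation rule (an assumption, not taken from the source).

    The threshold shrinks as the segmental SNR rises (speech-dominated subband)
    and as the NMT approaches the noise power (noise already masked).
    """
    # 0 means noise-dominated, 1 means speech-dominated.
    snr_weight = np.clip((seg_snr_db - snr_low_db) / (snr_high_db - snr_low_db), 0.0, 1.0)
    # Fraction of the noise power covered by the masking threshold.
    mask_weight = np.clip(nmt / max(noise_power, 1e-12), 0.0, 1.0)
    return base_threshold * (1.0 - snr_weight) * (1.0 - mask_weight)

def enhance_subband(coeffs, base_threshold, seg_snr_db, nmt, noise_power):
    """Apply the adapted threshold to the wavelet coefficients of one subband."""
    threshold = adapt_threshold(base_threshold, seg_snr_db, nmt, noise_power)
    return soft_threshold(coeffs, threshold)
```

In use, `enhance_subband` would be called once per subband and frame, with the segmental SNR, NMT, and noise power estimated elsewhere.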
The second approach proposes a gain factor for each wavelet subband, subject to a perceptual constraint. This constraint preserves the WCs of noisy speech when the level of residual noise is below the NMT. A speech enhancement algorithm adapted with the NMT can cope with noisy speech corrupted by various types of colored noise. The quality of the enhanced speech is governed by a tradeoff between the amount of speech distortion and the level of musical residual noise. If the level of residual noise is below the NMT, the human ear cannot perceive the corrupting noise, and the gain factor is set to unity. Conversely, if the level of residual noise exceeds the NMT, the gain factor becomes smaller and the corrupting noise is suppressed. Because the noise level is usually overestimated at low SNR, the gain factor is underestimated, which results in more speech distortion and a muffled sound. We therefore propose a lower bound on the gain factor to prevent over-attenuation. The lower bound on the gain factor is obtained by keeping the speech distortion smaller than the residual noise; the corresponding lower bound on the NMT follows accordingly. This lower bound on the gain factor must be adapted to the noise level in order to reduce the speech distortion and minimize the musical residual noise. Experimental results show that the enhanced speech sounds more natural and that the musical residual noise is almost inaudible.
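The gain rule of the second approach can be illustrated with a short sketch. The abstract does not state the formulas, so the model below is an assumption: with gain g, the residual noise power is taken as g² times the noise power and the speech distortion as (1 − g)² times the speech power. Under that assumed model, requiring the residual noise to stay at or below the NMT gives g = min(1, sqrt(NMT / noise power)), and requiring the distortion not to exceed the residual noise gives the floor g ≥ sqrt(Ps) / (sqrt(Ps) + sqrt(Pn)). The function name `perceptual_gain` is hypothetical.

```python
import numpy as np

def perceptual_gain(noise_power, nmt, speech_power, eps=1e-12):
    """Per-subband gain under an assumed distortion model (not from the source).

    noise_power, nmt, speech_power: arrays of per-subband power estimates.
    """
    noise_power = np.asarray(noise_power, dtype=float)
    nmt = np.asarray(nmt, dtype=float)
    speech_power = np.asarray(speech_power, dtype=float)

    # Largest gain whose residual noise g^2 * noise_power stays at or below the NMT;
    # this is unity whenever the noise is already masked.
    gain = np.minimum(1.0, np.sqrt(nmt / np.maximum(noise_power, eps)))

    # Lower bound: keep (1 - g)^2 * speech_power <= g^2 * noise_power.
    s = np.sqrt(speech_power)
    n = np.sqrt(noise_power)
    floor = s / np.maximum(s + n, eps)

    return np.maximum(gain, floor)
```

Because the floor depends on the noise estimate, it adapts to the noise level, as the abstract requires; how the dissertation actually derives and adapts the bound is not specified here.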
|