Speech Enhancement Using Variable Peak-Spectrum-Holding Length Adapted by Harmonic Properties for Frame-Zero-Padding Method

Master's thesis, Asia University (亞洲大學), Department of Information Communication (資訊傳播學系), ROC academic year 104.

Bibliographic Details
Main Authors: WU, WEI-LI, 吳威俐
Other Authors: LU, CHING-TA
Format: Others
Language: zh-TW
Published: 2016
Online Access: http://ndltd.ncl.edu.tw/handle/k63t7c
Description
Summary: In speech communication, speech and noise signals are mixed in the same channel. A speech enhancement system is employed to remove the corrupting noise so that a listener can understand the received speech. The accuracy of noise estimation significantly affects the quality of the enhanced speech. This thesis proposes combining a frame-zero-padding method with a peak-spectrum-holding method whose holding length is adapted according to harmonic properties, in order to improve the accuracy of noise estimation. Because speech is absent during the zero-padded frames, the noise magnitude spectrum can be estimated during these periods. To improve the performance of the frame-zero-padding method, robust harmonics in each vowel frame are estimated and used to adapt the segment length for noise estimation. For a non-vowel frame, the segment length is increased so that the peak-spectrum-holding method deliberately over-estimates the noise magnitude, which significantly reduces the residual noise in the enhanced speech. Conversely, during a vowel period the noise estimate is updated instantaneously, which prevents the over-estimation that peak-spectrum holding would otherwise cause; as a result, the enhanced speech does not suffer from serious speech distortion. Experimental results show that the proposed method effectively removes background and residual noise in speech-pause regions, making the enhanced speech sound clear and comfortable.
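The summary describes the noise estimator only in words. The Python sketch below illustrates the general idea of peak-spectrum holding with a frame-dependent holding length: on non-vowel frames the per-bin noise estimate is the peak over the last `long_hold` frames (deliberate over-estimation), while on vowel frames it is updated instantaneously from the current frame. It is not the thesis's implementation. The function names, the `long_hold` value, the assumption that a speech-absent noise magnitude spectrum is available for every frame, and the vowel flags (which the thesis derives from harmonic analysis, not shown here) are all hypothetical. The magnitude spectral subtraction with a spectral floor is a generic stand-in, since the summary does not state the gain rule used.

```python
from collections import deque


def peak_hold_noise_estimates(noise_obs, is_vowel, long_hold=8):
    """Sketch of peak-spectrum holding with a variable holding length.

    noise_obs: list of frames, each a list of per-bin noise magnitudes
               (ASSUMED to come from speech-absent periods).
    is_vowel:  list of bools, one per frame (ASSUMED given; the thesis
               derives vowel decisions from harmonic properties).
    long_hold: hypothetical holding length, in frames, for non-vowel frames.
    """
    history = deque(maxlen=long_hold)  # most recent noise observations
    estimates = []
    for obs, vowel in zip(noise_obs, is_vowel):
        # Every observation enters the history, including vowel frames.
        history.append(obs)
        # Vowel frame: instantaneous update (holding length of 1).
        # Non-vowel frame: hold the per-bin peak over the last long_hold frames.
        window = [obs] if vowel else list(history)
        estimates.append([max(frame[k] for frame in window) for k in range(len(obs))])
    return estimates


def spectral_subtract(noisy_mag, noise_est, floor=0.01):
    """Generic magnitude spectral subtraction (hypothetical stand-in).

    Subtracts the noise estimate per bin and clamps the result to a
    spectral floor proportional to the noisy magnitude.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_mag, noise_est)]


if __name__ == "__main__":
    obs = [[1.0, 2.0], [3.0, 0.5], [0.5, 0.5], [0.2, 0.1]]
    vowel = [False, False, True, False]
    print(peak_hold_noise_estimates(obs, vowel, long_hold=3))
    # [[1.0, 2.0], [3.0, 2.0], [0.5, 0.5], [3.0, 0.5]]
```

In the usage example, the third frame (a vowel) drops back to its own observation instead of holding the earlier peak of 3.0, while the last (non-vowel) frame holds the peak again. A longer holding window raises the estimate on non-vowel frames, trading some distortion for less residual noise. Resetting on vowel frames avoids that over-estimation where speech energy is strongest.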