A Study on Speech Signal Processing Using Wavelet Transforms

Bibliographic Details
Main Authors: Shi-Huang Chen, 陳璽煌
Other Authors: Jhing-Fa Wang
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/gfwb7p
id ndltd-TW-090NCKU5442168
record_format oai_dc
spelling ndltd-TW-090NCKU5442168 2018-06-25T06:05:29Z http://ndltd.ncl.edu.tw/handle/gfwb7p A Study on Speech Signal Processing Using Wavelet Transforms 應用小波轉換於語音信號處理之研究 Shi-Huang Chen 陳璽煌 Ph.D. National Cheng Kung University Department of Electrical Engineering (Master's and Doctoral Program) 90 Jhing-Fa Wang 王駿發 2002 thesis 138 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === 90 === The wavelet transform and its theory are among the most exciting developments of the last decade. The wavelet transform has in fact been developed independently in various fields, such as signal processing, image processing, audio processing, communication, and applied mathematics. Because the wavelet representation offers efficient time-frequency localization and multi-resolution analysis, wavelet transforms are well suited to processing non-stationary signals such as speech. This thesis therefore studies wavelet-based speech signal processing and proposes a framework for speech signal processing using the wavelet transform. Based on this framework, the thesis develops four new wavelet-based speech signal processing algorithms: pitch detection, consonant/vowel (C/V) segmentation, speech enhancement, and voice activity detection (VAD). In addition, it proposes an aliasing compensation algorithm to cancel the aliasing distortion that arises in the filterbank structure of wavelet transforms.
The first part of the thesis presents the wavelet-based pitch detection algorithm. It applies the aliasing-compensated wavelet transform and a modified spatial correlation function to improve on the robustness of conventional pitch detection algorithms in noisy environments. Experimental results show that the proposed pitch detection algorithm outperforms conventional algorithms in both clean and noisy environments.
The second part presents the wavelet-based C/V segmentation algorithm. This novel algorithm detects the C/V segmentation point directly, using the product function and its energy profile. Unlike conventional C/V segmentation algorithms, it requires neither a pitch detector nor backward processing. As a consequence, its accuracy is substantially higher than that of conventional approaches.
In the third part, the thesis proposes a wavelet-based speech enhancement method based on perceptual wavelet packet decomposition (PWPD) and time-adapted thresholding (TAT), with the aim of increasing perceptual speech quality after enhancement. These techniques avoid the over-thresholding of speech segments that commonly occurs in conventional speech enhancement schemes. A further advantage is that the method requires neither a complicated estimate of the noise level nor any knowledge of the SNR. Experimental results with both additive and real noise show that the proposed speech enhancement method can outperform conventional noise cancellation schemes.
Finally, the thesis applies the TAT algorithm developed in the third part to VAD. The resulting wavelet-based VAD method likewise requires neither a complicated estimate of the noise level nor any knowledge of the SNR. Experimental results show that it detects speech accurately even when the speech signal is heavily contaminated by background noise.
author2 Jhing-Fa Wang
author_facet Jhing-Fa Wang
Shi-Huang Chen
陳璽煌
author Shi-Huang Chen
陳璽煌
author_sort Shi-Huang Chen
title A Study on Speech Signal Processing Using Wavelet Transforms
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/gfwb7p