Speech periodicity enhancement based on transform-domain signal decomposition and robust pitch estimation.


Bibliographic Details
Other Authors: Huang, Feng
Format: Others
Language: English
Chinese
Published: 2012
Subjects:
Online Access: http://library.cuhk.edu.hk/record=b5549620
http://repository.lib.cuhk.edu.hk/en/item/cuhk-328009
Description
Summary: Periodicity is an important attribute of speech signals. It is essential in tonal languages, where the pitch contour determines the meaning of a word. Speech periodicity enhancement is the process of restoring the waveform periodicity of noise-corrupted speech in order to improve human perception of pitch and tone in noisy environments.

This thesis presents a novel approach to speech periodicity enhancement. The enhancement is achieved through periodic-aperiodic decomposition of the linear prediction residual signal in a transform domain. The linear prediction residual is generally regarded as the main carrier of speech periodicity; it is decomposed pitch-synchronously in the transform domain using lapped frequency transforms. Transform coefficients that represent the periodic component are amplified to enhance the periodicity, and those that represent the aperiodic components are attenuated to suppress the noise. We propose and evaluate three methods of assigning coefficient weights for periodicity enhancement: simple fixed weights, adaptive weights, and transform-domain Wiener filtering.

As a key component of periodic-aperiodic decomposition, a novel method of robust pitch estimation is developed. The temporally accumulated peak spectrum is proposed as a robust representation of speech harmonics. A Gaussian mixture model is employed to model the effect of noise on the peak spectrum. Pitch estimation is formulated as a problem of l₁-regularized maximum likelihood estimation, in which prior information is exploited. Two convex optimization approaches are developed to approximately solve the associated non-convex optimization problem. The proposed pitch estimation method significantly outperforms the conventional methods. It attains high estimation accuracy for various types of noise at very low signal-to-noise ratios (e.g., -5 dB).

Experimental results confirm that the proposed approach to periodicity enhancement effectively restores speech harmonic structure and waveform periodicity. Compared with the other speech and periodicity enhancement methods evaluated in this study, the proposed method produces speech output of noticeably higher quality in terms of objective measurements such as SNR and PESQ.

Detailed summary in vernacular field only.
Huang, Feng.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 130-143).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
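
The summary describes enhancement as amplifying the periodic component and attenuating the aperiodic one. The sketch below illustrates only that weighting idea on toy data and is not the thesis's method: it assumes the signal has already been cut into equal-length pitch cycles, and it uses the average cycle as a stand-in for the periodic component instead of a lapped frequency transform. The function names (`decompose`, `enhance`, `squared_error`) and the weight values are hypothetical.

```python
import random


def decompose(cycles):
    """Split equal-length pitch cycles into periodic and aperiodic parts.

    Assumption: the periodic part is the average cycle (repeated for every
    cycle); the aperiodic part is whatever remains after subtracting it.
    """
    count = len(cycles)
    length = len(cycles[0])
    mean_cycle = [sum(c[i] for c in cycles) / count for i in range(length)]
    periodic = [list(mean_cycle) for _ in cycles]
    aperiodic = [[c[i] - mean_cycle[i] for i in range(length)] for c in cycles]
    return periodic, aperiodic


def enhance(cycles, w_periodic=1.0, w_aperiodic=0.3):
    """Recombine the two parts with fixed weights (illustrative values)."""
    periodic, aperiodic = decompose(cycles)
    return [
        [w_periodic * p + w_aperiodic * a for p, a in zip(p_cycle, a_cycle)]
        for p_cycle, a_cycle in zip(periodic, aperiodic)
    ]


def squared_error(a, b):
    """Total squared difference between two lists of cycles."""
    return sum((x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb))


# Toy data: one clean cycle repeated 8 times, plus seeded Gaussian noise.
rng = random.Random(0)
clean_cycle = [1.0, 0.5, -0.2, -0.8, -0.3, 0.1]
clean = [list(clean_cycle) for _ in range(8)]
noisy = [[x + rng.gauss(0.0, 0.3) for x in cycle] for cycle in clean]
enhanced = enhance(noisy)

# Attenuating the aperiodic part moves the noisy cycles closer to the clean ones.
print(squared_error(noisy, clean), squared_error(enhanced, clean))
```

With both weights set to 1.0 the input is recovered unchanged. Lowering `w_aperiodic` suppresses more noise but also removes more of any genuinely aperiodic speech content, which is presumably why the thesis goes on to consider adaptive weights and Wiener filtering.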
Chapter 1 --- Introduction
Chapter 1.1 --- Speech enhancement
Chapter 1.2 --- Speech periodicity and enhancement
Chapter 1.3 --- Research motivations and objectives
Chapter 1.4 --- Thesis outline
Chapter I --- Transform-domain signal decomposition
Chapter 2 --- Speech representation model
Chapter 2.1 --- Overview of speech modeling
Chapter 2.2 --- Speech analysis
Chapter 2.2.1 --- Linear prediction analysis
Chapter 2.2.2 --- Constant-pitch warping
Chapter 2.2.3 --- Two-stage lapped frequency transforms
Chapter 2.2.4 --- Signal segmentation
Chapter 2.3 --- Speech synthesis
Chapter 2.4 --- Applications
Chapter 2.4.1 --- Speech coding
Chapter 2.4.2 --- Speech modification
Chapter 3 --- Signal decomposition for periodicity enhancement
Chapter 3.1 --- Periodic-aperiodic decomposition
Chapter 3.2 --- Periodicity enhancement of noisy speech
Chapter 3.2.1 --- Noise effect on transform coefficients
Chapter 3.2.2 --- Principle of periodicity enhancement
Chapter 3.2.3 --- Research focuses
Chapter 3.3 --- Related issues for noisy speech
Chapter 3.3.1 --- LP coefficient estimation
Chapter 3.3.2 --- Model adjustments
Chapter II --- Robust pitch estimation
Chapter 4 --- Pitch estimation with temporally accumulated peak spectrum
Chapter 4.1 --- Review of pitch estimation methods
Chapter 4.2 --- Peak spectrum and inter-frame spectrum similarity
Chapter 4.3 --- Temporally accumulated peak spectrum
Chapter 4.4 --- Pitch estimation using autocorrelation of TAPS
Chapter 4.5 --- Experimental evaluation
Chapter 4.5.1 --- Test data
Chapter 4.5.2 --- Performance metrics
Chapter 4.5.3 --- Accumulated frame number
Chapter 4.5.4 --- Experimental results
Chapter 5 --- Pitch estimation using sparse estimation techniques
Chapter 5.1 --- Sparse representation of TAPS
Chapter 5.2 --- Sparse weight estimation using l₁-regularized minimization
Chapter 5.2.1 --- Least absolute shrinkage and selection operator
Chapter 5.2.2 --- Gaussian mixture distribution for noise effect
Chapter 5.2.3 --- Convex approximation
Chapter 5.2.4 --- Difference-of-convex programming
Chapter 5.3 --- Pitch estimation from sparse weight vector
Chapter 5.4 --- Experimental evaluation
Chapter 5.4.1 --- Experiment settings
Chapter 5.4.2 --- Peak spectrum exemplar set
Chapter 5.4.3 --- Gaussian mixture density
Chapter 5.4.4 --- Experimental results
Chapter 5.5 --- Summary of robust pitch estimation
Chapter III --- Speech periodicity enhancement
Chapter 6 --- Transform-domain coefficient weighting
Chapter 6.1 --- Overview of the proposed framework
Chapter 6.2 --- Transform coefficient weighting
Chapter 6.3 --- Experimental evaluation
Chapter 6.3.1 --- Experiment settings
Chapter 6.3.2 --- Experimental results
Chapter 7 --- Adaptive coefficient weighting
Chapter 7.1 --- Motivation of adaptive weights
Chapter 7.2 --- Energy concentration of voiced and unvoiced speech
Chapter 7.2.1 --- Energy concentration measures
Chapter 7.2.2 --- Voiced/Unvoiced discrimination
Chapter 7.3 --- Pitch estimation confidence
Chapter 7.3.1 --- Basis of confidence measure
Chapter 7.3.2 --- Robustness of the confidence measure
Chapter 7.4 --- Adaptive coefficient weighting
Chapter 7.5 --- Experimental evaluation
Chapter 8 --- Transform-domain Wiener filtering
Chapter 8.1 --- Transform-domain Wiener filtering for periodicity enhancement
Chapter 8.1.1 --- The MMSE optimal Wiener filter
Chapter 8.1.2 --- Wiener filter for periodic component
Chapter 8.1.3 --- Wiener filter for aperiodic components
Chapter 8.2 --- Filter parameter estimation
Chapter 8.2.1 --- Filter parameters for aperiodic components
Chapter 8.2.2 --- Filter parameters for periodic component
Chapter 8.3 --- Experimental evaluation
Chapter 8.4 --- Summary of speech periodicity enhancement
Chapter 9 --- Conclusions and future directions
Chapter 9.1 --- Conclusions
Chapter 9.2 --- Contributions
Chapter 9.3 --- Future directions
Chapter A --- Algorithms for LP coefficient estimation
Chapter A.1 --- Kalman filtering of noisy speech
Chapter A.2 --- Codebook driven approach
Bibliography
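
The contents list an MMSE optimal Wiener filter (Chapter 8.1.1) as one way of setting the coefficient weights. As a rough illustration, the textbook per-coefficient Wiener gain for uncorrelated, zero-mean signal and noise is S / (S + N), where S and N are the signal and noise powers of that coefficient. The sketch below applies that gain to a list of coefficients. It assumes the power estimates are already available; this record does not describe how the thesis estimates them (Chapter 8.2), and the function names `wiener_gain` and `apply_wiener` are hypothetical.

```python
def wiener_gain(signal_power, noise_power):
    """Textbook MMSE gain for one coefficient: S / (S + N).

    Returns 0.0 when both powers are zero to avoid dividing by zero.
    """
    total = signal_power + noise_power
    return signal_power / total if total > 0 else 0.0


def apply_wiener(coeffs, signal_powers, noise_powers):
    """Scale each transform coefficient by its own Wiener gain."""
    return [
        wiener_gain(s, n) * c
        for c, s, n in zip(coeffs, signal_powers, noise_powers)
    ]


# Illustrative values: the first coefficient has equal signal and noise
# power, the second is assumed to contain noise only.
filtered = apply_wiener([2.0, 4.0], [1.0, 0.0], [1.0, 1.0])
print(filtered)
```

The gain always lies between 0 and 1, so a coefficient dominated by noise is attenuated toward zero while one dominated by signal passes nearly unchanged.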