Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression

Bibliographic Details
Main Author: Yin-Fan Chen (陳尹凡)
Other Authors: 簡福榮
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/783z7y
Description
Summary: Master's thesis, National Taipei University of Technology, Department of Electrical Engineering, ROC academic year 93 (2004–2005). In a speech conversation, about 28% of the time slices are silent gaps between talk spurts. If these gaps are encoded with a simple method that only mimics the background noise, considerable channel bandwidth can be saved and computational complexity reduced. The purpose of voice activity detection (VAD) is to determine whether an incoming speech frame contains active voice; a multi-mode speech coder then selects the corresponding encoding mode according to the VAD result. Most VAD algorithms decide the VAD state from characteristic parameters of the speech signal, such as energy, pitch, spectral distortion, zero-crossing rate, and log area ratio. In this thesis, multiple linear regression is used to train weighting coefficients for the speech features, and the trained model then predicts the VAD state. Experimental results show that a regression model using only two features, energy and zero-crossing rate, achieves better performance and lower complexity than the G.729 VAD. We also developed a multi-mode MELP coder that encodes each frame in one of three modes according to the VAD state and the properties of the speech signal: a 0.889 kbps mode for non-active voice, a 2.489 kbps mode for low-correlation voice, and a 3.2 kbps mode for high-correlation voice. To reduce switching artifacts at mode boundaries, two decision limits are imposed on mode transitions. Objective and subjective tests show that the multi-mode MELP coder outperforms the original MELP coder while using a lower average bit rate.
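The regression-based VAD idea in the summary can be illustrated with a small sketch. It is not the thesis's implementation: the 8 kHz sampling rate, the 180-sample (22.5 ms) frame length, the exact log-energy and zero-crossing-rate definitions, the ordinary least-squares fit against 0/1 labels, the 0.5 decision threshold, and the synthetic tone/noise test frames are all assumptions chosen for illustration. The helper names (`frame_features`, `fit_mlr`, `predict_vad`, `make_frame`) are hypothetical. The code uses only the Python standard library, so the normal equations are solved by hand.

```python
import math
import random

FRAME_LEN = 180  # assumption: 22.5 ms frames at 8 kHz (the standard MELP frame size)


def frame_features(frame):
    """Return (log energy in dB, zero-crossing rate) for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = 10.0 * math.log10(energy + 1e-10)  # small offset avoids log(0)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return log_energy, crossings / (len(frame) - 1)


def fit_mlr(features, labels):
    """Least-squares fit of label ~ w0 + w1*energy + w2*zcr via the normal equations."""
    rows = [[1.0, e, z] for e, z in features]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(n)]
    a = [xtx[i] + [xty[i]] for i in range(n)]  # augmented matrix [X^T X | X^T y]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict_vad(weights, feature, threshold=0.5):
    """Return 1 (active voice) if the regression output exceeds the threshold, else 0."""
    e, z = feature
    score = weights[0] + weights[1] * e + weights[2] * z
    return 1 if score > threshold else 0


def make_frame(rng, active):
    """Hypothetical test signal: a 200 Hz tone (active) or low-level white noise (inactive)."""
    if active:
        phase = rng.uniform(0, 2 * math.pi)
        return [0.5 * math.sin(2 * math.pi * 200 * n / 8000 + phase) + rng.gauss(0, 0.01)
                for n in range(FRAME_LEN)]
    return [rng.gauss(0, 0.01) for _ in range(FRAME_LEN)]


if __name__ == "__main__":
    rng = random.Random(0)
    train_labels = [i % 2 for i in range(200)]
    train_feats = [frame_features(make_frame(rng, bool(y))) for y in train_labels]
    w = fit_mlr(train_feats, train_labels)
    test_labels = [i % 2 for i in range(100)]
    test_feats = [frame_features(make_frame(rng, bool(y))) for y in test_labels]
    acc = sum(predict_vad(w, f) == y for f, y in zip(test_feats, test_labels)) / len(test_labels)
    print("weights:", [round(x, 4) for x in w], "accuracy:", acc)
```

Because the training targets are 0 and 1, the regression output behaves as a rough activity score, and thresholding it yields the VAD decision. In a real coder the weights would be trained on labeled speech recorded with background noise, the threshold tuned, and the decision then used to pick the coding mode (for example, the 0.889 kbps mode for inactive frames).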