Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression
Master's === National Taipei University of Technology === Department of Electrical Engineering === 93 === In conversational speech, about 28% of time slices are silent gaps between talk spurts. If a simple encoding method is used to mimic the background noise, considerable channel bandwidth is saved and computational complexity is reduced. The purpose...
Main Authors: | Yin-Fan Chen 陳尹凡 |
---|---|
Other Authors: | 簡福榮 |
Format: | Others |
Language: | zh-TW |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/783z7y |
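The bandwidth argument in the abstract can be made concrete with a small calculation. The sketch below is illustrative only: the three mode bit rates and the 28% silence share are the figures quoted in this record's full description, while the split of the remaining 72% between low- and high-correlation frames is an assumption made here, and the names `MODE_RATES_KBPS` and `average_bit_rate` are hypothetical.

```python
# Mode bit rates in kbps, as quoted in the thesis abstract.
MODE_RATES_KBPS = {
    "non_active": 0.889,
    "low_correlation": 2.489,
    "high_correlation": 3.2,
}


def average_bit_rate(fractions):
    """Return the frame-weighted average bit rate in kbps.

    `fractions` maps each mode name to the share of frames coded in that mode.
    """
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("mode fractions must sum to 1")
    return sum(MODE_RATES_KBPS[mode] * share for mode, share in fractions.items())


# 28% non-active frames comes from the abstract; the even split of the
# remaining 72% is an ASSUMPTION for illustration, not a thesis result.
example = {"non_active": 0.28, "low_correlation": 0.36, "high_correlation": 0.36}
avg = average_bit_rate(example)
print(f"{avg:.3f} kbps")  # prints "2.297 kbps"
```

Under these assumed shares the average falls well below the 3.2 kbps peak rate. The real figure depends on the mode statistics measured in the thesis, which this record does not give.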
id |
ndltd-TW-093TIT05442055 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-093TIT05442055 2019-05-29T03:43:30Z http://ndltd.ncl.edu.tw/handle/783z7y Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression 基於多重線性迴歸語音活動檢測之多重模式MELP語音編碼器的設計與分析 Yin-Fan Chen 陳尹凡 Master's National Taipei University of Technology Department of Electrical Engineering 93 In conversational speech, about 28% of time slices are silent gaps between talk spurts. If a simple encoding method is used to mimic the background noise, considerable channel bandwidth is saved and computational complexity is reduced. The purpose of voice activity detection (VAD) is to determine whether the incoming speech frame contains active voice. According to the VAD result, a multi-mode speech coder selects the corresponding encoding mode. Most VAD algorithms use various characteristic parameters of the speech signal to determine the VAD state, including energy, pitch, spectral distortion, zero-crossing rate, and log area ratio. In this thesis, a multiple linear regression algorithm is adopted to train the weighting coefficients of the speech features and then to predict the VAD state. The experimental results show that, in the regression model, only the combination of the energy and zero-crossing-rate features achieves better performance and lower complexity than G.729’s VAD. We have developed a multi-mode MELP coder with three coding modes that encode each speech frame according to the VAD state and the properties of the speech signal. The first mode is for non-active voice at a bit rate of 0.889 kbps, the second is for low-correlation voice at 2.489 kbps, and the third is for high-correlation voice at 3.2 kbps. To reduce switching artifacts at the boundaries between the three modes, we impose two decision limits on mode transitions. The objective and subjective tests show that, with an average bit rate lower than that of the original MELP, the multi-mode MELP performs better. 簡福榮 2005 thesis 69 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taipei University of Technology === Department of Electrical Engineering === 93 === In conversational speech, about 28% of time slices are silent gaps between talk spurts. If a simple encoding method is used to mimic the background noise, considerable channel bandwidth is saved and computational complexity is reduced. The purpose of voice activity detection (VAD) is to determine whether the incoming speech frame contains active voice. According to the VAD result, a multi-mode speech coder selects the corresponding encoding mode.
Most VAD algorithms use various characteristic parameters of the speech signal to determine the VAD state, including energy, pitch, spectral distortion, zero-crossing rate, and log area ratio. In this thesis, a multiple linear regression algorithm is adopted to train the weighting coefficients of the speech features and then to predict the VAD state. The experimental results show that, in the regression model, only the combination of the energy and zero-crossing-rate features achieves better performance and lower complexity than G.729’s VAD.
We have developed a multi-mode MELP coder with three coding modes that encode each speech frame according to the VAD state and the properties of the speech signal. The first mode is for non-active voice at a bit rate of 0.889 kbps, the second is for low-correlation voice at 2.489 kbps, and the third is for high-correlation voice at 3.2 kbps. To reduce switching artifacts at the boundaries between the three modes, we impose two decision limits on mode transitions. The objective and subjective tests show that, with an average bit rate lower than that of the original MELP, the multi-mode MELP performs better.
|
author2 |
簡福榮 |
author_facet |
簡福榮 Yin-Fan Chen 陳尹凡 |
author |
Yin-Fan Chen 陳尹凡 |
spellingShingle |
Yin-Fan Chen 陳尹凡 Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression |
author_sort |
Yin-Fan Chen |
title |
Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression |
title_short |
Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression |
title_full |
Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression |
title_fullStr |
Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression |
title_full_unstemmed |
Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression |
title_sort |
design and analysis of multi-mode melp coder with voice activity detection based on multiple linear regression |
publishDate |
2005 |
url |
http://ndltd.ncl.edu.tw/handle/783z7y |
work_keys_str_mv |
AT yinfanchen designandanalysisofmultimodemelpcoderwithvoiceactivitydetectionbasedonmultiplelinearregression AT chényǐnfán designandanalysisofmultimodemelpcoderwithvoiceactivitydetectionbasedonmultiplelinearregression AT yinfanchen jīyúduōzhòngxiànxìnghuíguīyǔyīnhuódòngjiǎncèzhīduōzhòngmóshìmelpyǔyīnbiānmǎqìdeshèjìyǔfēnxī AT chényǐnfán jīyúduōzhòngxiànxìnghuíguīyǔyīnhuódòngjiǎncèzhīduōzhòngmóshìmelpyǔyīnbiānmǎqìdeshèjìyǔfēnxī |
_version_ |
1719193382982516736 |
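The description above says that a multiple linear regression over frame energy and zero-crossing rate predicts the VAD state. The following is a minimal sketch of that general idea, not the thesis's actual implementation: the synthetic training frames, the 180-sample frame length, the 0.5 decision threshold, and every function name are assumptions made for illustration. The thesis's training data, feature definitions, and decision rule are not given in this record.

```python
import math
import random

FRAME_LEN = 180  # assumed frame size: 22.5 ms at 8 kHz


def tone(amp, freq_hz, rate=8000):
    """Synthetic stand-in for an active (voiced) frame."""
    return [amp * math.sin(2 * math.pi * freq_hz * n / rate) for n in range(FRAME_LEN)]


def noise(amp, seed):
    """Synthetic stand-in for a low-level background-noise frame."""
    rng = random.Random(seed)
    return [rng.uniform(-amp, amp) for _ in range(FRAME_LEN)]


def frame_features(frame):
    """Return (log10 energy, zero-crossing rate) for one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return math.log10(energy + 1e-12), crossings / (len(frame) - 1)


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_vad(features, labels):
    """Least-squares fit of label ~ w0 + w1*log_energy + w2*zcr (normal equations)."""
    rows = [[1.0, e, z] for e, z in features]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(3)]
    return solve(xtx, xty)


def predict_vad(weights, feature, threshold=0.5):
    """Return True when the regression score marks the frame as active voice."""
    score = weights[0] + weights[1] * feature[0] + weights[2] * feature[1]
    return score >= threshold


# Toy training set: label 1 = active voice, label 0 = non-active.
active = [tone(a, f) for a in (0.3, 0.5, 0.8) for f in (100, 200, 300)]
inactive = [noise(a, s) for a in (0.005, 0.01, 0.02) for s in (1, 2, 3)]
train_x = [frame_features(fr) for fr in active + inactive]
train_y = [1] * len(active) + [0] * len(inactive)
weights = fit_vad(train_x, train_y)
```

Calling `predict_vad(weights, frame_features(frame))` on a new frame then gives a per-frame decision. A multi-mode coder of the kind described would use that decision, together with a correlation measure not sketched here, to choose among its three modes.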