Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression



Bibliographic Details
Main Author: Yin-Fan Chen (陳尹凡)
Other Authors: 簡福榮
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/783z7y
id ndltd-TW-093TIT05442055
Title (zh-TW): 基於多重線性迴歸語音活動檢測之多重模式MELP語音編碼器的設計與分析
Type: thesis (學位論文)
Extent: 69
collection NDLTD
description Master's thesis, Department of Electrical Engineering, National Taipei University of Technology, academic year 93 of the ROC calendar (2004-2005). In conversational speech, roughly 28% of the time consists of silence between talk spurts. Encoding these silent intervals with a simple method that mimics the background noise saves considerable channel bandwidth and reduces computational complexity. Voice activity detection (VAD) determines whether each incoming speech frame contains active voice; a multi-mode speech coder then selects the encoding mode that corresponds to the VAD decision. Most VAD algorithms derive the VAD state from characteristic parameters of the speech signal, such as energy, pitch, spectral distortion, zero-crossing rate, and log area ratio. In this thesis, multiple linear regression is used to train weighting coefficients for the speech features, and the weighted features are then used to predict the VAD state. Experimental results show that a regression model using only two features, energy and zero-crossing rate, achieves better performance at lower complexity than the VAD of G.729. We developed a multi-mode MELP coder with three coding modes, selected for each frame according to the VAD state and the properties of the speech signal: one for non-active voice at 0.889 kbps, one for low-correlation voice at 2.489 kbps, and one for high-correlation voice at 3.2 kbps. To reduce switching artifacts at the boundaries between modes, two decision limits are imposed on mode transitions. Objective and subjective tests show that the multi-mode MELP coder performs better than the original MELP even though its average bit rate is lower.
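The regression-based VAD summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the record does not give the feature definitions, frame length, decision threshold, or training data, so the log-energy and zero-crossing-rate formulas, the 180-sample frame (22.5 ms at 8 kHz), the 0.5 threshold, and the helper names `frame_features`, `design_matrix`, `fit_vad_weights`, and `predict_vad` are all assumptions. The weights are fit by ordinary least squares with NumPy's `np.linalg.lstsq`, regressing a 0/1 voice-activity label on the two features plus an intercept.

```python
import numpy as np

# Assumption: 8 kHz input split into 22.5 ms frames (180 samples), as in MELP.
FRAME_LEN = 180


def frame_features(frame):
    """Return [log energy (dB), zero-crossing rate] for one frame (hypothetical helper)."""
    frame = np.asarray(frame, dtype=float)
    # Mean power in dB; the small constant avoids log10(0) on all-zero frames.
    energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-10)
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frame)
    zcr = np.count_nonzero(signs[1:] != signs[:-1]) / (len(frame) - 1)
    return np.array([energy_db, zcr])


def design_matrix(features):
    """Prepend a column of ones so the regression has an intercept term."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(features)), features])


def fit_vad_weights(features, labels):
    """Multiple linear regression: label ~ w0 + w1*energy_db + w2*zcr, by least squares."""
    targets = np.asarray(labels, dtype=float)
    weights, *_ = np.linalg.lstsq(design_matrix(features), targets, rcond=None)
    return weights


def predict_vad(weights, features, threshold=0.5):
    """Mark a frame as active voice when its regression score reaches the assumed threshold."""
    return design_matrix(features) @ weights >= threshold
```

In this sketch the weights are learned once, offline, from frames labeled 1 (active voice) or 0 (silence); at run time each frame then costs only two feature computations and one three-term dot product, which is consistent with the low-complexity motivation stated in the abstract.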
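As a quick consistency check on the three quoted bit rates, the sketch below converts each rate into bits per frame. It assumes the standard 22.5 ms MELP frame length, which the record itself does not state; under that assumption the modes come out at 20, 56, and 72 bits per frame, and the 2.4 kbps rate of standard MELP at its usual 54 bits. The constant, dictionary, and function names are hypothetical.

```python
# Assumption: every mode uses the standard 22.5 ms MELP frame.
FRAME_SECONDS = 0.0225

# Bit rates (kbps) quoted in the abstract, plus the 2.4 kbps rate of standard MELP.
MODE_RATES_KBPS = {
    "non_active": 0.889,
    "low_correlation": 2.489,
    "high_correlation": 3.2,
    "standard_melp": 2.4,
}


def bits_per_frame(rate_kbps, frame_seconds=FRAME_SECONDS):
    """Number of bits available in one frame at the given rate."""
    return rate_kbps * 1000.0 * frame_seconds


def rate_kbps_from_bits(bits, frame_seconds=FRAME_SECONDS):
    """Inverse conversion: bit rate in kbps for a given number of bits per frame."""
    return bits / frame_seconds / 1000.0


if __name__ == "__main__":
    for mode, rate in MODE_RATES_KBPS.items():
        # Round to the nearest whole bit, since the quoted rates are themselves rounded.
        print(f"{mode}: {rate} kbps -> {round(bits_per_frame(rate))} bits/frame")
```

Read in reverse, `rate_kbps_from_bits(20)` gives about 0.8889 kbps and `rate_kbps_from_bits(56)` about 2.4889 kbps, which suggests that the 0.889 and 2.489 kbps figures are rounded values of whole-bit frame budgets.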