Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression

Master's === National Taipei University of Technology === Department of Electrical Engineering === 93 === In speech conversation, about 28% of the time slices are silent gaps between talk spurts. If a simple encoding method is used to mimic the background noise, considerable channel bandwidth can be saved and the computational complexity reduced. The purpose...


Bibliographic Details
Main Authors:	Yin-Fan Chen, 陳尹凡
Other Authors:	簡福榮
Format:	Others
Language:	zh-TW
Published:	2005
Online Access:	http://ndltd.ncl.edu.tw/handle/783z7y

id	ndltd-TW-093TIT05442055
record_format	oai_dc
spelling	ndltd-TW-093TIT05442055 2019-05-29T03:43:30Z http://ndltd.ncl.edu.tw/handle/783z7y Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression 基於多重線性迴歸語音活動檢測之多重模式MELP語音編碼器的設計與分析 Yin-Fan Chen 陳尹凡 Master's, National Taipei University of Technology, Department of Electrical Engineering, 93 簡福榮 2005 thesis 69 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taipei University of Technology === Department of Electrical Engineering === 93 === In speech conversation, about 28% of the time slices are silent gaps between talk spurts. If a simple encoding method is used to mimic the background noise, considerable channel bandwidth can be saved and the computational complexity reduced. The purpose of voice activity detection (VAD) is to determine whether an incoming speech frame contains active voice. Based on the VAD result, a multi-mode speech coder selects the corresponding encoding mode. Most VAD algorithms determine the VAD state from characteristic parameters of the speech signal, such as energy, pitch, spectral distortion, zero-crossing rate, and log area ratio. In this thesis, multiple linear regression is used to train the weighting coefficients of the speech features and then to predict the VAD state. The experimental results show that a regression model combining only the energy and zero-crossing-rate features achieves better performance and lower complexity than G.729's VAD. We have developed a multi-mode MELP coder with three coding modes that encodes each speech frame according to the VAD state and the properties of the speech signal. The first mode handles non-active voice at a bit rate of 0.889 kbps, the second handles low-correlation voice at 2.489 kbps, and the third handles high-correlation voice at 3.2 kbps. To reduce switching artifacts at the boundaries between the three modes, we impose two decision limits on mode transitions. Objective and subjective tests show that the multi-mode MELP performs better than the original MELP when its average bit rate is lower than that of the original MELP.
author2	簡福榮
author_facet	簡福榮 Yin-Fan Chen 陳尹凡
author	Yin-Fan Chen 陳尹凡
author_sort	Yin-Fan Chen
title	Design and Analysis of Multi-Mode MELP Coder with Voice Activity Detection Based on Multiple Linear Regression
publishDate	2005
url	http://ndltd.ncl.edu.tw/handle/783z7y

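The abstract describes training weighting coefficients for the energy and zero-crossing-rate features by multiple linear regression and then predicting the VAD state from them. The record gives no implementation details, so the following is only a minimal sketch of that general idea and not the thesis's method. The function names (frame_features, fit_vad, predict_vad, synthetic_demo), the 180-sample frame length, the 8 kHz sampling rate, the 0.5 decision threshold, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np


def frame_features(signal, frame_len=180):
    """Return one [log-energy, zero-crossing rate] row per frame.

    frame_len=180 is an assumed frame size, not a value from the record.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    # Log of the mean squared amplitude; the small constant avoids log(0).
    log_energy = np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.column_stack([log_energy, zcr])


def fit_vad(features, labels):
    """Least-squares fit of labels ~ b0 + b1*log_energy + b2*zcr."""
    design = np.column_stack([np.ones(len(features)), features])
    coeffs, *_ = np.linalg.lstsq(design, labels, rcond=None)
    return coeffs


def predict_vad(features, coeffs, threshold=0.5):
    """Mark a frame as active voice when the regression output reaches the threshold."""
    design = np.column_stack([np.ones(len(features)), features])
    return (design @ coeffs) >= threshold


def synthetic_demo(seed=0, frame_len=180, frames_per_class=50):
    """Train and score on a toy signal: quiet noise frames, then loud tone frames."""
    rng = np.random.default_rng(seed)
    n = frame_len * frames_per_class
    noise = 0.01 * rng.standard_normal(n)  # stand-in for background noise
    t = np.arange(n) / 8000.0  # assumed 8 kHz sampling rate
    # A 200 Hz tone is a crude stand-in for voiced speech.
    voiced = np.sin(2 * np.pi * 200.0 * t) + 0.01 * rng.standard_normal(n)
    signal = np.concatenate([noise, voiced])
    labels = np.concatenate([np.zeros(frames_per_class), np.ones(frames_per_class)])

    feats = frame_features(signal, frame_len)
    coeffs = fit_vad(feats, labels)
    decisions = predict_vad(feats, coeffs)
    # Fraction of frames whose decision matches the label.
    return float(np.mean(decisions == labels.astype(bool)))


if __name__ == "__main__":
    print("training-set accuracy on the toy signal:", synthetic_demo())
```

In this sketch the regression output is a continuous score, so a threshold is needed to turn it into a binary VAD decision. Real speech would require labeled training frames and a held-out test set, which the toy signal here does not provide.
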

Similar Items