A Study of Low Bit Rate Speech Codec with Speaker Recognizability
Master's === National Cheng Kung University === Institute of Computer Science and Information Engineering === 85 === In the past, low bit rate speech coders were designed mainly for intelligibility and quality, and such approaches can reduce speaker recognizability. In this paper, we present a low bit rate speech coder with better speaker recognizability using...
Main Authors: | Tsai, Jia-Ching 蔡佳青 |
---|---|
Other Authors: | Wu, Chung-Hsien 吳宗憲 |
Format: | Others |
Language: | zh-TW |
Published: | 1997 |
Online Access: | http://ndltd.ncl.edu.tw/handle/39322223777486191523 |
id |
ndltd-TW-085NCKU3392016 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-085NCKU3392016 2015-10-13T12:18:06Z http://ndltd.ncl.edu.tw/handle/39322223777486191523 A Study of Low Bit Rate Speech Codec with Speaker Recognizability 具語者特徵之低位元率語音編碼器之研究 Tsai, Jia-Ching 蔡佳青 Master's National Cheng Kung University Institute of Computer Science and Information Engineering 85 Wu, Chung-Hsien 吳宗憲 1997 學位論文 ; thesis 40 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Cheng Kung University === Institute of Computer Science and Information Engineering === 85 === In the past, low bit rate speech coders were designed mainly for intelligibility and quality, and such approaches can reduce speaker recognizability. In this paper, we present a low bit rate speech coder with better speaker recognizability, achieved by selecting a glottal excitation pattern for a specific speaker.
The parameters affecting speaker recognizability are pitch, linear prediction coefficients, and glottal excitation. Most low bit rate speech coders focus on finding a good and compact representation of the glottal excitation. To represent the glottal excitation suitably, we adopt the excitation pulse determination algorithm used in Multi-Pulse Excited LPC. In this paper, 25 periodic pulses are determined for each voiced frame. One period of a speaker-specific excitation pattern with only 3 pulses, one primary and two secondary, is chosen from the 25 pulses using a proposed pattern selection method. This 3-pulse pattern represents the excitation of the voiced speech pronounced by the speaker and is sent to the receiver. At the receiver, the 3-pulse pattern is smoothed with an FIR low-pass filter to obtain a smoother, more continuous pattern. For voiced speech, this smoothed pattern is used to synthesize the speech signal via the LPC model. For unvoiced speech, random white noise is used as the excitation.
The proposed approach has been implemented on a Pentium/133 PC running Windows 95 and operates in real time. The coder achieves a mean opinion score (MOS) of 2.5, whereas LPC-10e achieves only 2.24. Speaker recognizability with this coder is also much higher than with traditional coders.
|
author2 |
Wu, Chung-Hsien |
author_facet |
Wu, Chung-Hsien Tsai, Jia-Ching 蔡佳青 |
author |
Tsai, Jia-Ching 蔡佳青 |
author_sort |
Tsai, Jia-Ching |
title |
A Study of Low Bit Rate Speech Codec with Speaker Recognizability |
publishDate |
1997 |
url |
http://ndltd.ncl.edu.tw/handle/39322223777486191523 |
_version_ |
1716857716731805696 |
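
The receiver side described in the abstract (a 3-pulse pattern smoothed by an FIR low-pass filter, LPC synthesis for voiced frames, white noise for unvoiced frames) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the filter design (7-tap Hamming-windowed sinc), the pulse positions and amplitudes, the pitch period, the 180-sample frame length, and the single LPC coefficient are all hypothetical values chosen for illustration, and the pattern selection method that picks 3 pulses out of 25 is not reproduced because the record does not describe it.

```python
import math
import random


def fir_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass taps (assumed design, not from the thesis).

    cutoff is a fraction of the sampling rate, 0 < cutoff < 0.5.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def convolve_same(signal, taps):
    """FIR-filter signal, compensating the filter delay so the output stays aligned."""
    half = (len(taps) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out


def voiced_excitation(pulses, pitch_period, frame_len, taps):
    """Repeat the (position, amplitude) pulse pattern every pitch period, then smooth it."""
    one_period = [0.0] * pitch_period
    for pos, amp in pulses:
        one_period[pos % pitch_period] += amp
    raw = [one_period[n % pitch_period] for n in range(frame_len)]
    return convolve_same(raw, taps)


def unvoiced_excitation(frame_len, rng):
    """White Gaussian noise excitation for unvoiced frames."""
    return [rng.gauss(0.0, 1.0) for _ in range(frame_len)]


def lpc_synthesize(excitation, lpc, gain=1.0):
    """All-pole synthesis s[n] = gain*e[n] - sum_k a[k]*s[n-k], for A(z) = 1 + sum_k a[k] z^-k."""
    out = []
    for n, e in enumerate(excitation):
        s = gain * e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                s -= a * out[n - k]
        out.append(s)
    return out


# Hypothetical frame: one primary and two secondary pulses, pitch period of 64 samples.
taps = fir_lowpass(num_taps=7, cutoff=0.25)
pulses = [(0, 1.0), (12, -0.4), (27, 0.3)]
voiced = lpc_synthesize(voiced_excitation(pulses, 64, 180, taps), lpc=[-0.9])
unvoiced = lpc_synthesize(unvoiced_excitation(180, random.Random(0)), lpc=[-0.9])
```

Smoothing the whole periodic frame, rather than a single period before repetition, is one possible reading of the abstract; either choice spreads each pulse over a few neighboring samples.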