A Study on Feature Normalization and Other Improved Techniques for Robust Speech Recognition

Bibliographic Details
Main Authors: Liu Cheng-Wei, 劉成韋
Other Authors: 陳柏琳
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/45415359257552707820
id ndltd-TW-093NTNU5392017
record_format oai_dc
spelling ndltd-TW-093NTNU5392017 2016-06-03T04:13:53Z http://ndltd.ncl.edu.tw/handle/45415359257552707820 A Study on Feature Normalization and Other Improved Techniques for Robust Speech Recognition 強健性語音辨識上關於特徵正規化與其它改良技術的研究 Liu Cheng-Wei 劉成韋 碩士 國立臺灣師範大學 資訊工程研究所 93 陳柏琳 2005 學位論文 ; thesis 115 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan Normal University === Graduate Institute of Computer Science and Information Engineering === 93 === Over thousands of years, human beings have continuously acquired and accumulated knowledge from daily life, so that civilization and human evolution long advanced at roughly the same pace. Today, however, the rapid development of technology has far outpaced human evolution. For example, huge quantities of multimedia information, such as broadcast radio and television programs, voice mail, and digital archives, keep growing and filling our computers, networks, and lives. Accessing multimedia information anytime and anywhere from small handheld mobile devices is therefore receiving more and more attention. Speech is the primary and most convenient means of communication between people, and in the near future it is expected to play a more active role as the major human-machine interface between people and many kinds of smart devices. It would thus be far more convenient to use speech as that interface and to automatically transcribe, retrieve, and summarize multimedia using the speech information inherent in it. However, speech recognition is usually hampered by complicating factors such as background and channel noise and speaker and linguistic variation, which leave current state-of-the-art recognition systems far from perfect. With these observations in mind, this thesis makes several attempts to improve current speech robustness techniques and to find a way to integrate them.
The experiments were carried out on the Aurora 2.0 database and on Mandarin broadcast news speech collected in Taiwan. Considering the phonetic characteristics of the Chinese language, a modified histogram equalization (MHEQ) approach was first proposed, in which separate reference histograms were established for the silence and speech segments (MHEQ-2) or, more precisely, for the silence, INITIAL, and FINAL segments (MHEQ-3) in Chinese. The proposed approach yields relative improvements of more than 5.75% and 4.04% over the baseline system and the conventional table-based histogram equalization (THEQ) approach, respectively, in clean environments. Furthermore, spectral entropy features obtained after Linear Discriminant Analysis (LDA) were used to augment the Mel-frequency cepstral features, and initial results indicated considerable improvements. Finally, fusion of the proposed approaches was also investigated, with very promising results.
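The abstract names histogram equalization with separate reference histograms per segment class but gives no algorithmic detail. The following is a minimal sketch of that general idea, not the thesis's implementation. It assumes a single feature dimension, a quantile lookup table built from training data, and frame-level class labels that are already available. The function names, the bin count, and the label strings ("sil", "sp") are hypothetical.

```python
from bisect import bisect_right


def build_reference_table(values, n_bins=100):
    """Quantile lookup table of a reference (training) distribution.

    Entry k holds the training value at cumulative probability
    (k + 0.5) / n_bins, computed with integer arithmetic.
    """
    s = sorted(values)
    n = len(s)
    return [s[min(n - 1, ((2 * k + 1) * n) // (2 * n_bins))]
            for k in range(n_bins)]


def equalize(values, table):
    """Map each value to the reference quantile that matches its rank.

    The empirical CDF of a frame is (rank - 0.5) / n within the given
    values, so the order of the frames is preserved.
    """
    s = sorted(values)
    n = len(s)
    n_bins = len(table)
    out = []
    for v in values:
        rank = bisect_right(s, v)  # 1..n, number of frames <= v
        k = ((2 * rank - 1) * n_bins) // (2 * n)
        out.append(table[min(n_bins - 1, k)])
    return out


def classwise_equalize(values, labels, tables):
    """Equalize the frames of each class against that class's own table.

    `labels` is an assumed per-frame class label; frames whose label has
    no table are left unchanged.
    """
    out = list(values)
    for cls, table in tables.items():
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        if not idx:
            continue
        mapped = equalize([values[i] for i in idx], table)
        for i, m in zip(idx, mapped):
            out[i] = m
    return out
```

A two-class call in the spirit of MHEQ-2 would pass one table for silence and one for speech, for example `classwise_equalize(frames, labels, {"sil": sil_table, "sp": speech_table})`. A three-class variant would add tables for the INITIAL and FINAL segments in place of the single speech table.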
author2 陳柏琳
author_facet 陳柏琳
Liu Cheng-Wei
劉成韋
author Liu Cheng-Wei
劉成韋
author_sort Liu Cheng-Wei
title A Study on Feature Normalization and Other Improved Techniques for Robust Speech Recognition
title_sort study on feature normalization and other improved techniques for robust speech recognition
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/45415359257552707820