A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition

Master's thesis === National Sun Yat-sen University (國立中山大學) === Institute of Electrical Engineering (電機工程學系研究所) === 94 === A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data are used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients and Gaussian mixtu...


Bibliographic Details
Main Authors: Long-Cheng Wang, 王龍政
Other Authors: Chih-Chien Chen
Format: Others
Language: zh-TW
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/55168776720675963268
id ndltd-TW-094NSYS5442134
record_format oai_dc
spelling ndltd-TW-094NSYS5442134 2016-05-27T04:18:17Z http://ndltd.ncl.edu.tw/handle/55168776720675963268 A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition 多時段不特定語句語者辨識用電視影音資料庫之設計研究 Long-Cheng Wang 王龍政 Master's thesis National Sun Yat-sen University (國立中山大學) Institute of Electrical Engineering (電機工程學系研究所) 94 A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data are used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved on a single-session 3000-speaker corpus, while only a 67% correct rate is obtained on a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system drops sharply because of variability in the recording environment, the speakers' mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and establishing such a large-scale multi-session speaker database plays an indispensable role in that task. Chih-Chien Chen 陳志堅 2006 學位論文 ; thesis 51 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's thesis === National Sun Yat-sen University (國立中山大學) === Institute of Electrical Engineering (電機工程學系研究所) === 94 === A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data are used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved on a single-session 3000-speaker corpus, while only a 67% correct rate is obtained on a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system drops sharply because of variability in the recording environment, the speakers' mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and establishing such a large-scale multi-session speaker database plays an indispensable role in that task.
author2 Chih-Chien Chen
author_facet Chih-Chien Chen
Long-Cheng Wang
王龍政
author Long-Cheng Wang
王龍政
spellingShingle Long-Cheng Wang
王龍政
A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition
author_sort Long-Cheng Wang
title A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition
title_sort design of multi-session, text independent, tv-recorded audio-video database for speaker recognition
publishDate 2006
url http://ndltd.ncl.edu.tw/handle/55168776720675963268