A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition
Master's thesis === National Sun Yat-sen University === Graduate Institute of Electrical Engineering === 94 === A four-session, text-independent, TV-recorded audio-video database for speaker recognition is collected in this thesis. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients and Gaussian mixtu...
Main Authors: | Long-Cheng Wang, 王龍政 |
---|---|
Other Authors: | Chih-Chien Chen 陳志堅 |
Format: | Others |
Language: | zh-TW |
Published: | 2006 |
Online Access: | http://ndltd.ncl.edu.tw/handle/55168776720675963268 |
id |
ndltd-TW-094NSYS5442134 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-094NSYS5442134 2016-05-27T04:18:17Z http://ndltd.ncl.edu.tw/handle/55168776720675963268 A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition 多時段不特定語句語者辨識用電視影音資料庫之設計研究 Long-Cheng Wang 王龍政 Master's thesis National Sun Yat-sen University Graduate Institute of Electrical Engineering 94 This thesis collects a four-session, text-independent, TV-recorded audio-video database for speaker recognition. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients (MFCC) and Gaussian mixture models (GMM). Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved for a single-session 3000-speaker corpus, while only a 67% correct rate is obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced by variability in the recording environment, the speakers’ mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and the establishment of such a large-scale multi-session speaker database plays an indispensable role in that task. Chih-Chien Chen 陳志堅 2006 thesis 51 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Sun Yat-sen University === Graduate Institute of Electrical Engineering === 94 === This thesis collects a four-session, text-independent, TV-recorded audio-video database for speaker recognition. The speaker data is used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients (MFCC) and Gaussian mixture models (GMM). Both single-session and multi-session problems are discussed. Experimental results indicate that a 90% correct rate can be achieved for a single-session 3000-speaker corpus, while only a 67% correct rate is obtained for a two-session 800-speaker dataset. The performance of a multi-session speaker recognition system is greatly reduced by variability in the recording environment, the speakers’ mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and the establishment of such a large-scale multi-session speaker database plays an indispensable role in that task. |
author2 |
Chih-Chien Chen |
author_facet |
Chih-Chien Chen Long-Cheng Wang 王龍政 |
author |
Long-Cheng Wang 王龍政 |
title |
A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition |
publishDate |
2006 |
url |
http://ndltd.ncl.edu.tw/handle/55168776720675963268 |
_version_ |
1718282180587683840 |
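
The abstract names a methodology based on Mel-frequency cepstral coefficients and Gaussian mixture models, but this record gives no implementation details. As a minimal sketch of that general technique, and not the thesis's actual system, the Python below fits one diagonal-covariance GMM per speaker with EM and picks the speaker whose model gives the highest average log-likelihood. The synthetic 13-dimensional features merely stand in for MFCC frames; the speaker names, the component count, and every function name are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of each row of X under each diagonal Gaussian; shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm(X, k, n_iter=50, seed=0):
    """Fit a k-component diagonal-covariance GMM to X with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()  # init from data points
    variances = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under a fitted GMM."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def identify(X, models):
    """Return the name of the speaker model that scores X highest."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))

# Synthetic stand-ins for MFCC frames (assumption: 13 coefficients per frame).
rng = np.random.default_rng(42)
dim = 13
train = {
    "speaker_a": rng.normal(0.0, 1.0, (400, dim)),
    "speaker_b": rng.normal(3.0, 1.0, (400, dim)),
}
models = {name: fit_gmm(feats, k=2) for name, feats in train.items()}

# Fresh draws from the same distributions play the role of test utterances.
test_a = rng.normal(0.0, 1.0, (100, dim))
test_b = rng.normal(3.0, 1.0, (100, dim))
print(identify(test_a, models), identify(test_b, models))
```

Training and test frames here come from the same distribution, which corresponds to the single-session case. The multi-session drop reported in the abstract arises when the test frames are recorded under different conditions from the training frames, which this toy data does not model.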