Speaker recognition using complementary information from vocal source and vocal tract.

Experimental results show that source-tract information fusion can also improve the robustness of speaker recognition systems in mismatched conditions. For example, relative improvements of 15.3% and 12.6% have been achieved for speaker identification and verification, respectively. === For speaker...

Bibliographic Details
Other Authors: Zheng, Nengheng.
Format: Others
Language: English, Chinese
Published: 2005
Subjects:
Online Access: http://library.cuhk.edu.hk/record=b6074159
http://repository.lib.cuhk.edu.hk/en/item/cuhk-343788
id ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_343788
record_format oai_dc
collection NDLTD
language English
Chinese
format Others
sources NDLTD
topic Human-computer interaction
Speech processing systems
description This thesis investigates the feasibility of using both vocal source and vocal tract information to improve speaker recognition performance. Conventional speaker recognition systems typically employ vocal tract related acoustic features, e.g., the Mel-frequency cepstral coefficients (MFCC), for speaker discrimination. Motivated by the physiological significance of the vocal source and vocal tract system in speech production, this thesis develops a speaker recognition system that incorporates these two complementary information sources for improved performance and robustness.
=== This thesis presents a novel approach to representing speaker-specific vocal source characteristics. The linear predictive (LP) residual signal is adopted as a representation of the vocal source excitation, in which speaker-specific information resides in both the time and frequency domains. The Haar transform and the wavelet transform are applied for multi-resolution analysis of the LP residual signal. The resulting vocal source features, namely the Haar octave coefficients of residues (HOCOR) and the wavelet octave coefficients of residues (WOCOR), can effectively extract the speaker-specific spectro-temporal characteristics of the LP residual signal. In particular, with the pitch-synchronous wavelet transform, the WOCOR feature set is capable of capturing the pitch-related low-frequency properties and the high-frequency information associated with pitch epochs, as well as their temporal variations within a pitch period and over consecutive periods. The generated vocal source and vocal tract features are complementary to each other since they are derived from two orthogonal components, the LP residual signal and the LP coefficients. Therefore they can be fused to provide better speaker recognition performance. A preliminary scheme fusing MFCC and WOCOR showed that identification and verification performance can be improved by 34.6% and 23.6%, respectively, both in matched conditions.
=== To maximize the benefit obtained through the fusion of source and tract information, speaker-discrimination-dependent fusion techniques have been developed. For speaker identification, a confidence measure, which indicates the reliability of the vocal source feature in speaker identification, is derived from the discrimination ratio between the source and tract features in each identification trial. Fusion with the confidence measure weights the scores given by the two features more effectively and avoids possible errors introduced by incorporating source information, thereby further improving identification performance. Compared with MFCC, a relative improvement of 46.8% has been achieved.
=== For speaker verification, a text-dependent weighting scheme is developed. Analysis results show that the source-tract discrimination ratio varies significantly across different sounds due to the diversity of vocal system configurations in speech production. This thesis analyzes the source-tract speaker discrimination ratio for the 10 Cantonese digits, upon which a digit-dependent source-tract weighting scheme is developed. Fusion with these digit-dependent weights yields a relative improvement of 39.6% in verification performance in matched conditions.
=== Experimental results show that source-tract information fusion can also improve the robustness of speaker recognition systems in mismatched conditions. For example, relative improvements of 15.3% and 12.6% have been achieved for speaker identification and verification, respectively.
=== Zheng Nengheng. === "November 2005." === Adviser: Pak-Chung Ching. === Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6647. === Thesis (Ph.D.)--Chinese University of Hong Kong, 2005. === Includes bibliographical references (p. 123-135).
=== Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web. === Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web. === Abstracts in English and Chinese. === School code: 1307.
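The description notes that the vocal tract and vocal source features are derived from two components of LP analysis: the LP coefficients and the LP residual signal. As a rough illustration only, the split can be sketched with the standard autocorrelation method. This is not the thesis's implementation: the function name `lp_analysis`, the use of NumPy, and the absence of windowing or pre-emphasis are all assumptions made for the sketch.

```python
import numpy as np

def lp_analysis(frame, order):
    """Illustrative LP analysis (autocorrelation method), not the thesis code.

    Returns (a, residual), where a[k] weights the sample k+1 steps back in the
    linear prediction and residual = frame - prediction.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Predict each sample from the preceding `order` samples.
    pred = np.zeros(n)
    for k in range(order):
        pred[k + 1:] += a[k] * frame[: n - k - 1]
    residual = frame - pred
    return a, residual
```

In this sketch the coefficients `a` stand in for the vocal tract component and `residual` for the vocal source excitation; features such as MFCC, HOCOR, or WOCOR would be computed in later steps that are not shown here.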
author2 Zheng, Nengheng.
author_facet Zheng, Nengheng.
title Speaker recognition using complementary information from vocal source and vocal tract.
publishDate 2005
url http://library.cuhk.edu.hk/record=b6074159
http://repository.lib.cuhk.edu.hk/en/item/cuhk-343788
_version_ 1718978180983816192
spelling ndltd-cuhk.edu.hk-oai-cuhk-dr-cuhk_343788 2019-02-19T03:43:38Z Speaker recognition using complementary information from vocal source and vocal tract. CUHK electronic theses & dissertations collection Human-computer interaction Speech processing systems Zheng, Nengheng. Chinese University of Hong Kong Graduate School. Division of Electronic Engineering. 2005 Text theses electronic resource microform microfiche 1 online resource (xiv, 135 p. : ill.) cuhk:343788 isbn: 9780542965791 http://library.cuhk.edu.hk/record=b6074159 eng chi Use of this resource is governed by the terms and conditions of the Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” License (http://creativecommons.org/licenses/by-nc-nd/4.0/)
http://repository.lib.cuhk.edu.hk/en/islandora/object/cuhk%3A343788/datastream/TN/view/Speaker%20recognition%20using%20complementary%20information%20from%20vocal%20source%20and%20vocal%20tract.jpg
http://repository.lib.cuhk.edu.hk/en/item/cuhk-343788