Summary: | Ph.D. === National Chiao Tung University === Institute of Computer Science and Engineering === 97 === Speaker verification is usually formulated as a statistical hypothesis testing problem and solved by a likelihood ratio (LR) test. A speaker verification system’s performance is highly dependent on modeling the target speaker’s voice (the null hypothesis) and characterizing non-target speakers’ voices (the alternative hypothesis). However, since the alternative hypothesis involves unknown impostors, it is usually difficult to characterize a priori. In this dissertation, we propose a framework to better characterize the alternative hypothesis with the goal of optimally distinguishing the target speaker from impostors. The proposed framework is built on a weighted arithmetic combination (WAC) or a weighted geometric combination (WGC) of useful information extracted from a set of pre-trained background models. The parameters associated with WAC or WGC are then optimized using two discriminative training methods, namely the minimum verification error (MVE) training method and the proposed evolutionary MVE (EMVE) training method, such that both the false acceptance probability and the false rejection probability are minimized. Moreover, we also propose two new decision functions based on WGC and WAC, which can be regarded as nonlinear discriminant classifiers. To solve for the weight vector w, we propose using two kernel-based discriminant techniques, namely the Kernel Fisher Discriminant (KFD) and the Support Vector Machine (SVM), because of their ability to separate samples of target speakers from those of non-target speakers efficiently.
In recent years, the GMM-UBM system has been the predominant approach for the text-independent speaker verification task. The advantage of the approach is that both the target speaker model and the impostor model (UBM) have generalization ability. However, since the two models are trained according to separate criteria, the optimization procedure cannot distinguish a target speaker from background speakers optimally. To improve the GMM-UBM approach, we propose a discriminative feedback adaptation (DFA) framework that allows generalization and discrimination to be considered jointly. The framework not only preserves the generalization ability of the GMM-UBM approach, but also reinforces the discriminability between the target speaker model and the UBM. Under DFA, rather than using a unified UBM, we construct a discriminative anti-model exclusively for each target speaker.
The results of speaker-verification experiments conducted on three speech corpora, the Extended M2VTS Database (XM2VTSDB), the ISCSLP2006-SRE database and the NIST2001-SRE database, show that the proposed methods outperform all of the conventional LR-based approaches.
|
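The LR test with a WAC or WGC characterization of the alternative hypothesis, as described in the summary, can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes that the "useful information" taken from each background model is the log-likelihood of the test utterance, that the weights are non-negative and sum to one, and that the decision threshold is fixed by hand. The function names (`log_wac`, `log_wgc`, `log_lr`, `accept`) are hypothetical. In the dissertation the weights would be learned by MVE, EMVE, KFD or SVM, which this sketch does not attempt.

```python
import math


def log_wac(log_likes, weights):
    """Log of the weighted arithmetic combination: log(sum_i w_i * p_i).

    Works from log-likelihoods log p_i and uses the log-sum-exp trick
    so that very small likelihoods do not underflow.
    Assumes at least one weight is strictly positive.
    """
    terms = [math.log(w) + ll for w, ll in zip(weights, log_likes) if w > 0]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def log_wgc(log_likes, weights):
    """Log of the weighted geometric combination: sum_i w_i * log p_i."""
    return sum(w * ll for w, ll in zip(weights, log_likes))


def log_lr(target_ll, background_lls, weights, combine=log_wac):
    """Log-likelihood ratio of the target model against the combined
    background models (the alternative hypothesis)."""
    return target_ll - combine(background_lls, weights)


def accept(target_ll, background_lls, weights, threshold=0.0, combine=log_wac):
    """Accept the claimed identity when the log LR reaches the threshold
    (the threshold value is an assumption of this sketch)."""
    return log_lr(target_ll, background_lls, weights, combine) >= threshold


if __name__ == "__main__":
    # Hypothetical log-likelihoods of one utterance under two background models.
    background = [-1.0, -3.0]
    weights = [0.5, 0.5]
    print(log_lr(-1.0, background, weights, combine=log_wac))
    print(log_lr(-1.0, background, weights, combine=log_wgc))
```

For the same weights, the WAC denominator is never smaller than the WGC denominator (the weighted arithmetic mean bounds the weighted geometric mean from above), so the WGC-based log LR is never lower than the WAC-based one for a given utterance.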