Summary: | Doctoral === National Cheng Kung University === Department of Electrical Engineering (Master's and Doctoral Program) === 100 === In smart living, the development of smart portable devices and smart home appliances has drawn researchers to pursue smaller size, higher performance, interactive applications, and more powerful functionality. Speaker recognition plays an important role in owner recognition on mobile devices and in enrollment authentication in the smart home.
In this dissertation, we explore speaker recognition in two areas: hardware implementation and smart home applications. For hardware realization, multiple platforms (ARM, FPGA, and ARM+FPGA) are adopted, and speaker recognition is realized as an embedded SoC system, a VLSI architecture design, and a hardware/software co-design. In the smart home, speaker recognition is investigated in an intelligent porch system to provide a natural way to authenticate home users and to interact smartly with home appliances. However, adverse and mismatched conditions degrade the speaker expert; therefore, we propose fusing the speaker expert with other human cues, namely a speech expert, a face expert, and a height detector, to form a multimodal biometric recognition system for the smart home.
In general, speaker recognition can be divided into two tasks: speaker identification and speaker verification. Speaker identification scores an unknown speaker against a closed set of trained models and determines the speaker's identity, whereas speaker verification checks a voice against the model of its claimed identity and accepts or rejects the claim using a confidence threshold; this task can be regarded as open-set.
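The difference between the two tasks can be illustrated with a minimal sketch. It assumes that each enrolled speaker model has already produced a similarity score for the test utterance; the function names, speaker names, scores, and threshold below are hypothetical and are not taken from the dissertation.

```python
def identify(scores):
    """Closed-set identification: pick the enrolled speaker with the highest score."""
    return max(scores, key=scores.get)


def verify(scores, claimed_id, threshold):
    """Verification: accept the claim only if the claimed model's score
    reaches the confidence threshold."""
    return scores[claimed_id] >= threshold


# Hypothetical scores of one test utterance against three enrolled models.
scores = {"alice": 0.82, "bob": 0.35, "carol": 0.10}

best = identify(scores)                  # always returns one enrolled speaker
accepted = verify(scores, "alice", 0.5)  # claim "I am alice" is accepted
rejected = verify(scores, "bob", 0.5)    # claim "I am bob" is rejected
```

Identification always returns one of the enrolled speakers, while verification can reject a claim outright, which is why it is treated as an open-set task.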
Two critical phases are commonly addressed in speaker recognition: model training and recognition. Model training is generally time-consuming, particularly on mobile devices. This motivates us to examine hardware implementation of the training phase in order to accelerate it. In this dissertation, the Support Vector Machine (SVM) is used for speaker model training and classification, and the Sequential Minimal Optimization (SMO) algorithm for SVMs is used to accelerate speaker model training. To realize the complex SMO algorithm on multiple hardware platforms, the algorithm is first analyzed and modified into feasible steps and blocks, and then realized on several hardware platforms. The experimental results show that the VLSI design of the SMO algorithm does accelerate training, and that speaker identification accuracy does not differ greatly from the software simulation.
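For readers unfamiliar with SMO, the following is a minimal software sketch of the widely taught simplified SMO variant, in which the second multiplier is chosen at random. It is not the modified, hardware-oriented version developed in the dissertation. The linear kernel, the toy two-speaker data, and all function and parameter names are assumptions made for illustration only.

```python
import random


def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))


def train_smo(X, y, C=1.0, tol=1e-3, max_passes=10, max_sweeps=1000, seed=0):
    """Simplified SMO: optimize one pair of multipliers (alpha_i, alpha_j) at a time."""
    rng = random.Random(seed)
    n = len(X)
    alpha, b = [0.0] * n, 0.0
    K = [[linear_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]

    def f(i):  # decision value for training point i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Skip points that already satisfy the KKT conditions within tol.
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = rng.choice([k for k in range(n) if k != i])
            E_j = f(j) - y[j]
            a_i, a_j = alpha[i], alpha[j]
            # Box bounds that keep sum(alpha * y) unchanged.
            if y[i] != y[j]:
                lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
            else:
                lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            new_j = min(hi, max(lo, a_j - y[j] * (E_i - E_j) / eta))
            if abs(new_j - a_j) < 1e-5:
                continue
            new_i = a_i + y[i] * y[j] * (a_j - new_j)
            b1 = b - E_i - y[i] * (new_i - a_i) * K[i][i] - y[j] * (new_j - a_j) * K[i][j]
            b2 = b - E_j - y[i] * (new_i - a_i) * K[i][j] - y[j] * (new_j - a_j) * K[j][j]
            alpha[i], alpha[j] = new_i, new_j
            if 0 < new_i < C:
                b = b1
            elif 0 < new_j < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def predict(X, y, alpha, b, x):
    score = sum(a * t * linear_kernel(p, x) for a, t, p in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1


# Toy feature vectors for two hypothetical speakers, labeled +1 and -1.
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y = [1, 1, -1, -1]
alpha, b = train_smo(X, y)
```

Each step touches only two multipliers and has a closed-form update. The kernel evaluations and error terms inside the inner loop are the kind of work a hardware design would need to speed up.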
|