Summary: | Master's === National Cheng Kung University === Department of Electrical Engineering === 104 === Hidden Markov Models (HMMs) are one of the most popular methods for modern speech recognition. In this thesis, we propose an Automatic Speech-Speaker Recognition (ASSR) system on an FPGA platform. The ASSR system includes four parts: 1) pre-processing, 2) feature extraction, 3) speech and speaker recognition, and 4) Out-of-Vocabulary (OOV) and Out-of-Speaker (OOS) detection.
This study adopts Mel-frequency cepstral coefficients (MFCCs) as the features in the feature extraction module. We use HMMs to build an acoustic model for each phoneme, and evaluate our approach on two databases: THCHS-30 (the Tsinghua Chinese 30-hour database) and the CMU ARCTIC databases.
The binary halved clustering (BHC) method uses binary-halved splitting to generate speaker models that meet the low-complexity requirement. The last part of the ASSR system uses a grammar to detect OOV cases and an OOS detection algorithm to detect OOS cases.
The experiments are conducted on two platforms: a PC and a Xilinx Spartan-6 FPGA. The experimental results indicate that the proposed work achieves
speech recognition rates of 90.8% for Mandarin and 86.6% for English. It also achieves OOV detection rates of 88.7% for Mandarin and 84.9%
for English. The speaker recognition rate reaches 81.3%, and the OOS detection rate reaches 80.8%.
|
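The HMM-based recognition step mentioned in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes discrete observation symbols instead of MFCC vectors, and the two toy models, the names `MODELS`, `forward_log_likelihood`, and `recognize`, and all probabilities are hypothetical. It only shows the general idea of scoring an observation sequence against each model with the forward algorithm and choosing the best-scoring model.

```python
import math


def safe_log(p):
    """Return log(p), mapping zero probability to -inf."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Compute log P(obs | model) for a discrete HMM (forward algorithm)."""
    n = len(start)
    # Initialization: start probability times emission of the first symbol.
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)


def recognize(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over two symbols (0 and 1).
# Each entry is (start probabilities, transition matrix, emission matrix).
LEFT_TO_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))
    print(recognize([1, 1, 0, 0], MODELS))
```

A real recognizer of the kind described would typically score continuous feature vectors with a continuous emission density; the discrete emission table here is a simplification chosen to keep the sketch short.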