Summary: | Master's === 國立臺北科技大學 === 機電整合研究所 === 89 === This thesis investigates how transcoded speech and real GSM speech affect speaker verification performance in a mobile voice-activated trading system. The transcoded speech used for simulation is obtained by transcoding microphone and wired-telephone speech databases with various coding schemes. To match real-world conditions, a GSM speech database of 20 male and 20 female speakers was also collected over the mobile wireless network. Three in-vehicle call environments are considered: a stopped car (0 km/hr) with the engine running, and a moving car at 50 km/hr and at 90 km/hr. Each speaker pronounced 40 seven-digit strings in each condition, giving a database of 4800 digit strings (40 speakers × 3 conditions × 40 strings) that is suitable for related research. A text-dependent, hidden Markov model-based system is implemented for performance evaluation. Experimental results show that verification performance on real GSM speech is far worse than on transcoded speech because of channel effects and background noise. This investigation therefore provides a practical performance baseline for mobile voice-activated trading systems. The results also indicate that the 0 km/hr case yields the best performance under matched conditions, that the 90 km/hr case yields the worst performance under mismatched conditions, and that performance for male speakers is superior to that for female speakers in all conditions. Moreover, the proposed mixed training model improves performance in some cases.
|