Summary: | Master's === National Taipei University of Technology === Department of Electrical Engineering === 106 === This work proposes an evaluation system for practicing spoken English. It helps learners imitate a native English speaker and match the speaker's accent as closely as possible. The system compares the test speech against the template speech on six indices: speed, phrasing, pronunciation, volume, rhythm, and intonation. After the user records a shadowing utterance and runs the system, it reports a score and feedback for every index in every sentence. Many people, especially in Asia, have few opportunities to speak English and are afraid of talking to foreigners, and as a result they often speak with an unnatural accent. In recent years, a method called “shadowing” has been promoted, in which learners listen to a template speech and try to repeat it. The method is simple and effective, but it has no well-defined rule for evaluating how similar the user's speech is to the template speech. Motivated by this observation, we built this evaluation system and designed dedicated algorithms that extract speech features and compare their similarity automatically. The main pre-processing algorithms are noise reduction, audio normalization, voice activity detection, and segment alignment. The speech features are short-time energy, spectral energy, fundamental frequency, and Mel-scale frequency. Dynamic Time Warping, Longest Common Subsequence, and the durational Pairwise Variability Index are applied to measure the similarity between corresponding features. With these methods, the system obtains the differences in speed, phrasing, pronunciation, volume, rhythm, and intonation between the template speech and the test speech. It also provides customized scores and feedback that indicate how closely the user's test speech matches the template. In summary, learners can practice spoken English on their own using the proposed evaluation system. 
It makes the learning process more efficient and engaging, and it helps learners speak fluently and avoid an awkward accent.
|
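The summary names Dynamic Time Warping and the durational Pairwise Variability Index as similarity measures but gives no formulas. The following is a minimal sketch of the textbook forms of both, not the thesis's implementation. The function names `dtw_distance` and `npvi`, the absolute-difference local cost, and the choice of the normalized PVI variant are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Accumulated cost of the best monotonic alignment of sequences a and b.

    Assumption: 1-D feature sequences (e.g. per-frame energy or pitch) and an
    absolute-difference local cost; the thesis may use other features or costs.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def npvi(durations):
    """Normalized Pairwise Variability Index of successive interval durations.

    Assumption: durations are positive (e.g. syllable or vowel lengths in
    seconds) and the normalized variant is used.
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    # Each term is the difference of a neighbouring pair divided by their mean.
    terms = [
        abs(x - y) / ((x + y) / 2)
        for x, y in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)
```

Under these assumptions, a template contour and a slower repetition of the same contour have a DTW distance of zero, which is why DTW suits comparing utterances spoken at different speeds. Rhythm could then be compared by taking the difference between the nPVI values of the template and the test speech.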