Summary: | Emotion processing has a wide range of applications across many fields and has attracted increasing interest from speech and language researchers. Speech emotion recognition systems face many challenges, one of which is the degree of naturalness of the emotions in speech corpora. To assess whether speakers could accurately emulate emotions and whether listeners could identify the intended emotion, a human perception test was designed for a new emotional speech corpus. This paper presents an exhaustive statistical and perceptual investigation of KSUEmotions, the King Saud University emotional speech corpus for Arabic, which was approved by the Linguistic Data Consortium. The KSUEmotions corpus was built in two phases and involved 23 native speakers (10 males and 13 females), who emulated the following five emotions: neutral, sadness, happiness, surprise, and anger. Nine listeners participated in a blind, randomly structured human perceptual test to assess the validity of the intended emotions. Statistical tests were used to analyze the effects of speaker gender, reviewer (listener) gender, emotion type, sentence length, and the interactions among these factors. The tests conducted included two-way analysis of variance and normality, chi-square, Bonferroni, Tukey, and Mann–Whitney U tests. One outcome of the study is that speaker gender, emotion type, and the interaction between emotion type and speaker gender have significant effects on emotion perception in this corpus.
|