Emotion recognition neural network based on auxiliary modal supervised training


Bibliographic Details
Main Authors: Jiyun ZOU, Yunfeng XU
Format: Article
Language: Chinese (zho)
Published: Hebei University of Science and Technology, 2020-10-01
Series: Journal of Hebei University of Science and Technology
Subjects:
Online Access: http://xuebao.hebust.edu.cn/hbkjdx/ch/reader/create_pdf.aspx?file_no=b202005006&flag=1&journal_
Description
Summary: To address the imbalance of data samples across modalities in multi-modal data, knowledge from the resource-rich text modality was used to model the resource-poor acoustic modality, and an emotion recognition neural network was constructed that uses the similarity between the auxiliary modalities to supervise training. First, a neural network with bi-GRU as its core was used to learn initial feature vectors for the text and acoustic modalities. Second, a SoftMax function was used for emotion recognition prediction, while a fully connected layer generated the target feature vectors corresponding to the two modalities. Finally, the target feature vectors assisted supervised training by computing the similarity between each other, improving emotion recognition performance. The results show that this neural network achieves four-class emotion classification on the IEMOCAP dataset with a weighted accuracy of 82.6% and an unweighted accuracy of 81.3%. The research provides a reference and methodological basis for emotion recognition and auxiliary modeling in the multi-modal field of artificial intelligence.
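The abstract's training objective (emotion classification supervised jointly with a cross-modal similarity term on the target feature vectors) can be sketched as follows. This is a minimal illustration, not the paper's implementation: cosine similarity as the similarity measure, a weighted sum as the joint objective, and the weight `lam` are all assumptions made for the sketch.

```python
import math

def cosine_similarity(u, v):
    # Similarity between the text and acoustic target feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(logits):
    # Numerically stable SoftMax over the four emotion classes.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(emotion_logits, label, text_target, acoustic_target, lam=0.5):
    """Cross-entropy on the emotion prediction plus a similarity term
    (weight `lam` is illustrative) that pulls the two modalities'
    target feature vectors toward each other."""
    probs = softmax(emotion_logits)
    ce = -math.log(probs[label])
    sim = cosine_similarity(text_target, acoustic_target)
    # (1 - sim) vanishes when the two target vectors are aligned,
    # so the auxiliary term only penalizes cross-modal disagreement.
    return ce + lam * (1.0 - sim)
```

When the two target vectors already agree, the auxiliary term is zero and the loss reduces to plain cross-entropy; otherwise the similarity term supplies the cross-modal supervision signal described in the abstract.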
ISSN:1008-1542