Summary: One of the major obstacles in applying automatic emotion recognition to realistic human-machine interaction systems is the scarcity of labeled data for training a robust model. Motivated by this concern, this paper seeks to fully exploit unlabeled data, which are pervasively available in the real world and easy to collect, by means of novel semi-supervised learning (SSL) approaches. Conventional SSL methods such as self-training suffer from the inherent drawback of error accumulation: samples misclassified by the system are continuously employed to train the model in subsequent learning iterations. To address this major issue, we first propose an enhanced learning strategy in which the previously automatically labeled samples are re-evaluated in each learning iteration, so that the training set is updated by correcting mislabeled samples. We further exploit multiple modalities and models through collaborative SSL, where all modalities and models are considered simultaneously and samples are selected by minimizing the joint entropy. This strategy is expected not only to improve the performance of the model for data annotation, and thereby the reliability of the automatically labeled data, but also to increase the diversity of the selected data. To evaluate the effectiveness of the proposed approaches, we performed extensive experiments on the remote collaborative and affective (RECOLA) database, which contains multimodal recordings of spontaneous affective interactions of dyads. The empirical results show that the proposed approaches significantly outperform well-established SSL methods.
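The entropy-based sample selection described above can be illustrated with a minimal sketch. This is not the authors' implementation: the fusion rule (averaging per-modality posteriors), the function names, and the ranking by lowest entropy of the fused posterior are illustrative assumptions standing in for "minimizing the joint entropy" across modalities and models.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of each row of a probability matrix (natural log)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def select_confident_samples(posteriors_per_model, k):
    """Pick the k unlabeled samples the ensemble is most certain about.

    posteriors_per_model: list of (n_samples, n_classes) arrays, one per
        modality/model (hypothetical interface, assumed for illustration).
    Returns (indices of selected samples, their pseudo-labels).
    """
    # Fuse modalities/models by averaging their class posteriors,
    # a simple proxy for joint consideration of all models.
    fused = np.mean(np.stack(posteriors_per_model), axis=0)
    h = entropy(fused)              # per-sample uncertainty
    idx = np.argsort(h)[:k]        # lowest entropy = most confident
    pseudo = np.argmax(fused[idx], axis=1)
    return idx, pseudo
```

In the enhanced strategy the paper describes, the same scoring would also be re-applied to previously pseudo-labeled samples at each iteration, so that samples whose labels no longer agree with the updated model can be corrected rather than accumulating as errors.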