Summary: | Social media has become an indispensable part of people's lives, and the images and texts shared there convey individuals' attitudes and feelings. Sentiment analysis on social media therefore helps to understand social behavior and to provide better recommendations; one such task is polarity prediction. Although research on purely visual or purely textual sentiment analysis has made considerable progress, multimodal and cross-modal analysis that combines visual and textual correlations is still at an exploratory stage. To capture the semantic connection between images and captions, this paper proposes a cross-modal approach that considers both images and their captions when classifying image sentiment polarity; the method transfers sentiment-related correlations from textual content to images. First, an image and its corresponding caption are fed into an inner-class mapping model, where they are transformed into vectors in Hilbert space and labeled by computing the inner-class maximum mean discrepancy (MMD). Then, a class-aware sentence representation (CASR) model assigns distributed representations to the labels using a class-aware attention-based gated recurrent unit (GRU). Finally, an inner-class dependency LSTM (IDLSTM) classifies the sentiment polarity. Experiments on the Getty Images dataset and the Twitter 1269 dataset demonstrate the effectiveness of our approach, and extensive experimental results show that our model outperforms baseline solutions.
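As a rough illustration of the MMD step described above (not the authors' implementation: the Gaussian kernel choice, feature dimensions, and variable names are assumptions), the following minimal sketch estimates the squared MMD between two sets of embedded samples, as one might compare caption features and image features of the same sentiment class:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); implicitly embeds
    # the samples in a reproducing kernel Hilbert space.
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

def mmd_squared(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between samples X (n, d) and Y (m, d):

        MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
    """
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

# Hypothetical usage with random stand-ins for the learned embeddings:
rng = np.random.default_rng(0)
caption_vecs = rng.normal(size=(32, 128))  # assumed caption features
image_vecs = rng.normal(size=(32, 128))    # assumed image features
print(mmd_squared(caption_vecs, image_vecs))
```

A small MMD indicates the two embedded distributions are close, which is the kind of signal an inner-class mapping could use to assign an image the label of the caption class it best matches.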