Visual and Textual Jointly Enhanced Interpretable Fashion Recommendation

Bibliographic Details
Main Authors: Qianqian Wu, Pengpeng Zhao, Zhiming Cui
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access:https://ieeexplore.ieee.org/document/9046774/
Description
Summary: With the rapid development of online shopping, interpretable personalized fashion recommendation using images has attracted increasing attention in recent years. Existing work can capture a user's preferences for visible features and provide visual explanations. However, it ignores invisible features, such as the material and quality of the clothes, and fails to offer textual explanations. To this end, we propose a Visual and Textual Jointly Enhanced Interpretable (VTJEI) model for fashion recommendation based on the product image and historical reviews. VTJEI provides more accurate recommendations, together with visual and textual explanations, through the joint enhancement of textual and visual information. Specifically, we design a bidirectional two-layer adaptive attention review model to capture the user's visible and invisible preferences for the target product and to provide textual explanations by highlighting words. Moreover, we propose a review-driven visual attention model that yields a more personalized image representation, driven by the user's preferences obtained from historical reviews. In this way, we not only realize the joint enhancement of visual and textual information but also provide visual explanations by highlighting image regions. Finally, we performed extensive experiments on real-world datasets to confirm the superiority of our model on Top-N recommendation. We also built a labeled dataset to quantitatively evaluate the visible and invisible explanations the model provides. The results show that our model not only delivers more accurate recommendations but also provides both visual and textual explanations.
ISSN:2169-3536
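
For readers who want a concrete picture of the review-driven visual attention component described in the summary, the following is a minimal, self-contained sketch in PyTorch. The module name, layer sizes, and additive scoring function are illustrative assumptions rather than the authors' released implementation: image region features are re-weighted by a preference vector derived from the user's historical reviews, and the resulting attention weights can be visualized over regions as the visual explanation.

# A minimal sketch (assumed names and dimensions, not the VTJEI source code):
# region features from a CNN are re-weighted by a review-derived preference
# vector, producing a personalized image representation plus attention weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReviewDrivenVisualAttention(nn.Module):
    def __init__(self, region_dim: int, review_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.proj_region = nn.Linear(region_dim, hidden_dim)  # project image regions
        self.proj_review = nn.Linear(review_dim, hidden_dim)  # project review preference
        self.score = nn.Linear(hidden_dim, 1)                 # scalar attention score

    def forward(self, regions: torch.Tensor, review_pref: torch.Tensor):
        # regions:     (batch, num_regions, region_dim) image region features
        # review_pref: (batch, review_dim) user preference from historical reviews
        h = torch.tanh(self.proj_region(regions)
                       + self.proj_review(review_pref).unsqueeze(1))
        alpha = F.softmax(self.score(h), dim=1)               # (batch, num_regions, 1)
        # Weighted sum gives a personalized image representation; alpha itself
        # can be rendered over the image regions as the visual explanation.
        return (alpha * regions).sum(dim=1), alpha

# Example usage with random features
if __name__ == "__main__":
    attn = ReviewDrivenVisualAttention(region_dim=512, review_dim=256)
    regions = torch.randn(4, 49, 512)      # e.g., a 7x7 CNN feature map, flattened
    review_pref = torch.randn(4, 256)
    image_repr, alpha = attn(regions, review_pref)
    print(image_repr.shape, alpha.shape)   # torch.Size([4, 512]) torch.Size([4, 49, 1])

The design choice shown here is additive (Bahdanau-style) attention; the paper's actual scoring function and the bidirectional two-layer adaptive attention over review words are not reproduced, since the abstract does not specify them in enough detail.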