Summary: | In Storytel’s application, where users can read and listen to digitized literature, users are presented with a list of books in which the first things they encounter are the book title and cover. A book cover is therefore essential for attracting a consumer’s attention. In this study, we take a data-driven approach to investigating the design principles of book covers through deep learning models and explainable AI. The first aim is to explore how well a Convolutional Neural Network (CNN) can interpret and classify a book cover image according to its genre in a multi-class classification task. The second aim is to increase model interpretability and investigate correlations between model features and genres. With the help of the explainable AI method Gradient-weighted Class Activation Mapping (Grad-CAM), we analyze the pixel-wise contributions to the model’s predictions. In addition, object detection with YOLOv3 was implemented to investigate which objects are detectable and recurring in the book covers. The interplay between Grad-CAM and YOLOv3 was used to investigate how identified objects and features correlate with specific book genres and, ultimately, to answer what makes a good book cover. Using a state-of-the-art CNN architecture, we achieve an accuracy of 48%, with the best class-wise accuracies for the genres Erotica, Economy & Business, and Children at 73%, 67%, and 66%, respectively. Quantitative results from the Grad-CAM and YOLOv3 interplay show some strong associations between objects and genres, while indicating weak associations between abstract design principles and genres. Furthermore, a qualitative analysis of Grad-CAM visualizations shows that certain objects and text fonts are strongly relevant for specific book genres. We also observed that how a feature is portrayed is relevant to the model’s prediction for certain genres.
|