Summary: | Many pixel-based interpretation methods exist, such as saliency maps, gradient×input, DeepLIFT, and integrated-gradient-n. However, their performance is difficult to compare because the comparison involves human cognitive processes. We propose a metric that quantifies the distance between the importance scores produced by an interpretation method and human intuition. We create a new dataset by adding a small, simple image, called a stamp, to the original images. We then calculate the importance scores for deep neural networks that classify the stamped and regular images. Ideally, a pixel-based interpretation method should select the stamp. Previous approaches to comparing interpretation methods are useful only when the importance scores share the same scale. In contrast, we standardize the importance scores and define a measure of their distance to the ideal scores. The proposed method quantitatively measures how close interpretation methods are to human intuition.
|
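The stamp-based evaluation summarized above can be illustrated with a minimal NumPy sketch. The summary does not give the exact definition of the measure, so everything here is an assumption made for illustration: the helper names (`add_stamp`, `standardize`, `distance_to_ideal`), the use of z-score standardization, the choice of the stamp mask as the "ideal" score map, and the root-mean-square distance. It is not the authors' implementation.

```python
import numpy as np


def add_stamp(image, stamp, top, left):
    """Paste `stamp` onto a copy of a 2-D `image`; also return the stamp mask.

    Hypothetical helper: the summary only says a small image is added.
    """
    stamped = image.copy()
    h, w = stamp.shape
    stamped[top:top + h, left:left + w] = stamp
    mask = np.zeros(image.shape, dtype=bool)
    mask[top:top + h, left:left + w] = True
    return stamped, mask


def standardize(scores, eps=1e-12):
    """Z-score the scores so that methods with different scales are comparable.

    Assumption: the summary says scores are standardized but not how.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / (scores.std() + eps)


def distance_to_ideal(scores, mask):
    """Assumed measure: RMS distance between the standardized absolute
    importance scores and the standardized stamp mask (the 'ideal' map)."""
    s = standardize(np.abs(scores))
    ideal = standardize(mask.astype(float))
    return float(np.sqrt(np.mean((s - ideal) ** 2)))
```

Under these assumptions, an importance map that highlights exactly the stamp pixels gets a distance near zero regardless of its scale, while a map that highlights everything except the stamp gets a larger distance. In practice the `scores` argument would come from an interpretation method applied to a trained classifier, which this sketch does not cover.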