Network dissection: quantifying interpretability of deep visual representations

We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power.
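
The unit-concept alignment the abstract describes can be illustrated with a minimal sketch: binarize each hidden unit's activation map at a high activation quantile, then measure its overlap with pixel-level concept annotations, in the spirit of the paper's intersection-over-union scoring. The array shapes, the 0.995 quantile, and the score_units helper below are illustrative assumptions, not the authors' released code.

    # Rough sketch (not the authors' implementation) of scoring one layer's
    # units against concept annotations: binarize each unit's activation map
    # at a high quantile, then measure overlap with each concept mask via IoU.
    import numpy as np

    def score_units(activations, concept_masks, quantile=0.995):
        """Return a (num_units, num_concepts) array of IoU alignment scores.

        activations:   (num_images, num_units, H, W) float unit responses,
                       assumed already upsampled to the annotation resolution.
        concept_masks: (num_images, num_concepts, H, W) boolean annotations.
        """
        num_units = activations.shape[1]
        num_concepts = concept_masks.shape[1]

        # Per-unit threshold over the whole data set: keep only the top
        # (1 - quantile) fraction of that unit's activations.
        flat = activations.transpose(1, 0, 2, 3).reshape(num_units, -1)
        thresholds = np.quantile(flat, quantile, axis=1)            # (num_units,)
        unit_masks = activations > thresholds[None, :, None, None]  # binarize

        iou = np.zeros((num_units, num_concepts))
        for u in range(num_units):
            for c in range(num_concepts):
                inter = np.logical_and(unit_masks[:, u], concept_masks[:, c]).sum()
                union = np.logical_or(unit_masks[:, u], concept_masks[:, c]).sum()
                iou[u, c] = inter / union if union else 0.0
        return iou

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        acts = rng.random((8, 4, 16, 16))         # toy activations, 4 units
        masks = rng.random((8, 3, 16, 16)) > 0.8  # toy masks, 3 concepts
        scores = score_units(acts, masks)
        # Each unit would be labeled with its best-aligned concept.
        print(scores.argmax(axis=1), scores.max(axis=1).round(3))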

Bibliographic Details
Main Authors: Bau, David (Author), Zhou, Bolei (Author), Khosla, Aditya (Author), Oliva, Aude (Author), Torralba, Antonio (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor)
Format: Article
Language:English
Published: IEEE, 2020-05-01T19:35:43Z.
Subjects:
Online Access: https://hdl.handle.net/1721.1/124985
LEADER 02171 am a22002173u 4500
001 124985
042 |a dc 
100 1 0 |a Bau, David  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
700 1 0 |a Zhou, Bolei  |e author 
700 1 0 |a Khosla, Aditya  |e author 
700 1 0 |a Oliva, Aude  |e author 
700 1 0 |a Torralba, Antonio  |e author 
245 0 0 |a Network dissection: quantifying interpretability of deep visual representations 
260 |b IEEE,   |c 2020-05-01T19:35:43Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/124985 
520 |a We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units, then we apply our method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks. We further analyze the effect of training iterations, compare networks trained with different initializations, examine the impact of network depth and width, and measure the effect of dropout and batch normalization on the interpretability of deep visual representations. We demonstrate that the proposed method can shed light on characteristics of CNN models and training methods that go beyond measurements of their discriminative power. ©2017. Paper presented at the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), July 21-26, 2017, Honolulu, Hawaii. 
546 |a en 
655 7 |a Article 
773 |t 10.1109/cvpr.2017.354 
773 |t Proceedings, 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017)