Semantic Understanding of Scenes Through the ADE20K Dataset

Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present ADE20K, a densely annotated dataset that spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. In total, there are 25k images of complex everyday scenes containing a variety of objects in their natural spatial context. On average, there are 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation. We provide baseline performances on both benchmarks and re-implement state-of-the-art models as open source. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for semantic segmentation performance. We show that networks trained on ADE20K are able to segment a wide variety of scenes and objects.
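The abstract's point about synchronized batch normalization refers to computing batch-normalization statistics across all GPUs, so that the effective normalization batch is the full distributed batch rather than each GPU's small slice. Below is a minimal, hypothetical PyTorch sketch of that setup, not the authors' released code; the fcn_resnet50 backbone and the torchrun launch are illustrative assumptions, and 150 is the number of classes in the ADE20K scene-parsing benchmark.

import os
import torch
import torch.distributed as dist
from torchvision.models.segmentation import fcn_resnet50  # illustrative backbone, not the paper's models

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE for each worker process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # 150 classes = ADE20K scene-parsing benchmark label set
    model = fcn_resnet50(num_classes=150).cuda()
    # Replace every BatchNorm layer with a synchronized variant so statistics
    # are aggregated over the whole distributed batch
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

    # ... training loop over ADE20K images and segmentation masks goes here ...

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=NUM_GPUS train_ade20k.py

Synchronization matters most when each GPU only holds a few high-resolution crops, since per-GPU batch statistics are then noisy; that is the setting in which the abstract's finding about batch size applies.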


Bibliographic Details
Main Authors: Zhou, Bolei (Author), Zhao, Hang (Author), Puig Fernandez, Xavier (Author), Xiao, Tete (Author), Fidler, Sanja (Author), Barriuso, Adela (Author), Torralba, Antonio (Author)
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory (Contributor)
Format: Article
Language: English
Published: Springer Nature, 2020-06-11T20:32:21Z.
Online Access: https://hdl.handle.net/1721.1/125771
LEADER 01973 am a22002533u 4500
001 125771
042 |a dc 
100 1 0 |a Zhou, Bolei  |e author 
100 1 0 |a Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory  |e contributor 
700 1 0 |a Zhao, Hang  |e author 
700 1 0 |a Puig Fernandez, Xavier  |e author 
700 1 0 |a Xiao, Tete  |e author 
700 1 0 |a Fidler, Sanja  |e author 
700 1 0 |a Barriuso, Adela  |e author 
700 1 0 |a Torralba, Antonio  |e author 
245 0 0 |a Semantic Understanding of Scenes Through the ADE20K Dataset 
260 |b Springer Nature,   |c 2020-06-11T20:32:21Z. 
856 |z Get fulltext  |u https://hdl.handle.net/1721.1/125771 
520 |a Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present ADE20K, a densely annotated dataset that spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. In total, there are 25k images of complex everyday scenes containing a variety of objects in their natural spatial context. On average, there are 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation. We provide baseline performances on both benchmarks and re-implement state-of-the-art models as open source. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for semantic segmentation performance. We show that networks trained on ADE20K are able to segment a wide variety of scenes and objects. 
520 |a NSF (grant 1524817) 
546 |a en 
655 7 |a Article 
773 |t 10.1007/S11263-018-1140-0 
773 |t International Journal of Computer Vision