Automatic Dense Annotation for Monocular 3D Scene Understanding

Deep neural networks have revolutionized many areas of computer vision, but they require notoriously large amounts of labeled training data. For tasks such as semantic segmentation and monocular 3D scene layout estimation, collecting high-quality training data is extremely laborious because dense, p...

Bibliographic Details
Main Authors:	Md Alimoor Reza, Kai Chen, Akshay Naik, David J. Crandall, Soon-Heung Jung
Format:	Article
Language:	English
Published:	IEEE 2020-01-01
Series:	IEEE Access
Subjects:	Scene understanding; 3D reconstruction; semi-supervised learning; computer vision
Online Access:	https://ieeexplore.ieee.org/document/9052727/

id	doaj-5470586de36342e9a4cc13d5acec8cf7
record_format	Article
spelling	doaj-5470586de36342e9a4cc13d5acec8cf7 | 2021-03-30T01:48:02Z | eng | IEEE | IEEE Access | 2169-3536 | 2020-01-01 | 8 | 68852 | 68865 | 10.1109/ACCESS.2020.2984745 | 9052727 | Automatic Dense Annotation for Monocular 3D Scene Understanding | Md Alimoor Reza [0] https://orcid.org/0000-0001-7692-817X | Kai Chen [1] https://orcid.org/0000-0003-2799-9689 | Akshay Naik [2] https://orcid.org/0000-0002-5766-3556 | David J. Crandall [3] https://orcid.org/0000-0002-5827-5344 | Soon-Heung Jung [4] https://orcid.org/0000-0003-2041-5222 | Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA | Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA | Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA | Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA | Electronics and Telecommunications Research Institute, Daejeon, South Korea | Deep neural networks have revolutionized many areas of computer vision, but they require notoriously large amounts of labeled training data. For tasks such as semantic segmentation and monocular 3d scene layout estimation, collecting high-quality training data is extremely laborious because dense, pixel-level ground truth is required and must be annotated by hand. In this paper, we present two techniques for significantly reducing the manual annotation effort involved in collecting large training datasets. The tools are designed to allow rapid annotation of entire videos collected by RGBD cameras, thus generating thousands of ground-truth frames to use for training. First, we propose a fully-automatic approach to produce dense pixel-level semantic segmentation maps. The technique uses noisy evidence from pre-trained object detectors and scene layout estimators and incorporates spatial and temporal context in a conditional random field formulation. Second, we propose a semi-automatic technique for dense annotation of 3d geometry, and in particular, the 3d poses of planes in indoor scenes. This technique requires a human to quickly annotate just a handful of keyframes per video, and then uses the camera poses and geometric reasoning to propagate these labels through an entire video sequence. Experimental results indicate that the technique could be used as an alternative or complementary source of training data, allowing large-scale data to be collected with minimal human effort. | https://ieeexplore.ieee.org/document/9052727/ | Scene understanding | 3D reconstruction | semi-supervised learning | computer vision
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Md Alimoor Reza; Kai Chen; Akshay Naik; David J. Crandall; Soon-Heung Jung
spellingShingle	Md Alimoor Reza; Kai Chen; Akshay Naik; David J. Crandall; Soon-Heung Jung | Automatic Dense Annotation for Monocular 3D Scene Understanding | IEEE Access | Scene understanding; 3D reconstruction; semi-supervised learning; computer vision
author_facet	Md Alimoor Reza; Kai Chen; Akshay Naik; David J. Crandall; Soon-Heung Jung
author_sort	Md Alimoor Reza
title	Automatic Dense Annotation for Monocular 3D Scene Understanding
title_sort	automatic dense annotation for monocular 3d scene understanding
publisher	IEEE
series	IEEE Access
issn	2169-3536
publishDate	2020-01-01
description	Deep neural networks have revolutionized many areas of computer vision, but they require notoriously large amounts of labeled training data. For tasks such as semantic segmentation and monocular 3D scene layout estimation, collecting high-quality training data is extremely laborious because dense, pixel-level ground truth is required and must be annotated by hand. In this paper, we present two techniques for significantly reducing the manual annotation effort involved in collecting large training datasets. The tools are designed to allow rapid annotation of entire videos collected by RGBD cameras, thus generating thousands of ground-truth frames to use for training. First, we propose a fully automatic approach to produce dense pixel-level semantic segmentation maps. The technique uses noisy evidence from pre-trained object detectors and scene layout estimators and incorporates spatial and temporal context in a conditional random field formulation. Second, we propose a semi-automatic technique for dense annotation of 3D geometry, in particular the 3D poses of planes in indoor scenes. This technique requires a human to quickly annotate just a handful of keyframes per video, and then uses the camera poses and geometric reasoning to propagate these labels through an entire video sequence. Experimental results indicate that the technique could be used as an alternative or complementary source of training data, allowing large-scale data to be collected with minimal human effort.
topic	Scene understanding; 3D reconstruction; semi-supervised learning; computer vision
url	https://ieeexplore.ieee.org/document/9052727/
work_keys_str_mv	AT mdalimoorreza automaticdenseannotationformonocular3dsceneunderstanding AT kaichen automaticdenseannotationformonocular3dsceneunderstanding AT akshaynaik automaticdenseannotationformonocular3dsceneunderstanding AT davidjcrandall automaticdenseannotationformonocular3dsceneunderstanding AT soonheungjung automaticdenseannotationformonocular3dsceneunderstanding
_version_	1724186430246223872

Similar Items