Data-Efficient Learning in Image Synthesis and Instance Segmentation

Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recog...

Full description

Bibliographic Details
Main Author: Robb, Esther Anne
Other Authors: Electrical and Computer Engineering
Format: Others
Published: Virginia Tech 2021
Subjects:
Online Access:http://hdl.handle.net/10919/104676
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-104676
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-1046762021-11-23T05:47:42Z Data-Efficient Learning in Image Synthesis and Instance Segmentation Robb, Esther Anne Electrical and Computer Engineering Huang, Jia-Bin Eldardiry, Hoda Jia, Ruoxi Computer vision data-efficient learning Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recognition. We propose two methods of data-efficient learning for the tasks of image synthesis and instance segmentation. We first propose a method of high-quality and diverse image generation by finetuning on only 5-100 images. Our method factors a pretrained model into a small but highly expressive weight space for finetuning, which discourages overfitting on a small training set. We validate our method in a challenging few-shot setting of 5-100 images in the target domain, and show that our method has significant visual quality gains compared with existing GAN adaptation methods. Next, we introduce a simple adaptive instance segmentation loss which achieves state-of-the-art results on the LVIS dataset. We demonstrate that rare categories are heavily suppressed by correct background predictions, which reduce the probability of all foreground categories with equal weight. Due to the relative infrequency of rare categories, this leads to an imbalance that biases the model towards predicting more frequent categories. Based on this insight, we develop DropLoss -- a novel adaptive loss that compensates for this imbalance without a trade-off between rare and frequent categories. Master of Science Many of the impressive results seen in modern computer vision rely on learning patterns from huge datasets of images, but these datasets may be expensive or difficult to collect. 
Many applications of computer vision need to learn from a very small number of examples, such as learning to recognize an unusual traffic event and behave safely in a self-driving car. In this thesis we propose two methods of learning from only a few examples. Our first method generates novel, high-quality and diverse images using a model fine-tuned on only 5-100 images. We start with an image generation model that was trained on a much larger image set (70K images) and adapt it to a smaller image set (5-100 images). We selectively train only part of the network to encourage diversity and prevent memorization. Our second method focuses on the instance segmentation setting, where the model predicts (1) what objects occur in an image and (2) their exact outline in the image. This setting commonly suffers from long-tail distributions, where some of the known objects occur frequently (e.g. "human" may occur 1000+ times) but most occur only a few times (e.g. "cake" or "parrot" may occur only 10 times). We observed that the "background" label disproportionately suppresses rare object labels. We use this insight to develop a method that balances suppression from background classes during training. 2021-08-19T08:00:13Z 2021-08-19T08:00:13Z 2021-08-18 Thesis vt_gsexam:32073 http://hdl.handle.net/10919/104676 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic Computer vision
data-efficient learning
spellingShingle Computer vision
data-efficient learning
Robb, Esther Anne
Data-Efficient Learning in Image Synthesis and Instance Segmentation
description Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recognition. We propose two methods of data-efficient learning for the tasks of image synthesis and instance segmentation. We first propose a method of high-quality and diverse image generation by finetuning on only 5-100 images. Our method factors a pretrained model into a small but highly expressive weight space for finetuning, which discourages overfitting on a small training set. We validate our method in a challenging few-shot setting of 5-100 images in the target domain, and show that our method has significant visual quality gains compared with existing GAN adaptation methods. Next, we introduce a simple adaptive instance segmentation loss which achieves state-of-the-art results on the LVIS dataset. We demonstrate that rare categories are heavily suppressed by correct background predictions, which reduce the probability of all foreground categories with equal weight. Due to the relative infrequency of rare categories, this leads to an imbalance that biases the model towards predicting more frequent categories. Based on this insight, we develop DropLoss -- a novel adaptive loss that compensates for this imbalance without a trade-off between rare and frequent categories. === Master of Science === Many of the impressive results seen in modern computer vision rely on learning patterns from huge datasets of images, but these datasets may be expensive or difficult to collect. Many applications of computer vision need to learn from a very small number of examples, such as learning to recognize an unusual traffic event and behave safely in a self-driving car. In this thesis we propose two methods of learning from only a few examples. 
Our first method generates novel, high-quality and diverse images using a model fine-tuned on only 5-100 images. We start with an image generation model that was trained on a much larger image set (70K images) and adapt it to a smaller image set (5-100 images). We selectively train only part of the network to encourage diversity and prevent memorization. Our second method focuses on the instance segmentation setting, where the model predicts (1) what objects occur in an image and (2) their exact outline in the image. This setting commonly suffers from long-tail distributions, where some of the known objects occur frequently (e.g. "human" may occur 1000+ times) but most occur only a few times (e.g. "cake" or "parrot" may occur only 10 times). We observed that the "background" label disproportionately suppresses rare object labels. We use this insight to develop a method that balances suppression from background classes during training.
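The DropLoss idea in the description above can be sketched in a few lines. The sketch below is a hypothetical NumPy illustration written for this record, not the thesis's implementation: the function name `droploss_weights` and the frequency-proportional keep probability are assumptions about what adaptively "dropping" background suppression per class might look like.

```python
import numpy as np

def droploss_weights(class_freq, is_background, rng=None):
    """Illustrative per-class loss weights for one region proposal.

    For a background proposal, the suppression ("negative") signal sent
    to each foreground class is kept only with probability proportional
    to that class's training frequency, so rare classes are suppressed
    less often. Foreground proposals keep full weight on every class.
    """
    class_freq = np.asarray(class_freq, dtype=float)
    if not is_background:
        # Foreground: no dropping, every class contributes to the loss.
        return np.ones_like(class_freq)
    if rng is None:
        rng = np.random.default_rng(0)
    # Frequent classes -> keep probability near 1; rare classes -> near 0.
    p_keep = class_freq / class_freq.max()
    return (rng.random(class_freq.shape) < p_keep).astype(float)
```

In a long-tailed setting like LVIS, a class seen 1000+ times would almost always receive the background penalty, while a class seen 10 times would rarely receive it, which matches the stated goal of reducing suppression of rare categories without penalizing frequent ones.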
author2 Electrical and Computer Engineering
author_facet Electrical and Computer Engineering
Robb, Esther Anne
author Robb, Esther Anne
author_sort Robb, Esther Anne
title Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_short Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_full Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_fullStr Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_full_unstemmed Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_sort data-efficient learning in image synthesis and instance segmentation
publisher Virginia Tech
publishDate 2021
url http://hdl.handle.net/10919/104676
work_keys_str_mv AT robbestheranne dataefficientlearninginimagesynthesisandinstancesegmentation
_version_ 1719495394302361600