Data-Efficient Learning in Image Synthesis and Instance Segmentation

Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recog...

Full description

Bibliographic Details
Main Author: Robb, Esther Anne
Other Authors: Electrical and Computer Engineering
Format: Others
Published: Virginia Tech 2021
Subjects:
Online Access:http://hdl.handle.net/10919/104676
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-104676
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-1046762021-11-23T05:47:42Z Data-Efficient Learning in Image Synthesis and Instance Segmentation Robb, Esther Anne Electrical and Computer Engineering Huang, Jia-Bin Eldardiry, Hoda Jia, Ruoxi Computer vision data-efficient learning Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recognition. We propose two methods of data-efficient learning for the tasks of image synthesis and instance segmentation. We first propose a method of high-quality and diverse image generation by finetuning on only 5-100 images. Our method factors a pretrained model into a small but highly expressive weight space for finetuning, which discourages overfitting on a small training set. We validate our method in a challenging few-shot setting of 5-100 images in the target domain, and show that our method has significant visual quality gains compared with existing GAN adaptation methods. Next, we introduce a simple adaptive instance segmentation loss which achieves state-of-the-art results on the LVIS dataset. We demonstrate that rare categories are heavily suppressed by correct background predictions, which reduce the probability of all foreground categories with equal weight. Due to the relative infrequency of rare categories, this leads to an imbalance that biases the model towards predicting more frequent categories. Based on this insight, we develop DropLoss -- a novel adaptive loss that compensates for this imbalance without a trade-off between rare and frequent categories. Master of Science Many of the impressive results seen in modern computer vision rely on learning patterns from huge datasets of images, but these datasets may be expensive or difficult to collect. 
Many applications of computer vision need to learn from a very small number of examples, such as learning to recognize an unusual traffic event and behave safely in a self-driving car. In this thesis we propose two methods of learning from only a few examples. Our first method generates novel, high-quality and diverse images using a model fine-tuned on only 5-100 images. We start with an image generation model that was trained on a much larger image set (70K images) and adapt it to a smaller image set (5-100 images). We selectively train only part of the network to encourage diversity and prevent memorization. Our second method focuses on the instance segmentation setting, where the model predicts (1) what objects occur in an image and (2) their exact outline in the image. This setting commonly suffers from long-tail distributions, where some of the known objects occur frequently (e.g. "human" may occur 1000+ times) but most occur only a few times (e.g. "cake" or "parrot" may occur only 10 times). We observed that the "background" label disproportionately suppresses rare object labels. We use this insight to develop a method that balances suppression from background classes during training. 2021-08-19T08:00:13Z 2021-08-19T08:00:13Z 2021-08-18 Thesis vt_gsexam:32073 http://hdl.handle.net/10919/104676 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic Computer vision
data-efficient learning
spellingShingle Computer vision
data-efficient learning
Robb, Esther Anne
Data-Efficient Learning in Image Synthesis and Instance Segmentation
description Modern deep learning methods have achieved remarkable performance on a variety of computer vision tasks, but frequently require large, well-balanced training datasets to achieve high-quality results. Data-efficient performance is critical for downstream tasks such as automated driving or facial recognition. We propose two methods of data-efficient learning for the tasks of image synthesis and instance segmentation. We first propose a method of high-quality and diverse image generation by finetuning on only 5-100 images. Our method factors a pretrained model into a small but highly expressive weight space for finetuning, which discourages overfitting on a small training set. We validate our method in a challenging few-shot setting of 5-100 images in the target domain, and show that our method has significant visual quality gains compared with existing GAN adaptation methods. Next, we introduce a simple adaptive instance segmentation loss which achieves state-of-the-art results on the LVIS dataset. We demonstrate that rare categories are heavily suppressed by correct background predictions, which reduce the probability of all foreground categories with equal weight. Due to the relative infrequency of rare categories, this leads to an imbalance that biases the model towards predicting more frequent categories. Based on this insight, we develop DropLoss -- a novel adaptive loss that compensates for this imbalance without a trade-off between rare and frequent categories. === Master of Science === Many of the impressive results seen in modern computer vision rely on learning patterns from huge datasets of images, but these datasets may be expensive or difficult to collect. Many applications of computer vision need to learn from a very small number of examples, such as learning to recognize an unusual traffic event and behave safely in a self-driving car. In this thesis we propose two methods of learning from only a few examples. 
Our first method generates novel, high-quality and diverse images using a model fine-tuned on only 5-100 images. We start with an image generation model that was trained on a much larger image set (70K images) and adapt it to a smaller image set (5-100 images). We selectively train only part of the network to encourage diversity and prevent memorization. Our second method focuses on the instance segmentation setting, where the model predicts (1) what objects occur in an image and (2) their exact outline in the image. This setting commonly suffers from long-tail distributions, where some of the known objects occur frequently (e.g. "human" may occur 1000+ times) but most occur only a few times (e.g. "cake" or "parrot" may occur only 10 times). We observed that the "background" label disproportionately suppresses rare object labels. We use this insight to develop a method that balances suppression from background classes during training.
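The DropLoss idea in the description above can be sketched in a few lines. The sketch below is a hypothetical NumPy illustration written for this record, not the thesis's implementation: the function name `droploss_weights` and the frequency-proportional keep probability are assumptions about what adaptively "dropping" background suppression per class might look like.

```python
import numpy as np

def droploss_weights(class_freq, is_background, rng=None):
    """Illustrative per-class loss weights for one region proposal.

    For a background proposal, the suppression ("negative") signal sent
    to each foreground class is kept only with probability proportional
    to that class's training frequency, so rare classes are suppressed
    less often. Foreground proposals keep full weight on every class.
    """
    class_freq = np.asarray(class_freq, dtype=float)
    if not is_background:
        # Foreground: no dropping, every class contributes to the loss.
        return np.ones_like(class_freq)
    if rng is None:
        rng = np.random.default_rng(0)
    # Frequent classes -> keep probability near 1; rare classes -> near 0.
    p_keep = class_freq / class_freq.max()
    return (rng.random(class_freq.shape) < p_keep).astype(float)
```

In a long-tailed setting like LVIS, a class seen 1000+ times would almost always receive the background penalty, while a class seen 10 times would rarely receive it, which matches the stated goal of reducing suppression of rare categories without penalizing frequent ones.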
author2 Electrical and Computer Engineering
author_facet Electrical and Computer Engineering
Robb, Esther Anne
author Robb, Esther Anne
author_sort Robb, Esther Anne
title Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_short Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_full Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_fullStr Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_full_unstemmed Data-Efficient Learning in Image Synthesis and Instance Segmentation
title_sort data-efficient learning in image synthesis and instance segmentation
publisher Virginia Tech
publishDate 2021
url http://hdl.handle.net/10919/104676
work_keys_str_mv AT robbestheranne dataefficientlearninginimagesynthesisandinstancesegmentation
_version_ 1719495394302361600