New insights on the power of active learning

Traditional supervised machine learning algorithms are expected to have access to a large corpus of labeled examples, but the massive amount of data available in the modern world has made unlabeled data much easier to acquire than accompanying labels. Active learning is an extension of the classical paradigm intended to lessen the expense of the labeling process by allowing the learning algorithm to intelligently choose which examples should be labeled. In this dissertation, we demonstrate that the power to make adaptive label queries has benefits beyond reducing labeling effort over passive learning. We develop and explore several novel methods for active learning that exemplify these new capabilities. Some of these methods use active learning for non-standard purposes, such as computational speedup, structure discovery, and domain adaptation. Others successfully apply active learning in situations where prior results have given evidence of its ineffectiveness.

Specifically, we first give an active algorithm for learning disjunctions that is able to overcome a computational intractability present in the semi-supervised version of the same problem. This is the first known example of the computational advantages of active learning. Next, we investigate using active learning to determine structural properties (margins) of the data-generating distribution that can further improve learning rates. This is in contrast to most active learning algorithms, which either assume or ignore structure rather than seeking to identify and exploit it. We then give an active nearest neighbors algorithm for domain adaptation, the task of learning a predictor for some target domain using mostly examples from a different source domain. This is the first formal analysis of the generalization and query behavior of an active domain adaptation algorithm. Finally, we show a situation where active learning can outperform passive learning on very noisy data, circumventing prior results that active learning cannot have a significant advantage over passive learning in high-noise regimes.
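The query paradigm the abstract describes can be illustrated with a classic toy case. The sketch below is a generic example, not code from the dissertation: learning a 1-D threshold classifier, where an active learner that adaptively picks which pool points to label needs only about log2(n) label queries, while a passive learner would need labels for a large fraction of the pool. The `true_label` oracle and the threshold value 0.5 are assumptions made for the illustration.

```python
import random

def true_label(x, threshold=0.5):
    """Hidden labeling function the learner is trying to recover."""
    return 1 if x >= threshold else 0

def active_threshold_search(pool, label_budget):
    """Binary-search-style active learner for a 1-D threshold.

    Each adaptive query halves the uncertainty interval, so roughly
    log2(n) labels suffice where passive learning would need labels
    for many of the n pool points.
    """
    pool = sorted(pool)
    lo, hi = 0, len(pool) - 1
    queries = 0
    while lo < hi and queries < label_budget:
        mid = (lo + hi) // 2
        queries += 1                # one label query to the oracle
        if true_label(pool[mid]) == 1:
            hi = mid                # threshold is at or below pool[mid]
        else:
            lo = mid + 1            # threshold is above pool[mid]
    return pool[lo], queries        # estimated threshold, labels used

random.seed(0)
pool = [random.random() for _ in range(1024)]
est, used = active_threshold_search(pool, label_budget=20)
print(f"estimated threshold near {est:.3f} using {used} label queries")
```

With 1024 unlabeled points, the learner localizes the threshold using only about 10 adaptive label queries, which is the label-efficiency gap between active and passive learning that the abstract's later chapters extend in less standard directions.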


Bibliographic Details
Main Author: Berlind, Christopher
Other Authors: Balcan, Maria-Florina; Song, Le
Format: Dissertation (application/pdf)
Language: en_US
Published: Georgia Institute of Technology 2015
Subjects: Machine learning; Learning theory; Active learning; Semi-supervised learning; Domain adaptation; Large margin learning
Online Access:http://hdl.handle.net/1853/53948