Classification Active Learning Based on Mutual Information

Selecting a subset of samples to label from a large pool of unlabeled data points, such that a sufficiently accurate classifier is obtained using a reasonably small training set, is a challenging yet critical problem. It is challenging because solving it involves cumbersome combinatorial computations, and critical because labeling is an expensive and time-consuming task, so we always aim to minimize the number of required labels. While information-theoretic objectives, such as the mutual information (MI) between the labels, have been used successfully in sequential querying, it is not straightforward to generalize these objectives to batch mode: the evaluation and optimization of functions that are trivial in individual querying settings become intractable for many objectives when multiple queries must be selected at once. In this paper, we develop a framework with efficient ways of evaluating and maximizing the MI between labels as an objective for batch-mode active learning. The proposed framework reduces the computational complexity from a cost that grows combinatorially with the batch size, when no approximation is applied, to linear cost. Its performance is evaluated on data sets from several fields, showing that the framework leads to efficient active learning for most of them.
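The batch-selection idea the abstract describes, replacing an intractable joint search over all candidate batches with cheap greedy maximization of a submodular set function, can be illustrated with a generic sketch. The log-determinant objective and the names `rbf_kernel` and `greedy_batch_select` below are assumptions for illustration only; this is a common submodular surrogate for batch informativeness, not the authors' MI-between-labels objective.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Pairwise squared distances turned into RBF similarities.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def greedy_batch_select(K, batch_size, jitter=1e-6):
    """Greedily pick indices maximizing log det(K_S + jitter*I),
    a submodular stand-in for the information carried by a batch.
    Submodularity gives the greedy loop its (1 - 1/e) guarantee."""
    n = K.shape[0]
    selected = []
    for _ in range(batch_size):
        best_i, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = K[np.ix_(idx, idx)] + jitter * np.eye(len(idx))
            # slogdet returns (sign, log|det|); sub is positive definite.
            score = np.linalg.slogdet(sub)[1]
            if score > best_score:
                best_score, best_i = score, i
        selected.append(best_i)
    return selected

# Tiny demo: pick a batch of 3 queries from 12 random candidates.
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 2))
batch = greedy_batch_select(rbf_kernel(X, gamma=0.5), batch_size=3)
```

Each greedy step scales linearly with the batch size rather than enumerating all candidate batches, which is the spirit of the complexity reduction the abstract claims; a real implementation would swap the log-det score for the paper's MI objective.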


Bibliographic Details
Main Authors: Jamshid Sourati (Northeastern University), Murat Akcakaya (University of Pittsburgh), Jennifer G. Dy (Northeastern University), Todd K. Leen (National Science Foundation), Deniz Erdogmus (Northeastern University)
Format: Article
Language: English
Published: MDPI AG, 2016-02-01
Series: Entropy
ISSN: 1099-4300
DOI: 10.3390/e18020051
Subjects: active learning; mutual information; submodular maximization; classification
Online Access: http://www.mdpi.com/1099-4300/18/2/51