A Comparative Study of Ensemble Active Learning

Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication...

Bibliographic Details
Main Author: Alabdulrahman, Rabaa
Other Authors: Viktor, Herna
Language: en
Published: Université d'Ottawa / University of Ottawa 2014
Subjects: Data Streams; Ensemble Learning; Active Learning; Active Ensemble Learning
Online Access: http://hdl.handle.net/10393/31805
http://dx.doi.org/10.20381/ruor-6709
id ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-31805
record_format oai_dc
spelling ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-31805 2018-01-05T19:02:08Z A Comparative Study of Ensemble Active Learning Alabdulrahman, Rabaa Viktor, Herna Data Streams Ensemble Learning Active Learning Active Ensemble Learning 2014-11-21T18:10:57Z 2014-11-21T18:10:57Z 2014 2014 Thesis http://hdl.handle.net/10393/31805 http://dx.doi.org/10.20381/ruor-6709 en Université d'Ottawa / University of Ottawa
collection NDLTD
language en
sources NDLTD
topic Data Streams
Ensemble Learning
Active Learning
Active Ensemble Learning
description Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication records. Such data bring new challenges to the data mining research community. Specifically, a number of researchers have developed techniques to build accurate classification models against such Data Streams. Ensemble Learning, where a number of so-called base classifiers are combined to build a model, has shown some promise. However, a number of challenges remain. Often, the class labels of the arriving data are incorrect or missing. Furthermore, Data Stream algorithms may benefit from an online learning paradigm, where a small amount of newly arriving data is used to learn incrementally. To this end, the use of Active Learning, where the user is in the loop, has been proposed as a way to extend Ensemble Learning. Here, the hypothesis is that Active Learning would improve performance in terms of accuracy, ensemble size, and the time it takes to build the model. This thesis tests the validity of this hypothesis. Namely, we explore whether augmenting Ensemble Learning with an Active Learning component benefits the Data Stream Learning process. Our analysis indicates that this hypothesis does not necessarily hold for the datasets under consideration. That is, the accuracies of Active Ensemble Learning are not statistically significantly higher than those of normal Ensemble Learning. Rather, Active Learning may even cause an increase in error rate. Further, Active Ensemble Learning actually increases the time taken to build the model. However, our results indicate that Active Ensemble Learning builds accurate models with much smaller ensemble sizes than the traditional Ensemble Learning algorithms. Further, the models we build are constructed against small and incrementally growing training sets, which may be very beneficial in a real-time Data Stream setting.
author2 Viktor, Herna
author Alabdulrahman, Rabaa
title A Comparative Study of Ensemble Active Learning
publisher Université d'Ottawa / University of Ottawa
publishDate 2014
url http://hdl.handle.net/10393/31805
http://dx.doi.org/10.20381/ruor-6709
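The abstract in this record describes Active Ensemble Learning only in words: base classifiers are combined into an ensemble, and a user in the loop supplies labels for a small amount of newly arriving data. The following is a minimal sketch of that general idea, not the method or the algorithms evaluated in the thesis. Everything in it is an assumption made for illustration: the `OnlinePerceptron` base classifier, the majority vote, the `margin` threshold that decides when a label is requested, and the synthetic stream produced by `make_stream`. The true label stands in for the user's answer.

```python
import random


class OnlinePerceptron:
    """Hypothetical base classifier: a perceptron updated one example at a time."""

    def __init__(self, n_features, rng):
        self.w = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        self.b = rng.uniform(-1.0, 1.0)

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else 0

    def learn(self, x, y):
        error = y - self.predict(x)  # -1, 0, or +1
        if error:
            self.w = [wi + error * xi for wi, xi in zip(self.w, x)]
            self.b += error


def vote_fraction(ensemble, x):
    """Fraction of base classifiers that predict class 1."""
    return sum(model.predict(x) for model in ensemble) / len(ensemble)


def run_stream(stream, ensemble, active, margin=0.2):
    """Process (x, y) pairs in arrival order; return (accuracy, labels_queried)."""
    correct = queried = total = 0
    for x, y in stream:
        frac = vote_fraction(ensemble, x)
        prediction = 1 if frac >= 0.5 else 0  # majority vote
        correct += prediction == y
        total += 1
        # Active mode: request a label only when the vote is close to a tie.
        # Non-active mode: every arriving example is labelled and learned from.
        uncertain = abs(frac - 0.5) <= margin
        if not active or uncertain:
            queried += 1
            for model in ensemble:
                model.learn(x, y)
    return correct / total, queried


def make_stream(n, rng):
    """Synthetic two-feature stream (assumption): the label is 1 when x0 + x1 > 1."""
    for _ in range(n):
        x = (rng.random(), rng.random())
        yield x, int(x[0] + x[1] > 1.0)


if __name__ == "__main__":
    for active in (False, True):
        rng = random.Random(0)
        ensemble = [OnlinePerceptron(2, rng) for _ in range(5)]
        accuracy, queried = run_stream(make_stream(2000, rng), ensemble, active)
        print(f"active={active} accuracy={accuracy:.3f} labels_queried={queried}")
```

Running the script prints the accuracy and the number of labels requested in each mode. The numbers come from a toy stream and say nothing about the thesis's findings; the sketch only shows where the label-request decision sits relative to the ensemble vote.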