A Comparative Study of Ensemble Active Learning
Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication...
Main Author: | Alabdulrahman, Rabaa |
---|---|
Other Authors: | Viktor, Herna |
Language: | en |
Published: | Université d'Ottawa / University of Ottawa 2014 |
Subjects: | Data Streams; Ensemble Learning; Active Learning; Active Ensemble Learning |
Online Access: | http://hdl.handle.net/10393/31805 http://dx.doi.org/10.20381/ruor-6709 |
id |
ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-31805 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-31805 2018-01-05T19:02:08Z A Comparative Study of Ensemble Active Learning Alabdulrahman, Rabaa Viktor, Herna Data Streams Ensemble Learning Active Learning Active Ensemble Learning 2014-11-21T18:10:57Z 2014-11-21T18:10:57Z 2014 2014 Thesis http://hdl.handle.net/10393/31805 http://dx.doi.org/10.20381/ruor-6709 en Université d'Ottawa / University of Ottawa |
collection |
NDLTD |
language |
en |
sources |
NDLTD |
topic |
Data Streams Ensemble Learning Active Learning Active Ensemble Learning |
description |
Data Stream mining is an important emerging topic in the data mining and machine learning domain. In a Data Stream setting, the data arrive continuously and often at a fast pace. Examples include credit card transaction records, surveillance video streams, network event logs, and telecommunication records. Such types of data bring new challenges to the data mining research community. Specifically, a number of researchers have developed techniques in order to build accurate classification models against such Data Streams. Ensemble Learning, where a number of so-called base classifiers are combined in order to build a model, has shown some promise. However, a number of challenges remain. Often, the class labels of the arriving data are incorrect or missing. Furthermore, Data Stream algorithms may benefit from an online learning paradigm, where a small amount of newly arriving data is used to learn incrementally. To this end, the use of Active Learning, where the user is in the loop, has been proposed as a way to extend Ensemble Learning. Here, the hypothesis is that Active Learning would improve performance in terms of accuracy, ensemble size, and the time it takes to build the model.
This thesis tests the validity of this hypothesis. Namely, we explore whether augmenting Ensemble Learning with an Active Learning component benefits the Data Stream Learning process. Our analysis indicates that this hypothesis does not necessarily hold for the datasets under consideration. That is, the accuracies of Active Ensemble Learning are not statistically significantly higher than those of normal Ensemble Learning. Rather, Active Learning may even cause an increase in error rate. Further, Active Ensemble Learning actually results in an increase in the time taken to build the model. However, our results indicate that Active Ensemble Learning builds accurate models with much smaller ensemble sizes than the traditional Ensemble Learning algorithms. Further, the models we build are constructed against small and incrementally growing training sets, which may be very beneficial in a real-time Data Stream setting. |
author2 |
Viktor, Herna |
author_facet |
Viktor, Herna Alabdulrahman, Rabaa |
author |
Alabdulrahman, Rabaa |
author_sort |
Alabdulrahman, Rabaa |
title |
A Comparative Study of Ensemble Active Learning |
title_sort |
comparative study of ensemble active learning |
publisher |
Université d'Ottawa / University of Ottawa |
publishDate |
2014 |
url |
http://hdl.handle.net/10393/31805 http://dx.doi.org/10.20381/ruor-6709 |
work_keys_str_mv |
AT alabdulrahmanrabaa acomparativestudyofensembleactivelearning AT alabdulrahmanrabaa comparativestudyofensembleactivelearning |
_version_ |
1718598171790147584 |