Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging

In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications, such as network intrusion detection, fraud detection an...

Bibliographic Details
Main Author: Olorunnimbe, Muhammed
Other Authors: Viktor, Herna
Language: en
Published: Université d'Ottawa / University of Ottawa 2015
Subjects:
ROI
Online Access: http://hdl.handle.net/10393/32340
http://dx.doi.org/10.20381/ruor-4304
id ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-32340
record_format oai_dc
spelling ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-32340 2018-01-05T19:02:19Z 2015-05-13T13:19:02Z 2015 Thesis
collection NDLTD
language en
sources NDLTD
topic Data stream
Concept drift
Metalearning
Cost sensitive adaptation
ROI
Utility
Adaptive ensemble size
Online bagging
description In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection and financial forecasting, among others. In this setting, it is crucial to create data mining algorithms that are able to seamlessly adapt to temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The models produced by such algorithms should not only be highly accurate and able to adapt swiftly to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting. This research work focuses on mining in a data stream with concept drift, using an online bagging method, with consideration of memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, where the models of multiple classifiers are combined into an ensemble, an approach that has been very successful for building accurate models of data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility. This research focuses on this issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with a high utility. We evaluated our method on a number of benchmark datasets and compared our results against the state of the art.
Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy, relative to the time and memory invested. We aimed to achieve high ROI without compromising the accuracy of the result. Our experimental results indicate that we achieved this goal.
author2 Viktor, Herna
author_facet Viktor, Herna
Olorunnimbe, Muhammed
author Olorunnimbe, Muhammed
author_sort Olorunnimbe, Muhammed
title Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging
publisher Université d'Ottawa / University of Ottawa
publishDate 2015
url http://hdl.handle.net/10393/32340
http://dx.doi.org/10.20381/ruor-4304
work_keys_str_mv AT olorunnimbemuhammed intelligentadaptationofensemblesizeindatastreamsusingonlinebagging
_version_ 1718598293791965184
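
The description above names online bagging with a variable ensemble size but does not give the algorithm itself. As a rough illustration only, the following Python sketch shows the commonly used online bagging scheme, in which each ensemble member trains on every incoming example k times, with k drawn from a Poisson(1) distribution. It is not the thesis's method: the class names `CentroidLearner` and `OnlineBagging`, the toy nearest-centroid base learner, and the `resize` method are all hypothetical, and the policy that would choose the new size (for instance from memory cost or ROI) is left out because the record does not specify it.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class CentroidLearner:
    """Toy incremental base learner (hypothetical): one running mean per class."""

    def __init__(self):
        self.sums = {}
        self.counts = Counter()

    def learn(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        if not self.counts:
            return None  # untrained member abstains

        def dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=dist)


class OnlineBagging:
    """Online bagging: each member sees each example Poisson(1) times."""

    def __init__(self, size, seed=0):
        self.rng = random.Random(seed)
        self.members = [CentroidLearner() for _ in range(size)]

    def learn(self, x, y):
        for member in self.members:
            for _ in range(poisson1(self.rng)):
                member.learn(x, y)

    def predict(self, x):
        # Majority vote over members that have seen at least one example.
        votes = Counter(m.predict(x) for m in self.members)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None

    def resize(self, size):
        """Shrink or grow the ensemble. How `size` is chosen is an assumption
        left to the caller; the abstract does not describe the policy."""
        del self.members[size:]
        while len(self.members) < size:
            self.members.append(CentroidLearner())
```

A caller would feed the stream one example at a time through `learn`, query `predict` as needed, and call `resize` whenever its own cost model decides that fewer or more members are worthwhile.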