Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging

In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications, such as network intrusion detection, fraud detection an...

Bibliographic Details
Main Author:	Olorunnimbe, Muhammed
Other Authors:	Viktor, Herna
Language:	en
Published:	Université d'Ottawa / University of Ottawa 2015
Subjects:	Data stream; Concept drift; Metalearning; Cost sensitive adaptation; ROI; Utility; Adaptive ensemble size; Online bagging
Online Access:	http://hdl.handle.net/10393/32340 http://dx.doi.org/10.20381/ruor-4304

id	ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-32340
record_format	oai_dc
spelling	ndltd-uottawa.ca-oai-ruor.uottawa.ca-10393-32340 2018-01-05T19:02:19Z Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging Olorunnimbe, Muhammed Viktor, Herna Data stream Concept drift Metalearning Cost sensitive adaptation ROI Utility Adaptive ensemble size Online bagging 2015-05-13T13:19:02Z 2015-05-13T13:19:02Z 2015 2015 Thesis http://hdl.handle.net/10393/32340 http://dx.doi.org/10.20381/ruor-4304 en Université d'Ottawa / University of Ottawa
collection	NDLTD
language	en
sources	NDLTD
topic	Data stream; Concept drift; Metalearning; Cost sensitive adaptation; ROI; Utility; Adaptive ensemble size; Online bagging
description	In this era of the Internet of Things and Big Data, a proliferation of connected devices continuously produces massive amounts of fast-evolving streaming data. There is a need to study the relationships in such streams for analytic applications such as network intrusion detection, fraud detection and financial forecasting, amongst others. In this setting, it is crucial to create data mining algorithms that are able to seamlessly adapt to the temporal changes in data characteristics that occur in data streams. These changes are called concept drifts. The models produced by such algorithms should not only be highly accurate and able to adapt swiftly to changes; the data mining techniques should also be fast, scalable, and efficient in terms of resource allocation. It then becomes important to consider issues such as storage space needs and memory utilization. This is especially relevant when we aim to build personalized, near-instant models in a Big Data setting. This research work focuses on mining a data stream with concept drift, using an online bagging method, with consideration of memory utilization. Our aim is to take an adaptive approach to resource allocation during the mining process. Specifically, we consider metalearning, where the models of multiple classifiers are combined into an ensemble, an approach that has been very successful when building accurate models against data streams. However, little work has been done to explore the interplay between accuracy, efficiency and utility. This research focuses on this issue. We introduce an adaptive metalearning algorithm that takes advantage of the memory utilization cost of concept drift in order to vary the ensemble size during the data mining process. We aim to minimize memory usage while maintaining highly accurate models with high utility. We evaluated our method against a number of benchmarking datasets and compared our results against the state of the art.
Return on Investment (ROI) was used to evaluate the gain in performance in terms of accuracy relative to the time and memory invested. We aimed to achieve a high ROI without compromising the accuracy of the results. Our experimental results indicate that we achieved this goal.
author2	Viktor, Herna
author	Olorunnimbe, Muhammed
title	Intelligent Adaptation of Ensemble Size in Data Streams Using Online Bagging
publisher	Université d'Ottawa / University of Ottawa
publishDate	2015
url	http://hdl.handle.net/10393/32340 http://dx.doi.org/10.20381/ruor-4304

Similar Items