A parallel data stream classification technique for high velocity data streams

Real-time classification of data streams remains one of the most challenging aspects of Big Data. As a data stream is an unending source of information, classification models and metrics must be created and adapted in real-time as the data is made available to them. This time constrained learning is...


Bibliographic Details
Main Author: Tennant, Mark
Published: University of Reading 2018
Online Access: https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.749369
id ndltd-bl.uk-oai-ethos.bl.uk-749369
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-749369 2019-01-08T03:36:59Z https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.749369 http://centaur.reading.ac.uk/77919/ Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
description Real-time classification of data streams remains one of the most challenging aspects of Big Data. Because a data stream is an unending source of information, classification models and metrics must be created and adapted in real time as the data becomes available. This time-constrained learning is problematic: conventional data models require a training period to examine the data and produce models for evaluation. In data stream mining this training period does not exist; instead, the models are continuously updated in real time. As data streams become faster and larger, the quantity of data to be processed can overwhelm a single machine's learning capabilities. One way to reduce the workload on a data mining algorithm is to implement parallel solutions. This has the benefit of distributing the classification over one or more machines. Unfortunately, most parallel implementations of classification algorithms are not suitable for real-time processing, and most data stream mining algorithms are not suitable for parallelisation. This research develops real-time parallel classification of data instances for vast amounts of data. The proposed solution is highly scalable, as it incurs no additional communication costs when training. Moreover, it can accept data streams that contain multiple sources. The newly created algorithm, Parallel MC-NN, has been implemented and evaluated on open source parallel technologies. The experimental results show a scalable solution that has been evaluated and peer reviewed via multiple publications.
author Tennant, Mark
title A parallel data stream classification technique for high velocity data streams
publisher University of Reading
publishDate 2018
url https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.749369