Detecting changes in high frequency data streams, with applications


Bibliographic Details
Main Author: Ross, Gordon J.
Other Authors: Adams, Niall ; Tasoulis, Dimitrios
Published: Imperial College London 2013
Subjects:
510
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.587336
id ndltd-bl.uk-oai-ethos.bl.uk-587336
record_format oai_dc
timestamp 2017-06-27T03:23:31Z
handle http://hdl.handle.net/10044/1/12255
format Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
description In recent years, problems relating to the analysis of data streams have become widespread. A data stream is a collection of time ordered observations x1, x2, ... generated from the random variables X1, X2, .... It is assumed that the observations are univariate and independent, and that they arrive in discrete time. Unlike traditional sequential analysis problems considered by statisticians, the size of a data stream is not assumed to be fixed, and new observations may be received over time. The rate at which these observations are received can be very high, perhaps several thousand every second. Therefore computational efficiency is very important, and methods used for analysis must be able to cope with potentially huge data sets.

This paper is concerned with the task of detecting whether a data stream contains a change point, and extends traditional methods for sequential change detection to the streaming context. We focus on two different settings of the change point problem.

The first is nonparametric change detection where, in contrast to most of the existing literature, we assume that nothing is known about either the pre- or post-change stream distribution. The task is then to detect a change from an unknown base distribution F0 to an unknown distribution F1. Further, we impose the constraint that change detection methods must have a bounded rate of false positives, which is important when it comes to assessing the significance of discovered change points. It is this constraint which makes the nonparametric problem difficult. We present several novel methods for this problem, and compare their performance via extensive experimental analysis.

The second strand of our research is Bernoulli change detection, with application to streaming classification. In this setting, we assume a parametric form for the stream distribution, but one where both the pre- and post-change parameters are unknown. The task is again to detect changes, while having a control on the rate of false positives. After developing two different methods for tackling the pure Bernoulli change detection task, we then show how our approach can be deployed in streaming classification applications. Here, the goal is to classify objects into one of several categories. In the streaming case, the optimal classification rule can change over time, and classification techniques which are not able to adapt to these changes will suffer performance degradation. We show that by focusing only on the frequency of errors produced by the classifier, we can treat this as a Bernoulli change detection problem, and again perform extensive experimental analysis to show the value of our methods.
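
The reduction of streaming classification to Bernoulli change detection mentioned in the description can be illustrated with a small Python sketch. This is not one of the thesis's methods: it swaps in a textbook one-sided CUSUM on the 0/1 error stream, estimates the pre-change error rate from a warm-up window, and assumes a post-change rate for the likelihood ratio. The function names and the `warmup`, `shift` and `threshold` values are hypothetical, and the threshold here is not calibrated to bound the false positive rate as the description requires.

```python
import math
import random


def error_stream(predictions, labels):
    """Yield 1 for each misclassification and 0 for each correct prediction."""
    for predicted, actual in zip(predictions, labels):
        yield int(predicted != actual)


def bernoulli_cusum(errors, warmup=200, shift=2.0, threshold=8.0):
    """Return the 0-based index at which a rise in error rate is flagged, or None.

    Textbook CUSUM sketch (an assumption, not the thesis's detector):
    - warmup: number of initial observations used only to estimate the
      pre-change error rate p0
    - shift: assumed ratio p1 / p0 of post-change to pre-change error rate
    - threshold: arbitrary alarm level, not tuned for a false positive bound
    """
    error_count = 0
    step_error = step_correct = None
    statistic = 0.0
    for index, error in enumerate(errors):
        if index < warmup:
            error_count += error
            continue
        if step_error is None:
            # Smoothed estimate so that p0 is never exactly 0 or 1.
            p0 = (error_count + 0.5) / (warmup + 1.0)
            p1 = min(shift * p0, 0.99)
            # Log-likelihood ratio contributions of an error / a correct result.
            step_error = math.log(p1 / p0)
            step_correct = math.log((1.0 - p1) / (1.0 - p0))
        statistic = max(0.0, statistic + (step_error if error else step_correct))
        if statistic > threshold:
            return index
    return None


if __name__ == "__main__":
    # Simulated classifier whose error rate jumps from 10% to 40% at index 1000.
    rng = random.Random(0)
    labels = [1] * 2000
    predictions = [
        0 if rng.random() < (0.1 if i < 1000 else 0.4) else 1
        for i in range(2000)
    ]
    print(bernoulli_cusum(error_stream(predictions, labels)))
```

Only the sequence of classifier errors is consumed, so the detector is independent of the classifier itself and uses constant memory per observation, which matches the streaming setting the description emphasises.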