An Incremental Kernel Density Estimator for Data Stream Computation

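Probability density function (p.d.f.) estimation plays an important role in data mining. The kernel density estimator (KDE) is the most widely used technique for estimating the unknown p.d.f. of a given dataset. Existing KDEs are usually inefficient for estimating the p.d.f. of stream data, because a brand-new KDE has to be retrained on the combination of the current data and the newly arriving data. This retraining increases training time and wastes computational resources. This article proposes an incremental kernel density estimator (I-KDE) that treats p.d.f. estimation as a data stream computation. The I-KDE updates the current KDE dynamically and gradually with the newly arriving data, rather than retraining a brand-new KDE on the combined current and new data. A theoretical analysis proves that the I-KDE converges provided that the estimated p.d.f. of the newly arriving data converges to its true p.d.f. To guarantee this convergence, a new multivariate fixed-point iteration algorithm based on the unbiased cross-validation (UCV) method is developed to determine the optimal bandwidth of the KDE. Experimental results on 10 univariate and 4 multivariate probability distributions demonstrate the feasibility and effectiveness of the I-KDE.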
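
The abstract describes the I-KDE only at a high level, so the following is a minimal univariate sketch under stated assumptions, not the authors' algorithm. It assumes a Gaussian kernel and folds each arriving batch into the estimate as a sample-size-weighted mixture of per-batch KDEs, so earlier components are never retrained. Each batch's bandwidth is chosen by minimising the UCV criterion over a grid around Silverman's rule of thumb; this grid search is a swapped-in stand-in for the paper's multivariate fixed-point iteration. The names IncrementalKDE, ucv and ucv_bandwidth are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

SQRT_2PI = math.sqrt(2.0 * math.pi)


def gauss(u: float) -> float:
    """Standard normal kernel K(u)."""
    return math.exp(-0.5 * u * u) / SQRT_2PI


def gauss_conv(u: float) -> float:
    """(K * K)(u) for the Gaussian kernel, i.e. the N(0, 2) density."""
    return math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))


def ucv(data: Sequence[float], h: float) -> float:
    """UCV(h) = integral of f_h^2 minus (2/n) * sum_i of the leave-one-out estimate at x_i."""
    n = len(data)
    all_pairs = 0.0  # sum of (K*K)((x_i - x_j)/h) over all i, j
    off_diag = 0.0   # sum of K((x_i - x_j)/h) over i != j
    for i in range(n):
        for j in range(n):
            u = (data[i] - data[j]) / h
            all_pairs += gauss_conv(u)
            if i != j:
                off_diag += gauss(u)
    return all_pairs / (n * n * h) - 2.0 * off_diag / (n * (n - 1) * h)


def ucv_bandwidth(data: Sequence[float]) -> float:
    """Assumed selector: minimise UCV on a log-spaced grid around Silverman's rule."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    if sd == 0.0:
        raise ValueError("data has zero spread")
    h0 = 1.06 * sd * n ** (-0.2)
    grid = [h0 * 10 ** (k / 20) for k in range(-20, 11)]  # 0.1*h0 up to about 3.16*h0
    return min(grid, key=lambda h: ucv(data, h))


class IncrementalKDE:
    """Hypothetical sketch: density = sum over batches of (m_b / N) * KDE_b(x)."""

    def __init__(self) -> None:
        self.batches: List[Tuple[List[float], float]] = []  # (points, bandwidth)
        self.n = 0  # total number of points seen so far

    def update(self, batch: Sequence[float]) -> None:
        points = list(batch)
        h = ucv_bandwidth(points)         # bandwidth chosen on the new batch only
        self.batches.append((points, h))  # earlier batches are left untouched
        self.n += len(points)

    def density(self, x: float) -> float:
        total = 0.0
        for points, h in self.batches:
            # m_b * KDE_b(x) = sum_j K((x - x_j) / h) / h; dividing by N applies weight m_b / N
            total += sum(gauss((x - p) / h) for p in points) / h
        return total / self.n


if __name__ == "__main__":
    random.seed(0)
    kde = IncrementalKDE()
    for _ in range(3):  # three chunks of 80 points from N(0, 1) arrive one after another
        kde.update([random.gauss(0.0, 1.0) for _ in range(80)])
        print(kde.n, round(kde.density(0.0), 3))
```

Because the weights m_b / N sum to one and each per-batch KDE integrates to one, the mixture remains a valid density after every update, and an update costs only the bandwidth search on the new batch.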

Bibliographic Details
Main Authors: Yulin He, Jie Jiang, Dexin Dai, Klohoun Fabrice
Format: Article
Language: English
Published: Hindawi-Wiley 2020-01-01
Series: Complexity
Online Access:http://dx.doi.org/10.1155/2020/1803525
Affiliation (all four authors): College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China
Record ID: doaj-e91d304b805d4cb2b116f14a2040d831
ISSN: 1076-2787, 1099-0526