An Incremental Kernel Density Estimator for Data Stream Computation
Probability density function (p.d.f.) estimation plays a very important role in data mining. The kernel density estimator (KDE) is the most widely used technique for estimating the unknown p.d.f. of a given dataset. Existing KDEs are usually inefficient when handling the p.d.f. estimation p...
Main Authors: | Yulin He, Jie Jiang, Dexin Dai, Klohoun Fabrice |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi-Wiley 2020-01-01 |
Series: | Complexity |
Online Access: | http://dx.doi.org/10.1155/2020/1803525 |
id |
doaj-e91d304b805d4cb2b116f14a2040d831 |
---|---|
record_format |
Article |
spelling |
doaj-e91d304b805d4cb2b116f14a2040d831; 2020-11-25T03:01:00Z; eng; Hindawi-Wiley; Complexity; 1076-2787; 1099-0526; 2020-01-01; 2020; 10.1155/2020/1803525; 1803525; An Incremental Kernel Density Estimator for Data Stream Computation; Yulin He (0), Jie Jiang (1), Dexin Dai (2), Klohoun Fabrice (3); College of Computer Science & Software Engineering, Shenzhen University, Shenzhen 518060, China (affiliation of all four authors); Probability density function (p.d.f.) estimation plays a very important role in data mining. The kernel density estimator (KDE) is the most widely used technique for estimating the unknown p.d.f. of a given dataset. Existing KDEs are usually inefficient at p.d.f. estimation for stream data because a brand-new KDE has to be retrained on the combination of the current data and the newly arriving data. This process increases the training time and wastes computational resources. This article proposes an incremental kernel density estimator (I-KDE) that handles p.d.f. estimation in the manner of data stream computation. The I-KDE updates the current KDE dynamically and gradually with the newly arriving data rather than retraining a brand-new KDE on the combination of the current and newly arriving data. The theoretical analysis proves the convergence of the I-KDE only if the estimated p.d.f. of the newly arriving data converges to its true p.d.f. To guarantee the convergence of the I-KDE, a new multivariate fixed-point iteration algorithm based on the unbiased cross-validation (UCV) method is developed to determine the optimal bandwidth of the KDE. Experimental results on 10 univariate and 4 multivariate probability distributions demonstrate the feasibility and effectiveness of the I-KDE.; http://dx.doi.org/10.1155/2020/1803525 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Yulin He Jie Jiang Dexin Dai Klohoun Fabrice |
spellingShingle |
Yulin He Jie Jiang Dexin Dai Klohoun Fabrice An Incremental Kernel Density Estimator for Data Stream Computation Complexity |
author_facet |
Yulin He Jie Jiang Dexin Dai Klohoun Fabrice |
author_sort |
Yulin He |
title |
An Incremental Kernel Density Estimator for Data Stream Computation |
title_sort |
incremental kernel density estimator for data stream computation |
publisher |
Hindawi-Wiley |
series |
Complexity |
issn |
1076-2787 1099-0526 |
publishDate |
2020-01-01 |
description |
Probability density function (p.d.f.) estimation plays a very important role in data mining. The kernel density estimator (KDE) is the most widely used technique for estimating the unknown p.d.f. of a given dataset. Existing KDEs are usually inefficient at p.d.f. estimation for stream data because a brand-new KDE has to be retrained on the combination of the current data and the newly arriving data. This process increases the training time and wastes computational resources. This article proposes an incremental kernel density estimator (I-KDE) that handles p.d.f. estimation in the manner of data stream computation. The I-KDE updates the current KDE dynamically and gradually with the newly arriving data rather than retraining a brand-new KDE on the combination of the current and newly arriving data. The theoretical analysis proves the convergence of the I-KDE only if the estimated p.d.f. of the newly arriving data converges to its true p.d.f. To guarantee the convergence of the I-KDE, a new multivariate fixed-point iteration algorithm based on the unbiased cross-validation (UCV) method is developed to determine the optimal bandwidth of the KDE. Experimental results on 10 univariate and 4 multivariate probability distributions demonstrate the feasibility and effectiveness of the I-KDE. |
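The description above states that the I-KDE updates an existing estimate with newly arriving data instead of retraining from scratch, but this record does not give the update rule. The following is only a minimal sketch of one plausible reading, not the authors' algorithm: it keeps a univariate Gaussian KDE as a sample-size-weighted mixture of per-chunk KDEs, so earlier chunks are never refitted. The class name `IncrementalKDE`, the helper `silverman_bandwidth`, and the use of Silverman's rule of thumb in place of the UCV-based fixed-point bandwidth selection are all assumptions made for illustration.

```python
import math
import random


def silverman_bandwidth(chunk):
    """Rule-of-thumb bandwidth; an assumed stand-in for the UCV-based choice.

    Requires at least two points with non-zero spread.
    """
    n = len(chunk)
    mean = sum(chunk) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / (n - 1))
    return 1.06 * std * n ** (-0.2)


class IncrementalKDE:
    """Hypothetical univariate Gaussian KDE stored as a mixture of per-chunk KDEs."""

    def __init__(self):
        self.chunks = []   # list of (points, bandwidth), one entry per chunk
        self.n_total = 0   # number of points seen so far

    def update(self, chunk):
        """Fold in a new chunk; bandwidths of earlier chunks are left untouched."""
        chunk = list(chunk)
        self.chunks.append((chunk, silverman_bandwidth(chunk)))
        self.n_total += len(chunk)

    def pdf(self, x):
        """Evaluate the mixture sum_k (n_k / N) * f_k(x) at the point x."""
        total = 0.0
        for points, h in self.chunks:
            norm = h * math.sqrt(2.0 * math.pi)
            total += sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / norm
        return total / self.n_total


# Usage: feed five chunks of standard-normal samples, then query the estimate.
random.seed(0)
kde = IncrementalKDE()
for _ in range(5):
    kde.update(random.gauss(0.0, 1.0) for _ in range(200))
print(kde.pdf(0.0))
```

Weighting each chunk by its size keeps the combined estimate a valid density (it still integrates to one) while only the new chunk needs a bandwidth computation. The trade-off in this sketch is that every chunk's points are retained in memory; a streaming implementation would need some form of compression, which is outside what this record describes.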
url |
http://dx.doi.org/10.1155/2020/1803525 |