Adaptive kernel density estimation

Bibliographic Details
Main Author: Sain, Stephan R.
Other Authors: Scott, David W.
Format: Others
Language: English
Published: 2009
Online Access: http://hdl.handle.net/1911/16743
Description
Summary: The need for improvements over the fixed kernel density estimator in certain situations has been discussed extensively in the literature, particularly in the application of density estimation to mode hunting. Problem densities often exhibit skewness or multimodality, with a different scale for each mode. By varying the bandwidth in some fashion, it is possible to achieve significant improvements over the fixed bandwidth approach. In general, variable bandwidth kernel density estimators can be divided into two categories: those that vary the bandwidth with the estimation point (balloon estimators) and those that vary the bandwidth with each data point (sample point estimators). For univariate balloon estimators, it can be shown that in regions where the density f is convex (e.g. the tails) there exists a bandwidth for which the bias is exactly zero. Such a bandwidth leads to an MSE of $O(n^{-1})$ for points in those regions. A global implementation strategy using a local cross-validation algorithm to estimate such bandwidths is developed. The theoretical behavior of the sample point estimator is difficult to examine because the form of the bandwidth function is unknown. An approximation based on binning the data is used to study the behavior of the MISE and the optimal bandwidth function. A practical data-based procedure for determining bandwidths for the sample point estimator is developed, using a spline function to estimate the unknown bandwidth function. Finally, the multivariate problem is briefly addressed by examining the shape and size of the optimal bivariate kernels suggested by Terrell and Scott (1992). Extensions of the binning and spline estimation ideas are also discussed.
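
The distinction between the two families named in the summary can be illustrated with a minimal Python sketch. It does not reproduce the procedures developed in the thesis (local cross-validation for the balloon estimator, a spline-estimated bandwidth function for the sample point estimator). As stand-ins, and purely as assumptions for illustration, the balloon bandwidth is taken to be the distance to the k-th nearest data point, and the sample point bandwidths follow a square-root rule based on a fixed-bandwidth pilot estimate. The function names, the Gaussian kernel, and the values of `k` and `h0` are likewise hypothetical choices.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def fixed_kde(x, data, h):
    """Fixed-bandwidth estimator: one bandwidth h everywhere."""
    u = (x[:, None] - data[None, :]) / h
    return gaussian_kernel(u).mean(axis=1) / h

def balloon_kde(x, data, k):
    """Balloon estimator: the bandwidth h(x) varies with the estimation point.

    Assumption for illustration: h(x) is the distance from x to its
    k-th nearest data point (not the thesis's cross-validated choice).
    """
    dist = np.abs(x[:, None] - data[None, :])
    h_x = np.sort(dist, axis=1)[:, k - 1]            # one bandwidth per x
    u = (x[:, None] - data[None, :]) / h_x[:, None]
    return gaussian_kernel(u).mean(axis=1) / h_x

def sample_point_kde(x, data, h0):
    """Sample point estimator: each data point x_i carries its own h_i.

    Assumption for illustration: h_i = h0 * sqrt(g / pilot(x_i)), where
    pilot is a fixed-bandwidth estimate and g its geometric mean
    (not the thesis's spline-based bandwidth function).
    """
    pilot = fixed_kde(data, data, h0)
    g = np.exp(np.mean(np.log(pilot)))
    h_i = h0 * np.sqrt(g / pilot)                    # one bandwidth per data point
    u = (x[:, None] - data[None, :]) / h_i[None, :]
    return (gaussian_kernel(u) / h_i[None, :]).mean(axis=1)

# Synthetic bimodal sample with a different scale for each mode.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 0.3, 200), rng.normal(4.0, 1.5, 200)])
grid = np.linspace(-15.0, 20.0, 3501)
dx = grid[1] - grid[0]

f_balloon = balloon_kde(grid, data, k=30)
f_sample = sample_point_kde(grid, data, h0=0.4)
```

In this sketch the sample point estimate is an average of proper densities, so it integrates to one up to grid and truncation error. The balloon estimate carries no such guarantee, because its bandwidth changes with the point of estimation.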