Density Estimation in High Dimensions Using Distance to K Nearest Neighbors

Bibliographic Details
Main Authors: Lih-ching Chou, 周立晴
Other Authors: 歐陽彥正
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/8zc2k4
Description
Summary: PhD === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 106 === Research on density estimation has produced algorithms that are used across many disciplines and have become a common fixture in data analysis. However, density estimation has not performed well on high-dimensional datasets. In this study, we discuss why traditional density estimation does not work well for high-dimensional data and why it yields uninterpretable values: either the values are so low that they may be dominated by model noise or computational noise, or they are so high that the ratio of infinity over infinity cannot be computed. This study proposes using the negative log distance to the k nearest neighbors as the metric for comparison when the dimension of the samples is not known. The resulting classifier, HDDE, was used to classify images in domains with close to 100k dimensions, with reasonable results.
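
As a rough illustration of the idea in the summary, the Python sketch below first shows a Gaussian density underflowing to zero in 10,000 dimensions, then classifies a query point by the negative log distance to its k-th nearest neighbor in each class. This is not the thesis's HDDE implementation, whose details are not given in this record: the function names, the choice of the k-th neighbor distance as the score, the value of k, and the synthetic Gaussian data are all assumptions made for the example.

```python
import numpy as np


def neg_log_knn_distance(x, samples, k):
    """Hypothetical score: -log of the distance from x to its k-th nearest sample."""
    dists = np.linalg.norm(samples - x, axis=1)
    kth = np.partition(dists, k - 1)[k - 1]  # k-th smallest distance
    return -np.log(kth + 1e-12)  # small constant guards against log(0)


def classify(x, class_samples, k=3):
    """Pick the class whose samples lie closest to x under the score above."""
    scores = {
        label: neg_log_knn_distance(x, samples, k)
        for label, samples in class_samples.items()
    }
    return max(scores, key=scores.get)


# Synthetic data (an assumption for this sketch): two Gaussian classes.
rng = np.random.default_rng(0)
dim = 10_000
class_samples = {
    "a": rng.normal(0.0, 1.0, size=(50, dim)),
    "b": rng.normal(0.5, 1.0, size=(50, dim)),
}
query = rng.normal(0.5, 1.0, size=dim)  # drawn from class "b"

# The true density of the query under its own class is far below the
# smallest positive float64, so exponentiating the log-density gives 0.0.
log_density = -0.5 * np.sum((query - 0.5) ** 2) - 0.5 * dim * np.log(2 * np.pi)
density = np.exp(log_density)

predicted = classify(query, class_samples)
print(density, predicted)
```

The distance-based score stays in a comparable numeric range as the dimension grows, whereas the density itself is no longer representable, which is the kind of failure the summary describes.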