Mapping Algorithms for KNN Applications with Categorical Data

Bibliographic Details
Main Authors: Yi-Sen Lin, 林奕森
Other Authors: 郭煌政
Format: Others
Language: zh-TW
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/26227870786117627780
Description
Summary: Master's === National Chiayi University (國立嘉義大學) === Graduate Institute of Computer Science and Information Engineering === 92 === In this paper, we present a novel method for transforming data to speed up the processing of high-dimensional K-nearest neighbor (KNN) queries in an indexed data environment. The transformation is intended to preserve the similarity properties of each attribute and, because it reduces the number of dimensions, to allow the search space to be explored more efficiently. Memory-Based Reasoning (MBR) is a useful data mining technique that handles different kinds of attributes, such as categorical and numeric values. Because MBR must compare a query against the entire training dataset to compute the target attribute value, obtaining a result is very time-consuming, so an index structure must be built. However, the input attributes in the training dataset are both categorical and numeric, and multi-dimensional index structures do not handle categorical values well. We therefore convert categorical attributes into numeric ones. The mapping algorithm should preserve the distance relationships among the categories of an attribute as much as possible. We evaluate the approach with approximate K-nearest neighbor search on a real-life dataset. The experimental results show that our algorithm achieves good accuracy.
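
The record does not describe the mapping algorithm itself, so the following Python sketch is only an illustration of the general idea and not the thesis's method. It assumes a binary target attribute and maps each category to the fraction of training rows with that category whose target equals a chosen positive label. Under a simplified value difference metric (an assumption introduced here), the distance between two categories is then proportional to the difference of their mapped values, so a one-dimensional sorted list can serve as a stand-in for the index. The names `map_categories` and `knn_1d` and the toy data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def map_categories(values, targets, positive):
    """Map each category to P(target == positive | category)."""
    totals = Counter(values)
    hits = Counter(v for v, t in zip(values, targets) if t == positive)
    return {v: hits[v] / totals[v] for v in totals}


def knn_1d(sorted_keys, query, k):
    """Return the k keys closest to `query` from an ascending list."""
    right = bisect_left(sorted_keys, query)
    left = right - 1
    result = []
    # Expand outward from the insertion point, always taking the closer side.
    while len(result) < k and (left >= 0 or right < len(sorted_keys)):
        take_left = right >= len(sorted_keys) or (
            left >= 0
            and query - sorted_keys[left] <= sorted_keys[right] - query
        )
        if take_left:
            result.append(sorted_keys[left])
            left -= 1
        else:
            result.append(sorted_keys[right])
            right += 1
    return result


# Hypothetical toy data: a categorical "color" attribute and a binary target.
colors = ["red", "red", "blue", "blue", "green", "green", "green", "red"]
labels = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]

mapping = map_categories(colors, labels, positive="yes")
keys = sorted(mapping[c] for c in colors)  # the sorted list acts as a 1-D index
nearest = knn_1d(keys, mapping["blue"], k=3)
```

With more than two target classes, or with several categorical attributes, a single coordinate per category generally cannot preserve every pairwise distance exactly, which is why the summary speaks of preserving the distance relationships "as much as possible".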