Summary: | Master's === National Yunlin University of Science and Technology === Master's Program, Department of Information Management === 92 === Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Mining techniques can be used both to extract information and to forecast trends. The data to be mined can generally be classified as categorical, numerical, or other types. Traditionally, numerical data are compared using geometric distance, and categorical data are compared using the simple matching coefficient (SMC) or the binary simple matching coefficient (BSMC). However, these measures simply compare values in a direct manner, which is impractical for real data. This study proposes a distance measure based on concept hierarchy trees. Measuring the distance between values on such trees represents numeric and categorical data in a unified way and better depicts the relationships among categorical values. We combined Single Link, Complete Link, and Average Link clustering with the SMC, BSMC, and Distance Hierarchy approaches, respectively, to cluster hybrid data, and compared the results with the K-Prototype algorithm. We conducted extensive experiments on different data sets with various parameter settings. The results show that our approach effectively improves the clustering quality of mixed data and yields results that better represent the characteristics of the original data.
|
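The contrast between simple matching and a hierarchy-based distance can be illustrated with a minimal sketch. This is not the thesis's actual Distance Hierarchy formulation, which the abstract does not specify. It assumes unweighted tree edges and measures distance as the number of edges between two values through their lowest common ancestor. The hierarchy `PARENT`, its values, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def smc_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Simple matching distance: fraction of attributes whose values differ."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


# Hypothetical concept hierarchy (child -> parent); the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "Any": None,
    "Drink": "Any",
    "Appliance": "Any",
    "Coke": "Drink",
    "Pepsi": "Drink",
    "TV": "Appliance",
}


def path_to_root(value: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `value` up to the root, inclusive."""
    path = [value]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(x: str, y: str, parent: Dict[str, Optional[str]]) -> int:
    """Edges on the path between x and y via their lowest common ancestor.

    Assumption: every edge has weight 1; a real hierarchy could weight edges.
    """
    steps_from_x = {node: i for i, node in enumerate(path_to_root(x, parent))}
    for steps_from_y, node in enumerate(path_to_root(y, parent)):
        if node in steps_from_x:
            return steps_from_x[node] + steps_from_y
    raise ValueError("values do not share a common ancestor")


# SMC sees Coke/Pepsi and Coke/TV as equally different (both 1.0).
print(smc_distance(["Coke"], ["Pepsi"]), smc_distance(["Coke"], ["TV"]))
# The tree distance separates them: siblings are closer than unrelated values.
print(tree_distance("Coke", "Pepsi", PARENT), tree_distance("Coke", "TV", PARENT))
```

Under these assumptions, simple matching cannot tell a near miss from a far one, while the tree distance places values under the same parent closer together, which is the kind of relationship among categorical values the abstract refers to.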