A Study of Feature Selection and Pre-Sorting Strategy for COBWEB Algorithms

Bibliographic Details
Main Authors: Jong Yu Chow, 周仲愚
Other Authors: Guang Yeh Tung
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/13626630496115554892
Description
Summary: Master's thesis, Southern Taiwan University of Technology (南台科技大學), Department of Information Management, academic year 93 (ROC calendar). This thesis studies feature selection and search strategy in hierarchical conceptual clustering. Clustering is often used to discover structure in data. Clustering systems differ in the objective function used to evaluate clustering quality and in the control strategy used to search the space of clusterings. Ideally, the search strategy should consistently construct high-quality clusterings while remaining computationally inexpensive. The work focuses on hierarchical conceptual clustering with COBWEB, a conceptual clustering system that stores knowledge in a concept hierarchy and uses a heuristic evaluation function, category utility, to measure the quality of a set of probabilistic categories. In this thesis, we propose two strategies for the COBWEB algorithm. The Gini index, which is similar in form to category utility and is based on the standard squared-difference metric, is used for feature selection and for the search strategy. Moreover, both the Gini index and category utility express the prediction of class labels as probabilities. In particular, this thesis investigates a pre-sorting strategy that reduces ordering effects in the input data more efficiently and thereby improves clustering quality. In addition, information gain is computed to remove a subset of unimportant features, which lowers the time complexity of clustering. Ideally, both strategies should consistently produce high-quality hierarchical conceptual clusterings.
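The claim that the Gini index is "similar in form" to category utility can be made concrete. Because the sum of squared value probabilities of an attribute equals 1 minus its Gini index, category utility can be rewritten as an average, cluster-weighted reduction in Gini index. The Python sketch below illustrates this under stated assumptions: instances are dicts of nominal attribute values, category utility follows the textbook COBWEB formula, and the information-gain filter assumes a labeled class attribute and a cutoff `threshold`. All function names (`value_probs`, `gini`, `category_utility`, `category_utility_gini`, `information_gain`, `select_features`) are hypothetical, and this is not the thesis's own implementation.

```python
from collections import Counter
from math import log2


def value_probs(instances, attr):
    """Relative frequency of each value of nominal attribute `attr`."""
    counts = Counter(inst[attr] for inst in instances)
    total = len(instances)
    return {v: c / total for v, c in counts.items()}


def gini(instances, attr):
    """Gini index of `attr`: 1 - sum_j P(A = V_j)^2."""
    return 1.0 - sum(p * p for p in value_probs(instances, attr).values())


def category_utility(partition, attrs):
    """Textbook category utility of a partition (a list of clusters):
    CU = (1/n) sum_k P(C_k) sum_i [sum_j P(A_i=V_ij|C_k)^2 - sum_j P(A_i=V_ij)^2]
    """
    parent = [inst for cluster in partition for inst in cluster]
    n_total = len(parent)
    # Expected correct guesses of attribute values without knowing the cluster.
    base = sum(sum(p * p for p in value_probs(parent, a).values()) for a in attrs)
    cu = 0.0
    for cluster in partition:
        p_c = len(cluster) / n_total
        within = sum(sum(p * p for p in value_probs(cluster, a).values()) for a in attrs)
        cu += p_c * (within - base)
    return cu / len(partition)


def category_utility_gini(partition, attrs):
    """Same quantity, written as a weighted reduction in Gini index."""
    parent = [inst for cluster in partition for inst in cluster]
    n_total = len(parent)
    cu = 0.0
    for cluster in partition:
        p_c = len(cluster) / n_total
        cu += p_c * sum(gini(parent, a) - gini(cluster, a) for a in attrs)
    return cu / len(partition)


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(instances, attr, label):
    """IG = H(label) - sum_v P(attr = v) * H(label | attr = v)."""
    gain = entropy([inst[label] for inst in instances])
    for v, p in value_probs(instances, attr).items():
        subset = [inst[label] for inst in instances if inst[attr] == v]
        gain -= p * entropy(subset)
    return gain


def select_features(instances, attrs, label, threshold):
    """Keep attributes whose information gain exceeds a hypothetical threshold."""
    return [a for a in attrs if information_gain(instances, a, label) > threshold]


if __name__ == "__main__":
    data = [
        {"color": "red", "shape": "round", "size": "small", "cls": "apple"},
        {"color": "red", "shape": "round", "size": "large", "cls": "apple"},
        {"color": "yellow", "shape": "long", "size": "small", "cls": "banana"},
        {"color": "yellow", "shape": "long", "size": "large", "cls": "banana"},
    ]
    kept = select_features(data, ["color", "shape", "size"], "cls", threshold=0.1)
    good = [data[:2], data[2:]]
    bad = [[data[0], data[2]], [data[1], data[3]]]
    print(kept)                          # ['color', 'shape']
    print(category_utility(good, kept))  # 0.5
    print(category_utility(bad, kept))   # 0.0
```

On this toy data, `size` carries no information about `cls` and is dropped, and the partition that groups instances by fruit scores higher category utility than the mixed one. The two category utility functions return the same value for any partition, which is the sense in which the measures share a form.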