Transactional Data Set Clustering based on a Statistical Measure

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 89 === This thesis discusses the effect of employing the chi-square statistic as the similarity measure in clustering transactional data sets. The motivation behind this study is to propose a similarity measure that provides more mathematical insights with respect to...


Bibliographic Details
Main Authors:	Yi-ching Peng, 彭怡菁
Other Authors:	Yen-Jen Oyang
Format:	Others
Language:	zh-TW
Published:	2001
Online Access:	http://ndltd.ncl.edu.tw/handle/40070256984248364580

id	ndltd-TW-089NTU00392023
record_format	oai_dc
spelling	ndltd-TW-089NTU00392023 2016-07-04T04:17:05Z http://ndltd.ncl.edu.tw/handle/40070256984248364580 Transactional Data Set Clustering based on a Statistical Measure 以統計量測為基礎之交易資料集分群 Yi-ching Peng 彭怡菁 Master's, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 89 Yen-Jen Oyang 歐陽彥正 2001 Thesis 88 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 89 === This thesis examines the effect of using the chi-square statistic as the similarity measure when clustering transactional data sets. The motivation is to propose a similarity measure that offers more mathematical insight into clustering results than existing similarity measures do. A common problem with existing clustering algorithms is that clustering quality depends heavily on parameters set by the user, which may even include the number of clusters in the output. To tackle this problem, the thesis proposes a similarity measure based on the chi-square statistic. Combined with the complete-link hierarchical clustering algorithm, this measure offers several advantages. First, the user does not need to specify the number of clusters to be output. The user only needs to specify the level of statistical significance beyond which two objects are eligible to be clustered, and the clustering algorithm then determines the number of clusters in the output automatically. Second, each cluster identified is statistically meaningful: the complete-link algorithm guarantees that the similarity between every pair of objects in a cluster exceeds a statistical significance threshold. Third, experimental results show that the proposed approach generally achieves better clustering quality than the existing algorithms.
author2	Yen-Jen Oyang
author_facet	Yen-Jen Oyang Yi-ching Peng 彭怡菁
author	Yi-ching Peng 彭怡菁
author_sort	Yi-ching Peng
title	Transactional Data Set Clustering based on a Statistical Measure
publishDate	2001
url	http://ndltd.ncl.edu.tw/handle/40070256984248364580
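
The approach summarized in the description (a chi-square similarity measure combined with complete-link hierarchical clustering, stopped by a significance threshold rather than a cluster count) can be illustrated with a minimal sketch. The record does not give the thesis's exact definition of the similarity, so everything below is an assumption: transactions are treated as item sets, the similarity is the Pearson chi-square statistic of the 2x2 presence/absence table over the item universe, only positive associations count, and the function names `chi_square_similarity` and `complete_link_cluster` are hypothetical. The threshold 3.841 is the standard critical value for 1 degree of freedom at the 0.05 level.

```python
from itertools import combinations

# Standard chi-square critical value, 1 degree of freedom, alpha = 0.05.
CHI2_CRIT_05 = 3.841


def chi_square_similarity(t1, t2, universe):
    """Assumed similarity (not taken from the thesis): chi-square statistic
    of the 2x2 item presence/absence table for two transactions.
    Returns 0.0 for degenerate tables or non-positive association."""
    n = len(universe)
    a = len(t1 & t2)   # items in both transactions
    b = len(t1 - t2)   # items only in t1
    c = len(t2 - t1)   # items only in t2
    d = n - a - b - c  # items in neither
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0 or a * d <= b * c:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def complete_link_cluster(transactions, threshold=CHI2_CRIT_05):
    """Agglomerative complete-link clustering that keeps merging while the
    weakest pairwise similarity between two clusters exceeds the threshold.
    The number of clusters is never supplied by the caller."""
    universe = set().union(*transactions)
    sim = {
        (i, j): chi_square_similarity(transactions[i], transactions[j], universe)
        for i, j in combinations(range(len(transactions)), 2)
    }
    clusters = [[i] for i in range(len(transactions))]

    def link(c1, c2):
        # Complete link: similarity of the least similar cross-cluster pair.
        return min(sim[(min(x, y), max(x, y))] for x in c1 for y in c2)

    while len(clusters) > 1:
        best, p, q = max(
            (link(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        if best <= threshold:
            break  # no remaining pair of clusters is significantly similar
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


# Toy data, invented for illustration only.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c", "d"}),
    frozenset({"x", "y", "z"}),
    frozenset({"x", "y", "z", "w"}),
]
print(complete_link_cluster(transactions))  # [[0, 1], [2, 3]]
```

Because a merge happens only when the minimum pairwise similarity exceeds the threshold, every pair of transactions inside a resulting cluster is significantly similar under the assumed measure, which mirrors the guarantee stated in the description. With only eight items in the universe the chi-square approximation is crude, so the toy data shows the mechanics rather than a statistically sound test.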
