Transactional Data Set Clustering based on a Statistical Measure
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 89 === This thesis discusses the effect of employing the chi-square statistic as the similarity measure in clustering transactional data sets. The motivation behind this study is to propose a similarity measure that provides more mathematical insights with respect to...
Main Authors: | Yi-ching Peng, 彭怡菁 |
---|---|
Other Authors: | Yen-Jen Oyang |
Format: | Others |
Language: | zh-TW |
Published: | 2001 |
Online Access: | http://ndltd.ncl.edu.tw/handle/40070256984248364580 |
id |
ndltd-TW-089NTU00392023 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-089NTU00392023 2016-07-04T04:17:05Z http://ndltd.ncl.edu.tw/handle/40070256984248364580 Transactional Data Set Clustering based on a Statistical Measure 以統計量測為基礎之交易資料集分群 Yi-ching Peng 彭怡菁 Master's, National Taiwan University, Graduate Institute of Computer Science and Information Engineering, 89 Yen-Jen Oyang 歐陽彥正 2001 學位論文 ; thesis 88 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 89 === This thesis examines the effect of using the chi-square statistic as the similarity measure when clustering transactional data sets. The motivation is to propose a similarity measure that provides more mathematical insight into clustering results than existing similarity measures do. A common problem with existing clustering algorithms is that clustering quality depends heavily on parameters set by the user, which may even include the number of clusters in the output. To tackle this problem, the thesis proposes a similarity measure based on the chi-square statistic. Combined with the complete-link hierarchical clustering algorithm, this similarity measure offers several advantages. First, the user does not need to specify the number of clusters to output; the user only specifies the level of statistical significance beyond which two objects are eligible to be clustered, and the clustering algorithm then determines the number of clusters automatically. Second, each cluster identified is statistically meaningful: the complete-link algorithm guarantees that the similarity between every pair of objects in a cluster exceeds a statistical significance threshold. Third, experimental results show that the proposed approach generally achieves better clustering quality than existing algorithms. |
author2 |
Yen-Jen Oyang |
author_facet |
Yen-Jen Oyang Yi-ching Peng 彭怡菁 |
author |
Yi-ching Peng 彭怡菁 |
title |
Transactional Data Set Clustering based on a Statistical Measure |
publishDate |
2001 |
url |
http://ndltd.ncl.edu.tw/handle/40070256984248364580 |
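The abstract describes the approach only at a high level, so the following is a minimal sketch of one possible reading, not the thesis's actual method. It assumes each transaction is a set of items drawn from a known item universe, that similarity is the chi-square statistic of the 2x2 contingency table of item presence (counted only when the association is positive), and that the significance level is supplied as a chi-square critical value (3.841 corresponds to the 0.05 level with one degree of freedom). The names `chi_square`, `complete_link`, and `critical` are hypothetical.

```python
from itertools import combinations


def chi_square(a, b, universe):
    """Chi-square statistic of the 2x2 item-presence table for item sets a and b.

    Assumption: this is an illustrative similarity, not the thesis's formula.
    Returns 0.0 when the association is not positive or the table is degenerate.
    """
    n = len(universe)
    n11 = len(a & b)              # items in both transactions
    n10 = len(a - b)              # items only in a
    n01 = len(b - a)              # items only in b
    n00 = n - n11 - n10 - n01     # items in neither
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0 or n11 * n00 <= n10 * n01:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def complete_link(transactions, universe, critical=3.841):
    """Agglomerative complete-link clustering that stops at a significance threshold.

    Two clusters are merged only if their weakest pairwise similarity exceeds
    `critical`, so the number of clusters is not given in advance.
    """
    clusters = [[i] for i in range(len(transactions))]
    while True:
        best, pair = critical, None
        for x, y in combinations(range(len(clusters)), 2):
            # Complete link: cluster similarity is the minimum pairwise similarity.
            s = min(
                chi_square(transactions[i], transactions[j], universe)
                for i in clusters[x]
                for j in clusters[y]
            )
            if s > best:
                best, pair = s, (x, y)
        if pair is None:
            return clusters
        x, y = pair               # x < y, so deleting y keeps index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]


# Toy data (made up for illustration): 20 possible items, 4 transactions.
universe = set(range(20))
transactions = [{0, 1, 2, 3}, {0, 1, 2, 4}, {10, 11, 12, 13}, {10, 11, 12, 14}]
clusters = complete_link(transactions, universe)
```

On this toy data the sketch returns two clusters without being told how many to find; the only input besides the data is the critical value. Because merging stops once no pair of clusters has a complete-link similarity above the threshold, every pair of transactions inside a returned cluster exceeds it, which mirrors the guarantee stated in the abstract.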