Transactional Data Set Clustering based on a Statistical Measure

Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 89 === This thesis examines the effect of using the chi-square statistic as the similarity measure when clustering transactional data sets. The motivation is to propose a similarity measure that offers more mathematical insight with respect to...

Bibliographic Details
Main Authors: Yi-ching Peng, 彭怡菁
Other Authors: Yen-Jen Oyang
Format: Others
Language: zh-TW
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/40070256984248364580
id ndltd-TW-089NTU00392023
record_format oai_dc
spelling ndltd-TW-089NTU00392023 2016-07-04T04:17:05Z http://ndltd.ncl.edu.tw/handle/40070256984248364580 Transactional Data Set Clustering based on a Statistical Measure 以統計量測為基礎之交易資料集分群 Yi-ching Peng 彭怡菁 Master's National Taiwan University Graduate Institute of Computer Science and Information Engineering 89 Yen-Jen Oyang 歐陽彥正 2001 Thesis 88 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 89 === This thesis examines the effect of using the chi-square statistic as the similarity measure when clustering transactional data sets. The motivation is to propose a similarity measure that offers more mathematical insight into clustering results than existing similarity measures do. A common problem with existing clustering algorithms is that clustering quality depends heavily on parameters set by the user, which may even include the number of clusters in the output. To tackle this problem, the thesis proposes a similarity measure based on the chi-square statistic. Combined with the complete-link hierarchical clustering algorithm, this measure offers three advantages. First, the user does not need to specify the number of clusters to output, only the level of statistical significance beyond which two objects are eligible to be clustered together; the algorithm then determines the number of clusters automatically. Second, each identified cluster has a clear statistical interpretation: the complete-link algorithm guarantees that the similarity between every pair of objects in a cluster exceeds the statistical significance threshold. Third, experimental results show that the proposed approach generally achieves better clustering quality than existing algorithms.
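The abstract does not say how the chi-square statistic is computed between two objects, so the following is only a minimal sketch under stated assumptions, not the thesis's actual method. It assumes that the objects being clustered are items, that similarity is the Pearson chi-square statistic of the 2x2 co-occurrence table of two items across transactions, that negatively associated pairs count as similarity 0, and that the significance level is 0.05 (critical value 3.841 at one degree of freedom). The function names `chi_square_similarity` and `complete_link_clusters` are hypothetical.

```python
from itertools import combinations

# Assumption: alpha = 0.05 with 1 degree of freedom (2x2 table).
CHI2_CRITICAL_05 = 3.841


def chi_square_similarity(transactions, a, b):
    """Pearson chi-square of the 2x2 co-occurrence table of items a and b.

    Returns 0.0 for degenerate tables and for negative association
    (an assumption of this sketch, so only co-occurring items look similar).
    """
    n = len(transactions)
    n11 = sum(1 for t in transactions if a in t and b in t)
    n10 = sum(1 for t in transactions if a in t and b not in t)
    n01 = sum(1 for t in transactions if a not in t and b in t)
    n00 = n - n11 - n10 - n01
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0 or n11 * n00 <= n10 * n01:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def complete_link_clusters(transactions, threshold=CHI2_CRITICAL_05):
    """Agglomerative complete-link clustering that stops at the threshold.

    The number of clusters is not an input: merging stops once no pair of
    clusters has all of its cross pairs above the significance threshold.
    """
    items = sorted(set().union(*transactions))
    sim = {
        frozenset(pair): chi_square_similarity(transactions, *pair)
        for pair in combinations(items, 2)
    }
    clusters = [frozenset([item]) for item in items]

    def link(c1, c2):
        # Complete link: cluster similarity is the weakest cross pair.
        return min(sim[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        if link(*best) <= threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters
```

Calling `complete_link_clusters` on a list of item sets returns a list of `frozenset` clusters. Because each merge requires the weakest cross pair to exceed the threshold, every pair of items inside a returned cluster is significantly associated, which mirrors the pairwise guarantee described in the abstract.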
author2 Yen-Jen Oyang
author_facet Yen-Jen Oyang
Yi-ching Peng
彭怡菁
author Yi-ching Peng
彭怡菁
spellingShingle Yi-ching Peng
彭怡菁
Transactional Data Set Clustering based on a Statistical Measure
author_sort Yi-ching Peng
title Transactional Data Set Clustering based on a Statistical Measure
publishDate 2001
url http://ndltd.ncl.edu.tw/handle/40070256984248364580