Clustering based on Principal Component Analysis and Pseudoinverse Transformation

Master's === National Taiwan Ocean University === Department of Electrical Engineering === 96 === This thesis presents a clustering pre-process that uses a PCA or SVD transformation to improve current projection-based clustering algorithms. The effectiveness of the pre-process is demonstrated by incorporating it into a projection-based method called...


Bibliographic Details
Main Authors: Sih-Yin Shen, 沈思吟
Other Authors: Jung-Hua Wang
Format: Others
Language: en_US
Published: 2008
Online Access: http://ndltd.ncl.edu.tw/handle/88980614390164214633
id ndltd-TW-096NTOU5442064
record_format oai_dc
spelling ndltd-TW-096NTOU5442064 2016-04-27T04:11:26Z http://ndltd.ncl.edu.tw/handle/88980614390164214633 Clustering based on Principal Component Analysis and Pseudoinverse Transformation 基於主成分分析及虛擬反矩陣之分群演算法 Sih-Yin Shen 沈思吟 Master's National Taiwan Ocean University Department of Electrical Engineering 96 This thesis presents a clustering pre-process that uses a PCA or SVD transformation to improve current projection-based clustering algorithms. The effectiveness of the pre-process is demonstrated by incorporating it into a projection-based method called DEPIT (Dimension Extension and Pseudo-Inverse Transformation), greatly improving the performance of DEPIT on 2-D and 3-D input data. In [7], it was shown that the performance of DEPIT is strongly affected by the form of the data distribution; hence, carefully analyzing the structure of the input data is essential. The pre-processing technique employs PCA or SVD to transform the input data from the original space to a space spanned by the eigenvectors or singular vectors; that is, each data point is represented as a linear combination of the eigenvectors or singular vectors. The proposed pre-processing technique enhances the applicability of DEPIT to data distributions of various forms, regardless of whether a dominant principal component exists. More significantly, the tedious rotation schedule and voting process of the original DEPIT become entirely dispensable. The issue of how a dominant component affects the clustering result is also addressed: the existence of a heavily dominant component is first detected, and if no such component exists, the transformed data set is stretched or shrunk by applying an F operator. We also show that the Mahalanobis metric outperforms the Euclidean measure for updating centroids, as the former is more robust against outliers and more accurate for various data distributions. Jung-Hua Wang 王榮華 2008 Academic thesis ; thesis 50 en_US
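
The pre-process described above projects the input data onto the eigenvector (or singular-vector) basis before clustering. The following Python sketch illustrates that idea only under stated assumptions: the dominance_ratio threshold and the per-axis rescaling used as a stand-in for the thesis's F operator are hypothetical, as the record does not define them.

    # Hedged sketch of a PCA/SVD pre-processing step; not the thesis's exact algorithm.
    import numpy as np

    def pca_preprocess(X, dominance_ratio=0.9):
        """Project 2-D/3-D points onto the principal axes; rescale if no component dominates.

        dominance_ratio is a hypothetical threshold on the explained-variance share
        used to decide whether one principal component heavily dominates.
        """
        Xc = X - X.mean(axis=0)                       # centre the data
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        T = Xc @ Vt.T                                 # coordinates in the singular-vector basis
        var_share = (s ** 2) / np.sum(s ** 2)         # explained variance per component
        if var_share[0] < dominance_ratio:
            # No heavily dominant component: stretch/shrink each axis so the variances
            # become comparable (an assumed stand-in for the thesis's F operator).
            T = T / (s / np.sqrt(len(X) - 1))
        return T

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
        print(pca_preprocess(X)[:3])
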
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan Ocean University === Department of Electrical Engineering === 96 === This thesis presents a clustering pre-process that uses a PCA or SVD transformation to improve current projection-based clustering algorithms. The effectiveness of the pre-process is demonstrated by incorporating it into a projection-based method called DEPIT (Dimension Extension and Pseudo-Inverse Transformation), greatly improving the performance of DEPIT on 2-D and 3-D input data. In [7], it was shown that the performance of DEPIT is strongly affected by the form of the data distribution; hence, carefully analyzing the structure of the input data is essential. The pre-processing technique employs PCA or SVD to transform the input data from the original space to a space spanned by the eigenvectors or singular vectors; that is, each data point is represented as a linear combination of the eigenvectors or singular vectors. The proposed pre-processing technique enhances the applicability of DEPIT to data distributions of various forms, regardless of whether a dominant principal component exists. More significantly, the tedious rotation schedule and voting process of the original DEPIT become entirely dispensable. The issue of how a dominant component affects the clustering result is also addressed: the existence of a heavily dominant component is first detected, and if no such component exists, the transformed data set is stretched or shrunk by applying an F operator. We also show that the Mahalanobis metric outperforms the Euclidean measure for updating centroids, as the former is more robust against outliers and more accurate for various data distributions.
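
The abstract argues that the Mahalanobis metric is more robust to outliers than the Euclidean measure when updating centroids. Below is a minimal sketch of one such update step, assuming a generic k-means-style assignment and a covariance estimated from the whole data set; the thesis's actual DEPIT update rule may differ.

    # Hedged sketch of a Mahalanobis-based centroid update; not the exact DEPIT rule.
    import numpy as np

    def mahalanobis_update(X, centroids):
        """One assignment/update step using squared Mahalanobis distances to centroids."""
        # Regularised covariance of the whole data set (a per-cluster estimate is also possible).
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        inv_cov = np.linalg.inv(cov)
        # Squared Mahalanobis distance of every point to every centroid.
        dists = np.stack([np.einsum('ij,jk,ik->i', X - c, inv_cov, X - c)
                          for c in centroids], axis=1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid from its assigned points; keep the old one if a cluster empties.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(len(centroids))])
        return labels, new_centroids

Replacing the Euclidean distance with the Mahalanobis form accounts for the spread and correlation of the data, which is what makes the assignment less sensitive to elongated clusters and outlying points.
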
author2 Jung-Hua Wang
author_facet Jung-Hua Wang
Sih-Yin Shen
沈思吟
author Sih-Yin Shen
沈思吟
spellingShingle Sih-Yin Shen
沈思吟
Clustering based on Principal Component Analysis and Pseudoinverse Transformation
author_sort Sih-Yin Shen
title Clustering based on Principal Component Analysis and Pseudoinverse Transformation
title_short Clustering based on Principal Component Analysis and Pseudoinverse Transformation
title_full Clustering based on Principal Component Analysis and Pseudoinverse Transformation
title_fullStr Clustering based on Principal Component Analysis and Pseudoinverse Transformation
title_full_unstemmed Clustering based on Principal Component Analysis and Pseudoinverse Transformation
title_sort clustering based on principal component analysis and pseudoinverse transformation
publishDate 2008
url http://ndltd.ncl.edu.tw/handle/88980614390164214633
work_keys_str_mv AT sihyinshen clusteringbasedonprincipalcomponentanalysisandpseudoinversetransformation
AT chénsīyín clusteringbasedonprincipalcomponentanalysisandpseudoinversetransformation
AT sihyinshen jīyúzhǔchéngfēnfēnxījíxūnǐfǎnjǔzhènzhīfēnqúnyǎnsuànfǎ
AT chénsīyín jīyúzhǔchéngfēnfēnxījíxūnǐfǎnjǔzhènzhīfēnqúnyǎnsuànfǎ
_version_ 1718249568996425728