Clustering based on Principal Component Analysis and Pseudoinverse Transformation
Master's === National Taiwan Ocean University === Department of Electrical Engineering === 96 === This thesis presents a clustering pre-process that utilizes PCA or SVD transformation to improve current projection-based clustering algorithms. The effectiveness of the pre-process is demonstrated by incorporating it into a projection-based method called DEPIT (Dimension Extension and Pseudo-Inverse Transformation)...
| Main Authors: | Sih-Yin Shen, 沈思吟 |
|---|---|
| Other Authors: | Jung-Hua Wang 王榮華 |
| Format: | Others |
| Language: | en_US |
| Published: | 2008 |
| Online Access: | http://ndltd.ncl.edu.tw/handle/88980614390164214633 |
id: ndltd-TW-096NTOU5442064
record_format: oai_dc
spelling: ndltd-TW-096NTOU5442064 (2016-04-27T04:11:26Z); http://ndltd.ncl.edu.tw/handle/88980614390164214633; Clustering based on Principal Component Analysis and Pseudoinverse Transformation (基於主成分分析及虛擬反矩陣之分群演算法); Sih-Yin Shen 沈思吟; Master's, National Taiwan Ocean University, Department of Electrical Engineering, 96; Jung-Hua Wang 王榮華; 2008; 學位論文 (thesis), 50; en_US
collection: NDLTD
language: en_US
format: Others
sources: NDLTD
description: Master's === National Taiwan Ocean University === Department of Electrical Engineering === 96 === This thesis presents a clustering pre-process that utilizes PCA or SVD transformation to improve current projection-based clustering algorithms. The effectiveness of the pre-process is demonstrated by incorporating it into a projection-based method called DEPIT (Dimension Extension and Pseudo-Inverse Transformation). By doing so, the performance of DEPIT on 2-D and 3-D input data is greatly improved. In [7], it was shown that the performance of DEPIT is strongly affected by the form of the data distribution; carefully analyzing the structure of the input data is therefore essential. The pre-processing technique employs PCA or SVD to transform the input data from the original space to a space spanned by the eigenvectors or singular vectors; that is, each data point is represented as a linear combination of the eigenvectors or singular vectors. The proposed pre-processing technique enhances the applicability of DEPIT to data distributions of various forms, regardless of whether a dominant principal component exists. More significantly, the tedious rotation schedule and voting process of the original DEPIT become entirely dispensable.
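As a rough illustration of the PCA/SVD coordinate change described above, the sketch below (not the thesis code; the function name `pca_preprocess` and the use of NumPy are assumptions) centers the data and re-expresses each point in the basis of singular vectors:

```python
# Minimal sketch of a PCA/SVD pre-processing step: express each data point
# as a linear combination of the singular vectors (principal directions).
# Illustrative only; this is not the DEPIT implementation from the thesis.
import numpy as np

def pca_preprocess(X):
    """X: (n_samples, n_features). Returns the data in the principal-component
    basis, the singular values, and the component directions."""
    Xc = X - X.mean(axis=0)                       # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T                                 # coordinates in the new basis
    return Z, s, Vt
```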
The issue of how a dominant component affects the clustering result is also addressed. First, the existence of a heavily dominant component is detected; if none is found, the transformed data set is stretched or shrunk by applying an F operator. We also show that the Mahalanobis metric outperforms the Euclidean measure in updating centroids, as the former is more robust against outliers and more accurate in dealing with various data distributions.
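The abstract does not spell out the dominance test, the F operator, or the DEPIT update rule, so the sketch below fills those in with assumed choices (a variance-ratio threshold, a per-axis rescaling, and a plain Mahalanobis distance) purely to make the described flow concrete:

```python
# Hypothetical sketch of the flow described above; the threshold and the
# rescaling stand-in for the F operator are assumptions, not the thesis method.
import numpy as np

def has_dominant_component(singular_values, ratio=0.9):
    """Assumed test: the first component is 'heavily dominant' if it carries
    more than `ratio` of the total variance."""
    var = np.asarray(singular_values) ** 2
    return var[0] / var.sum() > ratio

def stretch_or_shrink(Z):
    """Stand-in for the F operator: rescale each principal axis to unit
    standard deviation. Applied only when no dominant component is found."""
    return Z / Z.std(axis=0)

def mahalanobis(x, centroid, cov_inv):
    """Mahalanobis distance between a point and a centroid, given the inverse
    covariance of the cluster; less sensitive to outliers than Euclidean distance."""
    d = x - centroid
    return np.sqrt(d @ cov_inv @ d)
```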
author2: Jung-Hua Wang 王榮華
author_facet: Jung-Hua Wang; Sih-Yin Shen 沈思吟
author: Sih-Yin Shen 沈思吟
spellingShingle: Sih-Yin Shen 沈思吟; Clustering based on Principal Component Analysis and Pseudoinverse Transformation
author_sort: Sih-Yin Shen
title / title_short / title_full / title_fullStr / title_full_unstemmed: Clustering based on Principal Component Analysis and Pseudoinverse Transformation
title_sort: clustering based on principal component analysis and pseudoinverse transformation
publishDate: 2008
url: http://ndltd.ncl.edu.tw/handle/88980614390164214633
work_keys_str_mv: AT sihyinshen clusteringbasedonprincipalcomponentanalysisandpseudoinversetransformation; AT chénsīyín clusteringbasedonprincipalcomponentanalysisandpseudoinversetransformation; AT sihyinshen jīyúzhǔchéngfēnfēnxījíxūnǐfǎnjǔzhènzhīfēnqúnyǎnsuànfǎ; AT chénsīyín jīyúzhǔchéngfēnfēnxījíxūnǐfǎnjǔzhènzhīfēnqúnyǎnsuànfǎ
_version_: 1718249568996425728