A Framework of Sequential Data Clustering and Classification for Data Analysis
Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 101 === Because of their model assumptions, traditional statistical methods such as multivariate analysis of variance (MANOVA) and canonical correlation analysis (CCA) are limited in analyzing the complicated datasets found in the real world today. Applying data mining t...
Main Author: | Yardin Heidsyam |
---|---|
Other Authors: | Chao-Lung Yang |
Format: | Others |
Language: | en_US |
Published: | 2013 |
Online Access: | http://ndltd.ncl.edu.tw/handle/60004394225027145956 |
id |
ndltd-TW-101NTUS5041086 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-101NTUS5041086 2016-03-21T04:28:01Z http://ndltd.ncl.edu.tw/handle/60004394225027145956 A Framework of Sequential Data Clustering and Classification for Data Analysis 循序型資料分群及分類法整合架構於資料分析之研究 Yardin Heidsyam Master's National Taiwan University of Science and Technology Department of Industrial Management 101 Chao-Lung Yang 楊朝龍 2013 thesis 63 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 101 === Because of their model assumptions, traditional statistical methods such as multivariate analysis of variance (MANOVA) and canonical correlation analysis (CCA) are limited in analyzing the complicated datasets found in the real world today. Data mining techniques such as clustering and classification algorithms are promising for exploring and analyzing multiple-attribute datasets. This research proposes a framework that integrates clustering and classification, applied to two different datasets: numerical measures (Q dataset) and categorical features (X dataset), respectively. The clustering method is expected to help analyze or identify the numerical measures (Q dataset) rapidly. The clustering results (labels) are then combined with the X dataset as inputs to the classification model, which predicts the cluster labels from the X dataset. In this research, hierarchical clustering and Classification and Regression Tree (CART) are used as the clustering and classification methods, respectively, because both have a tree structure. To keep the performance of clustering and classification balanced, the Clustering Classification Evaluation (CCE) plot is proposed to show the performance measures of both the clustering and the classification results together. Here, clustering quality is measured by the complementary sum of squared errors (SSE_com), and classification performance is measured by prediction accuracy. Several real-life datasets are used to evaluate the proposed framework. The results show that CCE plots can be used to determine the number of clusters, which is an important parameter affecting the performance of the proposed framework. |
author2 |
Chao-Lung Yang |
author_facet |
Chao-Lung Yang Yardin Heidsyam |
author |
Yardin Heidsyam |
author_sort |
Yardin Heidsyam |
title |
A Framework of Sequential Data Clustering and Classification for Data Analysis |
publishDate |
2013 |
url |
http://ndltd.ncl.edu.tw/handle/60004394225027145956 |
_version_ |
1718209424019947520 |
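
The pipeline summarized in the description field (cluster the Q dataset, predict the cluster labels from the X dataset, then compare both scores across candidate numbers of clusters) can be sketched roughly as follows. This is a minimal illustration in standard-library Python under several assumptions, not the thesis implementation: the record does not state the linkage, so Ward-style merging is assumed; a per-row majority vote over distinct X combinations stands in for CART; and SSE_com is not defined in this record, so it is assumed here to mean 1 minus the ratio of within-cluster SSE to total SSE. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sse(points):
    """Sum of squared distances from each point to the group centroid."""
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)


def agglomerate(Q, k):
    """Naive agglomerative clustering (assumed Ward-style merging).

    Repeatedly merges the two clusters whose union adds the least SSE
    until k clusters remain. Returns one integer label per row of Q.
    """
    clusters = [[i] for i in range(len(Q))]

    def merge_cost(pair):
        a, b = pair
        pa = [Q[i] for i in clusters[a]]
        pb = [Q[i] for i in clusters[b]]
        return sse(pa + pb) - sse(pa) - sse(pb)

    while len(clusters) > k:
        # combinations() yields a < b, so deleting index b keeps a valid.
        a, b = min(combinations(range(len(clusters)), 2), key=merge_cost)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(Q)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def sse_com(Q, labels):
    """ASSUMED definition: 1 - (within-cluster SSE / total SSE)."""
    groups = defaultdict(list)
    for q, lab in zip(Q, labels):
        groups[lab].append(q)
    within = sum(sse(g) for g in groups.values())
    return 1.0 - within / sse(Q)


def majority_vote_accuracy(X, labels):
    """Stand-in for CART (not CART itself): predict, for each distinct
    categorical row of X, its most common cluster label, and report the
    resulting training accuracy."""
    votes = defaultdict(Counter)
    for x, lab in zip(X, labels):
        votes[x][lab] += 1
    correct = sum(max(counts.values()) for counts in votes.values())
    return correct / len(labels)


def cce_points(Q, X, ks):
    """One (k, clustering score, classification accuracy) triple per k,
    i.e. the values a CCE-style plot would draw against k."""
    points = []
    for k in ks:
        labels = agglomerate(Q, k)
        points.append((k, sse_com(Q, labels), majority_vote_accuracy(X, labels)))
    return points


# Hypothetical toy data: three tight groups in Q, one categorical feature in X.
Q = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0), (9.1, 1.2)]
X = [("a",), ("a",), ("b",), ("b",), ("c",), ("c",)]

points = cce_points(Q, X, [1, 3, 6])
for k, cluster_score, accuracy in points:
    print(f"k={k}  SSE_com={cluster_score:.3f}  accuracy={accuracy:.3f}")
```

On this toy data the clustering score can only rise as k grows, while the accuracy drops once the clusters become finer than the X dataset can distinguish; reading both curves together is the kind of trade-off the CCE plot is described as exposing when choosing the number of clusters.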