A Framework of Sequential Data Clustering and Classification for Data Analysis

Bibliographic Details
Main Author: Yardin Heidsyam
Other Authors: Chao-Lung Yang
Format: Others
Language: en_US
Published: 2013
Online Access:http://ndltd.ncl.edu.tw/handle/60004394225027145956
id ndltd-TW-101NTUS5041086
record_format oai_dc
spelling ndltd-TW-101NTUS5041086; 2016-03-21T04:28:01Z; http://ndltd.ncl.edu.tw/handle/60004394225027145956; A Framework of Sequential Data Clustering and Classification for Data Analysis; 循序型資料分群及分類法整合架構於資料分析之研究 (English: A Study of an Integrated Framework of Sequential Data Clustering and Classification for Data Analysis); Yardin Heidsyam; Master's thesis, National Taiwan University of Science and Technology, Department of Industrial Management, ROC academic year 101; Chao-Lung Yang (楊朝龍); 2013; thesis (學位論文); 63; en_US
collection NDLTD
description Master's thesis === National Taiwan University of Science and Technology === Department of Industrial Management === ROC academic year 101 (2012-2013) === Because of their model assumptions, traditional statistical methods such as multivariate analysis of variance (MANOVA) and canonical correlation analysis (CCA) are limited in their ability to analyze today's complex real-world datasets. Data mining techniques such as clustering and classification algorithms are a promising way to explore and analyze multi-attribute datasets. This research proposes a framework that integrates clustering and classification, applied respectively to two different datasets: numerical measures (the Q dataset) and categorical features (the X dataset). The clustering step is intended to help rapidly analyze and identify structure in the numerical measures (Q dataset). The resulting cluster labels are then combined with the X dataset as inputs to a classification model, which predicts the cluster labels from the X dataset. Hierarchical clustering and Classification and Regression Trees (CART) are used as the clustering and classification methods, respectively, because both are built on a tree structure. To keep clustering and classification performance balanced, a Clustering Classification Evaluation (CCE) plot is proposed that shows the performance measures of both the clustering and the classification results together. Clustering quality is measured by the complementary sum of squared errors (SSE_com), and classification performance by prediction accuracy. The framework is evaluated on several real-life datasets. The results show that CCE plots can be used to determine the number of clusters, an important parameter affecting the performance of the proposed framework.
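The abstract describes the pipeline only at a high level (cluster Q, learn the cluster labels from X, compare both scores across numbers of clusters). The dependency-free Python below is a minimal sketch of that loop under several assumptions, not the author's implementation: Ward linkage is assumed for the hierarchical clustering (the linkage is not stated); SSE_com is taken to be 1 - (within-cluster SSE / total SSE), a hypothetical reading of "complementary SSE"; a simplified Gini-based tree with `feature == value` splits stands in for full CART; and accuracy is measured on the training rows. The function names (`ward_clusters`, `sse_com`, `build_tree`, `cce_table`) and the toy Q/X data are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: Q holds numerical measures, X categorical features (one row per record).
Q = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
     (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.0),
     (9.0, 1.0), (9.1, 1.2), (8.8, 0.9), (9.0, 0.8)]
X = [("A", "low"), ("A", "low"), ("A", "high"), ("A", "low"),
     ("B", "high"), ("B", "high"), ("B", "low"), ("B", "high"),
     ("C", "low"), ("C", "low"), ("C", "high"), ("C", "low")]


def sse(points):
    """Sum of squared Euclidean distances from points to their centroid."""
    if not points:
        return 0.0
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points)


def ward_clusters(Q, k):
    """Naive agglomerative clustering with Ward linkage (assumed), cut at k clusters."""
    clusters = [[i] for i in range(len(Q))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Ward cost: increase in within-cluster SSE caused by merging a and b.
                cost = (sse([Q[i] for i in clusters[a] + clusters[b]])
                        - sse([Q[i] for i in clusters[a]])
                        - sse([Q[i] for i in clusters[b]]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(Q)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def sse_com(Q, labels):
    """Hypothetical SSE_com: 1 - (within-cluster SSE / total SSE); higher is better."""
    total = sse(Q)
    within = sum(sse([q for q, l in zip(Q, labels) if l == c]) for c in set(labels))
    return 1.0 - within / total if total > 0 else 1.0


def gini(ys):
    """Gini impurity of a list of class labels."""
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())


def build_tree(X, y, depth=0, max_depth=2):
    """Simplified CART stand-in: binary 'feature == value' splits chosen by weighted Gini."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority  # leaf: predict the majority cluster label
    best = None
    for f in range(len(X[0])):
        for v in sorted(set(row[f] for row in X)):  # sorted for deterministic tie-breaking
            left = [i for i, row in enumerate(X) if row[f] == v]
            right = [i for i, row in enumerate(X) if row[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    if best is None:
        return majority
    _, f, v, left, right = best
    return (f, v,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict(tree, row):
    """Follow 'feature == value' splits until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, v, left, right = tree
        tree = left if row[f] == v else right
    return tree


def cce_table(Q, X, ks, max_depth=2):
    """One (k, SSE_com, accuracy) row per candidate number of clusters k."""
    rows = []
    for k in ks:
        labels = ward_clusters(Q, k)                       # step 1: cluster Q
        tree = build_tree(X, labels, max_depth=max_depth)  # step 2: learn labels from X
        # Resubstitution accuracy on the training rows (assumption).
        acc = sum(predict(tree, x) == l for x, l in zip(X, labels)) / len(X)
        rows.append((k, sse_com(Q, labels), acc))
    return rows


for k, s, a in cce_table(Q, X, range(1, 6)):
    print(f"k={k}  SSE_com={s:.3f}  accuracy={a:.2f}")
```

Plotting both columns of this table against k gives a CCE-style chart. Under this SSE_com definition the score never decreases as k grows, because the hierarchical partitions are nested, while accuracy may drop once the features in X can no longer separate the finer clusters; a k where both stay high is a reasonable candidate. For real data, library implementations (for example SciPy's hierarchical clustering and scikit-learn's `DecisionTreeClassifier`) and held-out accuracy would replace these toy pieces.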
author2 Chao-Lung Yang
author Yardin Heidsyam
title A Framework of Sequential Data Clustering and Classification for Data Analysis