Center-based clustering with the string data

Bibliographic Details
Main Authors: Jia-Wun Syu, 許佳雯
Other Authors: 曾富祥
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/vvv2cr
Description
Summary: Master's === National Central University === Graduate Institute of Industrial Management === 102 === Clustering has been widely studied and applied in past research. Its goal is for objects within the same cluster to be highly similar while objects in different clusters are dissimilar. Clustering can handle many data types, but numerical data is the most commonly used. String data has not yet been developed for clustering, although it holds enormous potential for applications such as part repair processes, product manufacturing processes, and the order in which disease signs occur. Compared with other data types, string data has two unavoidable elements to consider: the characters and their order. This study therefore proposes a viable method for clustering string data. Most past studies deal with objects of the same dimensionality, for which the clustering process has been thoroughly defined by many scholars. In string data, however, most objects differ in dimensionality, that is, the strings are of unequal length. For example, product 1 may be processed by machines A, B, and C, while product 2 is processed by machines B, C, D, and A. How to measure similarity without disregarding the order of the string data is an important issue. In this study, we apply the edit distance and the simple matching distance to measure dissimilarity between strings. At present, string data is mostly handled with hierarchical clustering methods, as in Tian et al. (1996), Dinu and Sgarro (2006), and Tseng (2013). Our study instead takes a non-hierarchical approach. Compared with other types of clustering algorithms, center-based algorithms are very efficient. We therefore propose a new model that combines the concepts of K-means and K-modes to achieve the goal of clustering string data.
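
To make the two dissimilarity measures and the center-based idea concrete, the following is a minimal Python sketch, not the thesis's actual model. It assumes the edit distance is the standard Levenshtein distance, that the simple matching distance counts position-wise mismatches and treats positions beyond the shorter string as mismatches, and that a cluster center is the member string with the smallest total distance to the others (a k-medoids-style stand-in for the K-means/K-modes combination, whose details the summary does not give). All function names are hypothetical.

```python
from typing import Callable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def simple_matching_distance(a: str, b: str) -> int:
    """Position-wise mismatches; extra positions of the longer string
    count as mismatches (an assumption for unequal lengths)."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))


def medoid(members: List[str], dist: Callable[[str, str], int]) -> str:
    """Member with the smallest total distance to all members."""
    return min(members, key=lambda c: sum(dist(c, m) for m in members))


def cluster_strings(strings: List[str], k: int,
                    dist: Callable[[str, str], int] = edit_distance,
                    max_iter: int = 20) -> Tuple[List[int], List[str]]:
    """Center-based clustering sketch: assign each string to its nearest
    center, then move each center to the medoid of its cluster."""
    # Assumption: the first k distinct strings serve as initial centers.
    centers = list(dict.fromkeys(strings))[:k]
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda c: dist(s, centers[c]))
                  for s in strings]
        new_centers = []
        for c in range(len(centers)):
            members = [s for s, lab in zip(strings, labels) if lab == c]
            new_centers.append(medoid(members, dist) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# The machine routes from the summary: product 1 (A, B, C), product 2 (B, C, D, A).
print(edit_distance("ABC", "BCDA"))             # 3
print(simple_matching_distance("ABC", "BCDA"))  # 4
```

On the example routes, the edit distance (3) is smaller than the simple matching distance (4) because it recognizes the shared subsequence B, C even though it is shifted by one position, which is why the choice of measure matters for strings of unequal length.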