Three-Way Ensemble Clustering for Incomplete Data

Bibliographic Details
Main Authors: Pingxin Wang, Xiangjian Chen
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9092973/
Description
Summary: Incomplete data sets arise in every field of scientific study because of random noise, data loss, limitations of data acquisition, misinterpretation of data, and similar causes. Most clustering algorithms cannot be applied to incomplete data sets directly, because objects with missing values must first be preprocessed. In this paper, we present a new imputation algorithm for incomplete data and a three-way ensemble clustering algorithm based on the imputation result. In the proposed imputation algorithm, the objects without missing values are first clustered using hard clustering methods. For each object with missing values, the mean attribute values of each cluster are used in turn to fill in the missing attribute values. Perturbation analysis of the cluster centroids is then applied to search for the optimal imputation. As an application of the proposed imputation method, we develop a three-way ensemble clustering algorithm that combines the ideas of clustering ensembles and three-way decision. Objects that receive the same cluster label in the different clustering results are assigned to the core region of the corresponding cluster, while objects that receive different cluster labels are assigned to the fringe region. A three-way clustering is thus naturally formed. Experimental results on UCI data sets indicate that the algorithm is effective in revealing cluster structures.
ISSN: 2169-3536
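
Illustrative sketch (not part of the article): the core/fringe assignment described in the summary can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function name `three_way_ensemble` and its input layout are hypothetical, the cluster labels are assumed to be already aligned across the base clusterings, and an object with disagreeing labels is assumed to join the fringe region of every cluster it was voted into; the summary does not specify these details.

```python
from collections import defaultdict


def three_way_ensemble(labelings):
    """Build core and fringe regions from several base clusterings.

    labelings: list of equal-length label sequences, one per base
    clustering. Assumption: labels are already aligned, so label k
    denotes the same cluster in every sequence.
    Returns (core, fringe), two dicts mapping label -> set of object indices.
    """
    if not labelings:
        raise ValueError("need at least one base clustering")
    n_objects = len(labelings[0])
    if any(len(labels) != n_objects for labels in labelings):
        raise ValueError("all base clusterings must label the same objects")

    core = defaultdict(set)
    fringe = defaultdict(set)
    for i in range(n_objects):
        votes = {labels[i] for labels in labelings}
        if len(votes) == 1:
            # Every base clustering agrees: the object is in the core region.
            core[votes.pop()].add(i)
        else:
            # Disagreement: assumed here to place the object in the fringe
            # region of each cluster that received at least one vote.
            for label in votes:
                fringe[label].add(i)
    return dict(core), dict(fringe)


# Three hypothetical base clusterings of five objects.
runs = [
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
core, fringe = three_way_ensemble(runs)
print("core:", core)
print("fringe:", fringe)
```

In this toy input only object 2 receives conflicting labels, so it is the only object placed in a fringe region; every other object lands in the core region of its cluster.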