Safe Semi-Supervised Fuzzy C-Means Clustering

With the rapid increase in the number of collected data samples, semi-supervised clustering (SSC) has become a useful mining tool for finding intrinsic data structure with the help of prior knowledge. Commonly used prior knowledge includes pairwise constraints and cluster labels. In the past decad...


Bibliographic Details
Main Author: Haitao Gan
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Unsupervised clustering; semi-supervised clustering; fuzzy c-means; wrong labels
Online Access: https://ieeexplore.ieee.org/document/8764532/
id doaj-82522db98ab244f4a26904f8002b09cc
record_format Article
spelling doaj-82522db98ab244f4a26904f8002b09cc | 2021-04-05T17:09:20Z | eng | IEEE | IEEE Access | 2169-3536 | 2019-01-01 | 7 | 95659 | 95664 | 10.1109/ACCESS.2019.2929307 | 8764532 | Safe Semi-Supervised Fuzzy C-Means Clustering | Haitao Gan | https://orcid.org/0000-0002-6103-1797 | School of Automation, Hangzhou Dianzi University, Hangzhou, China | https://ieeexplore.ieee.org/document/8764532/ | Unsupervised clustering; semi-supervised clustering; fuzzy c-means; wrong labels
collection DOAJ
language English
format Article
sources DOAJ
author Haitao Gan
title Safe Semi-Supervised Fuzzy C-Means Clustering
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description With the rapid increase in the number of collected data samples, semi-supervised clustering (SSC) has become a useful mining tool for finding intrinsic data structure with the help of prior knowledge. Commonly used prior knowledge includes pairwise constraints and cluster labels. In the past decades, many methods have been proposed to improve the clustering performance of SSC by mining prior knowledge. In general, the prior knowledge is assumed to be beneficial for obtaining desirable results. However, in some scenarios one may gather inappropriate prior knowledge, such as wrong cluster labels. In this case, prior knowledge can degrade clustering performance. Therefore, how to achieve safe semi-supervised clustering (S3C) should be investigated. A main goal of S3C is that its result is never inferior to that of the corresponding unsupervised clustering method. To achieve this goal, we propose safe semi-supervised fuzzy c-means clustering (S3FCM), which extends traditional semi-supervised FCM (SSFCM). In our algorithm, wrongly labeled samples are carefully explored by constraining their predictions to be those yielded by unsupervised clustering. Meanwhile, the predictions of the other labeled samples should approach the given labels. The labeled samples are therefore expected to be safely explored through a balance between unsupervised clustering and SSC. The reported clustering results on different datasets show that S3FCM yields comparable, if not the best, performance among different unsupervised clustering and SSC methods, even when the ratio of wrong labels reaches 20%.
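The balance described in the abstract can be illustrated with a small sketch. The record does not give the S3FCM objective or update rules, so everything below is an assumption: the code implements standard fuzzy c-means with fuzzifier m = 2, plus a hypothetical `safe_blend` step in which the membership of each labeled sample is a weighted average of its FCM membership, its given label, and its membership from unsupervised clustering. The names `safe_blend`, `lam_label`, and `lam_safe` are illustrative and do not come from the paper.

```python
import numpy as np


def fcm_memberships(X, centers, eps=1e-12):
    """Standard FCM membership update for fuzzifier m = 2."""
    # Squared Euclidean distance from each sample to each center, shape (n, c).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
    inv = 1.0 / d2
    # Each row is normalized so that memberships sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)


def fcm(X, c, n_iter=100, seed=0):
    """Plain unsupervised fuzzy c-means (m = 2), the baseline in the abstract."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        W = fcm_memberships(X, centers) ** 2
        # Each center is the membership-weighted mean of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return fcm_memberships(X, centers), centers


def safe_blend(U_fcm, F, U_unsup, labeled, lam_label=1.0, lam_safe=1.0):
    """Hypothetical safe update (an assumption, not the paper's rule).

    F holds one-hot given labels, U_unsup holds memberships from a prior
    unsupervised run, and labeled is a boolean mask over the samples.
    """
    U = U_fcm.copy()
    total = 1.0 + lam_label + lam_safe
    # Convex combination: rows still sum to 1 because every term does.
    U[labeled] = (
        U_fcm[labeled] + lam_label * F[labeled] + lam_safe * U_unsup[labeled]
    ) / total
    return U


# A sample that unsupervised clustering puts in cluster 0 but whose
# given label (possibly wrong) says cluster 1.
U_now = np.array([[0.9, 0.1]])
F = np.array([[0.0, 1.0]])
mask = np.array([True])
with_safety = safe_blend(U_now, F, U_now, mask, lam_label=1.0, lam_safe=1.0)
without_safety = safe_blend(U_now, F, U_now, mask, lam_label=1.0, lam_safe=0.0)
```

In this toy case the label term alone flips the sample to cluster 1, while adding the unsupervised term keeps it in cluster 0, which is the kind of trade-off the abstract describes. How the weights are actually chosen per sample is not stated in this record.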
topic Unsupervised clustering
semi-supervised clustering
fuzzy c-means
wrong labels
url https://ieeexplore.ieee.org/document/8764532/
work_keys_str_mv AT haitaogan safesemisupervisedfuzzyinlineformulatexmathnotationlatexctexmathinlineformulameansclustering
_version_ 1721540149628633088