A Fast Clustering Algorithm for Data with a Few Labeled Instances

Bibliographic Details
Main Authors: Jinfeng Yang, Yong Xiao, Jiabing Wang, Qianli Ma, Yanhua Shen
Format: Article
Language: English
Published: Hindawi Limited, 2015-01-01
Series: Computational Intelligence and Neuroscience
Online Access: http://dx.doi.org/10.1155/2015/196098
Volume 2015, Article ID 196098
Affiliations:
Jinfeng Yang, Yong Xiao: Electric Power Research Institute of Guangdong Power Grid Corporation, Guangzhou 510080, China
Jiabing Wang, Qianli Ma: School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
Yanhua Shen: School of Materials Science and Engineering, South China University of Technology, Guangzhou 510006, China
ISSN: 1687-5265, 1687-5273

Abstract
The diameter of a cluster is the maximum distance between pairs of instances within the cluster, and the split of a cluster is the minimum distance between an instance within the cluster and an instance outside it. Given a few labeled instances, this paper addresses two problems. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learning a distance metric that makes the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than the semidefinite programming model used by most existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve clustering quality.
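The diameter, split, and RSD defined in the abstract can be evaluated for any given partition. The following is a minimal sketch, not the authors' clustering or metric learning algorithm; it assumes Euclidean distance, and the function names `diameter`, `split`, and `rsd` are hypothetical. When the RSD exceeds one, every instance is closer to all members of its own cluster than to any instance outside it, which is the separation condition the abstract's optimality property relies on.

```python
from itertools import combinations
from math import dist, inf


def diameter(points, labels, c):
    """Maximum pairwise distance between instances assigned to cluster c.

    A singleton cluster has diameter 0.
    """
    members = [p for p, lab in zip(points, labels) if lab == c]
    return max((dist(a, b) for a, b in combinations(members, 2)), default=0.0)


def split(points, labels, c):
    """Minimum distance between an instance in cluster c and one outside it.

    If every instance is in cluster c, the split is infinite.
    """
    inside = [p for p, lab in zip(points, labels) if lab == c]
    outside = [p for p, lab in zip(points, labels) if lab != c]
    return min((dist(a, b) for a in inside for b in outside), default=inf)


def rsd(points, labels):
    """Ratio of the minimum split to the maximum diameter over all clusters."""
    clusters = set(labels)
    max_diam = max(diameter(points, labels, c) for c in clusters)
    min_split = min(split(points, labels, c) for c in clusters)
    # All-singleton partitions have zero diameter; treat the ratio as infinite.
    return min_split / max_diam if max_diam > 0 else inf
```

For instance, `rsd([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1])` returns `5.0` (split 5, diameter 1), whereas the poorly separated labeling `[0, 1, 0, 1]` of the same points gives `0.2`.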