A Fast Clustering Algorithm for Data with a Few Labeled Instances
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper includes two aspects...
Main Authors: | Jinfeng Yang, Yong Xiao, Jiabing Wang, Qianli Ma, Yanhua Shen |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2015-01-01 |
Series: | Computational Intelligence and Neuroscience |
Online Access: | http://dx.doi.org/10.1155/2015/196098 |
id |
doaj-e70bf98b7f354fc7acbbb1a060b4d76f |
---|---|
record_format |
Article |
spelling |
doaj-e70bf98b7f354fc7acbbb1a060b4d76f; 2020-11-24T21:09:28Z; eng; Hindawi Limited; Computational Intelligence and Neuroscience; 1687-5265; 1687-5273; 2015-01-01; 2015; 10.1155/2015/196098; 196098; A Fast Clustering Algorithm for Data with a Few Labeled Instances; Jinfeng Yang (0), Yong Xiao (1), Jiabing Wang (2), Qianli Ma (3), Yanhua Shen (4); (0) Electric Power Research Institute of Guangdong Power Grid Corporation, Guangzhou 510080, China; (1) Electric Power Research Institute of Guangdong Power Grid Corporation, Guangzhou 510080, China; (2) School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China; (3) School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China; (4) School of Materials Science and Engineering, South China University of Technology, Guangzhou 510006, China; http://dx.doi.org/10.1155/2015/196098 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jinfeng Yang Yong Xiao Jiabing Wang Qianli Ma Yanhua Shen |
title |
A Fast Clustering Algorithm for Data with a Few Labeled Instances |
publisher |
Hindawi Limited |
series |
Computational Intelligence and Neuroscience |
issn |
1687-5265 1687-5273 |
publishDate |
2015-01-01 |
description |
The diameter of a cluster is the maximum intracluster distance between pairs of instances within the same cluster, and the split of a cluster is the minimum distance between instances within the cluster and instances outside the cluster. Given a few labeled instances, this paper addresses two aspects. First, we present a simple and fast clustering algorithm with the following property: if the ratio of the minimum split to the maximum diameter (RSD) of the optimal solution is greater than one, the algorithm returns optimal solutions for three clustering criteria. Second, we study the metric learning problem: learning a distance metric that makes the RSD as large as possible. Compared with existing metric learning algorithms, one of our metric learning algorithms is computationally efficient: it is a linear programming model rather than the semidefinite programming model used by most existing algorithms. We demonstrate empirically that the supervision and the learned metric can improve the clustering quality. |
url |
http://dx.doi.org/10.1155/2015/196098 |
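
The diameter, split, and RSD defined in the description can be illustrated with a minimal Python sketch. This is not the article's algorithm; it only evaluates the three quantities for a partition that is already given. The function names (`diameter`, `split`, `rsd`), the use of Euclidean distance, and the toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import dist, inf

def diameter(cluster):
    """Maximum distance between two instances of the same cluster (0 for a singleton)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def split(cluster, others):
    """Minimum distance between an instance inside the cluster and one outside it."""
    return min((dist(p, q) for p in cluster for q in others), default=inf)

def rsd(clusters):
    """Ratio of the minimum split to the maximum diameter over all clusters."""
    max_diameter = max(diameter(c) for c in clusters)
    min_split = min(
        split(c, [q for j, other in enumerate(clusters) if j != i for q in other])
        for i, c in enumerate(clusters)
    )
    # Assumption: treat a partition of singletons (zero diameter) as infinitely separated.
    return inf if max_diameter == 0 else min_split / max_diameter

# Hypothetical toy partition: two tight clusters that lie far apart.
clusters = [[(0.0, 0.0), (1.0, 0.0)], [(5.0, 0.0), (5.0, 1.0)]]
print(rsd(clusters))  # 4.0
```

Here both clusters have diameter 1 and the closest pair across clusters is 4 apart, so the RSD is 4, which is greater than one, the condition the description names for the algorithm's optimality property.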