Summary: | The clustering by fast search and find of density peaks (DPC) method rapidly identifies cluster centers, which are the points that have both a relatively high local density and a relatively large distance from any point of higher density, by means of a decision graph. Various methods have been introduced to extend the DPC model over the past five years. DPC was originally presented as an unsupervised learning algorithm, and adding prior information to DPC has emerged as an alternative approach for improving its performance. Collecting labeled data is expensive in real applications, and annotating class labels is nontrivial work, whereas pairwise constraint information is easier to obtain. Furthermore, class label information can be converted into pairwise constraint information, so pairwise constraints are a practical form of prior information to exploit. This paper therefore presents a new semi-supervised density peaks clustering algorithm (SSDPC) based on constraint projection, which is flexible enough to relax a few constraints during the learning stage. In the first stage, the instances involved in instance-level constraints and the remaining instances are projected together into a lower-dimensional space guided by the pairwise constraints, in which the distribution of the data instances can be seen more clearly. Subsequently, traditional DPC is executed on the new lower-dimensional dataset. Lastly, several datasets from the Microsoft Research Asia Multimedia (MSRA-MM) image collection and the UCI machine learning repository are used for experimental validation. The experimental results demonstrate that the proposed SSDPC achieves better performance than three other semi-supervised clustering algorithms.
|
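The two-stage pipeline described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: the function names `constraint_projection` and `density_peaks`, the `weight` and `dc` parameters, the specific projection objective (cannot-link scatter minus weighted must-link scatter), and the Gaussian density kernel.

```python
import numpy as np


def constraint_projection(X, must_link, cannot_link, n_components, weight=1.0):
    """Project X onto directions that spread cannot-link pairs apart and
    keep must-link pairs close (an assumed, simplified objective)."""
    d = X.shape[1]

    def scatter(pairs):
        # Average outer product of the difference vectors of the given pairs.
        S = np.zeros((d, d))
        for i, j in pairs:
            diff = (X[i] - X[j])[:, None]
            S += diff @ diff.T
        return S / max(len(pairs), 1)

    M = scatter(cannot_link) - weight * scatter(must_link)
    eigvals, eigvecs = np.linalg.eigh(M)  # M is symmetric
    top = np.argsort(eigvals)[::-1][:n_components]
    return X @ eigvecs[:, top]


def density_peaks(X, n_clusters, dc):
    """Plain DPC: centers are the points with the largest rho * delta."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Local density with a Gaussian kernel (each point counts itself once).
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1)
    order = np.argsort(-rho)  # indices from densest to sparsest

    delta = np.zeros(n)
    parent = np.full(n, -1)
    # Sketch convention: the densest point gets infinite delta, so it is
    # always selected as a center.
    delta[order[0]] = np.inf
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]  # distance to nearest denser point
        parent[i] = j

    centers = np.argsort(-(rho * delta))[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser
    # neighbor, which has already been labeled at this point in the order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

A typical call would first compute `Z = constraint_projection(X, must_link, cannot_link, n_components=2)` and then `labels, centers = density_peaks(Z, n_clusters=k, dc=...)`, where the constraints are lists of index pairs. In practice the cutoff `dc` and the number of centers are usually chosen by inspecting the decision graph rather than fixed in advance.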