An Optimal and Stable Algorithm for Clustering Numerical Data

In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. In random seeding, two main issues arise: the clustering results may be less than optimal and different clustering results may be obtained for every run. In real-world applications,...

Bibliographic Details
Main Authors: Ali Seman, Azizian Mohd Sapawi
Format: Article
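Language: English
Published: MDPI AG 2021-06-01
Series: Algorithms
Subjects: numerical clustering; categorical clustering; cluster analysis; partitional clustering algorithm; fuzzy clustering
Online Access: https://www.mdpi.com/1999-4893/14/7/197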
id doaj-3bbc638567824795b6d3ace9057b9f1b
record_format Article
spelling doaj-3bbc638567824795b6d3ace9057b9f1b
2021-07-23T13:26:48Z
eng
MDPI AG
Algorithms
1999-4893
2021-06-01
14
197
197
10.3390/a14070197
An Optimal and Stable Algorithm for Clustering Numerical Data
Ali Seman [0]
Azizian Mohd Sapawi [1]
[0] Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam 40450, Malaysia
[1] Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam 40450, Malaysia
collection DOAJ
language English
format Article
sources DOAJ
author Ali Seman
Azizian Mohd Sapawi
title An Optimal and Stable Algorithm for Clustering Numerical Data
publisher MDPI AG
series Algorithms
issn 1999-4893
publishDate 2021-06-01
description In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. With random seeding, two main issues arise: the clustering results may be suboptimal, and different results may be obtained on every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm, the zero k-approximate modal haplotype (Zk-AMH) algorithm, which uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. The Zk-AMH algorithm provides cluster optimality and stability, thereby resolving both issues. Notably, its mean scores were identical to its maximum and minimum scores over 100 runs, giving a standard deviation of zero and demonstrating its stability. Additionally, when applied to eight datasets, the Zk-AMH algorithm achieved the highest mean scores on four datasets, an approximately equal score on one, and marginally lower scores on the other three. With its optimality and stability, the Zk-AMH algorithm could be a suitable alternative for developing future clustering tools.
topic numerical clustering
categorical clustering
cluster analysis
partitional clustering algorithm
fuzzy clustering
url https://www.mdpi.com/1999-4893/14/7/197