A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm

Abstract Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, clusters are usually unpredictable. Although the Elbow method is one of the most commonly used m...

Bibliographic Details
Main Authors: Congming Shi, Bingtao Wei, Shoulin Wei, Wen Wang, Hai Liu, Jialei Liu
Format: Article
Language: English
Published: SpringerOpen 2021-02-01
Series: EURASIP Journal on Wireless Communications and Networking
Subjects: Machine learning; Clustering; Elbow method; Silhouette coefficient; Cosine law
Online Access: https://doi.org/10.1186/s13638-021-01910-w
id doaj-1186c0ec45514b1e9812c8bb15b720a6
record_format Article
spelling doaj-1186c0ec45514b1e9812c8bb15b720a6
2021-02-21T12:28:56Z
eng
SpringerOpen
EURASIP Journal on Wireless Communications and Networking
1687-1499
2021-02-01
20211116
10.1186/s13638-021-01910-w
A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm
Congming Shi (0), School of Software Engineering, Anyang Normal University
Bingtao Wei (1), Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Shoulin Wei (2), Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Wen Wang (3), Faculty of Information Engineering and Automation, Kunming University of Science and Technology
Hai Liu (4), School of Software Engineering, Anyang Normal University
Jialei Liu (5), School of Software Engineering, Anyang Normal University
collection DOAJ
language English
format Article
sources DOAJ
author Congming Shi
Bingtao Wei
Shoulin Wei
Wen Wang
Hai Liu
Jialei Liu
title A quantitative discriminant method of elbow point for the optimal number of clusters in clustering algorithm
publisher SpringerOpen
series EURASIP Journal on Wireless Communications and Networking
issn 1687-1499
publishDate 2021-02-01
description Abstract Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually not known in advance. Although the Elbow method is one of the most commonly used methods to determine the optimal cluster number, it depends on the manual identification of the elbow point on the visualization curve. Thus, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to yield a statistical metric that estimates an optimal cluster number when clustering on a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosine of intersection angles between elbow points. Third, this calculated cosine of intersection angles and the arccosine function are used to compute the intersection angles between elbow points. Finally, the index of the minimal intersection angle computed above is used as the estimated potential optimal cluster number. The experimental results based on simulated datasets and a well-known public dataset (Iris Dataset) demonstrated that the estimated optimal cluster number obtained by the newly proposed method is better than that obtained by the widely used Silhouette method.
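The four steps named in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' published implementation. Several details are assumptions made for illustration only: the function names (normalize, elbow_angles, estimate_k), the use of min-max scaling to reach the range 0 to 10, a spacing of 1 between consecutive cluster numbers on the horizontal axis, and measuring the angle at each interior point of the curve between its two neighbouring points.

```python
import math


def normalize(values, lo=0.0, hi=10.0):
    """Min-max scale the distortion values to [lo, hi] (assumed scaling)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        raise ValueError("distortion curve is flat; cannot normalize")
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def elbow_angles(distortions, k_start=1):
    """Return {k: angle in radians} for every interior point of the curve.

    distortions[i] is the average distortion for k = k_start + i clusters.
    The angle at point b is taken between its neighbours a and c, using
    the law of cosines on the triangle a-b-c.
    """
    y = normalize(distortions)
    angles = {}
    for i in range(1, len(y) - 1):
        ab = math.hypot(1.0, y[i] - y[i - 1])      # distance a-b
        bc = math.hypot(1.0, y[i + 1] - y[i])      # distance b-c
        ac = math.hypot(2.0, y[i + 1] - y[i - 1])  # distance a-c
        cos_b = (ab ** 2 + bc ** 2 - ac ** 2) / (2.0 * ab * bc)
        cos_b = max(-1.0, min(1.0, cos_b))         # guard against rounding
        angles[k_start + i] = math.acos(cos_b)
    return angles


def estimate_k(distortions, k_start=1):
    """Pick the cluster number whose intersection angle is smallest."""
    angles = elbow_angles(distortions, k_start)
    return min(angles, key=angles.get)


if __name__ == "__main__":
    # Hypothetical distortion values for k = 1..6, invented for this example.
    curve = [100.0, 40.0, 10.0, 8.0, 7.0, 6.5]
    for k, angle in elbow_angles(curve).items():
        print(k, round(math.degrees(angle), 1))
    print("estimated k:", estimate_k(curve))
```

With the invented curve above, the sharpest bend (smallest angle) falls at k = 3. The first and last cluster numbers have no angle because each lacks one neighbour, so they can never be selected under this reading.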
topic Machine learning
Clustering
Elbow method
Silhouette coefficient
Cosine law
url https://doi.org/10.1186/s13638-021-01910-w