A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality
The standard formulation of K-means clustering (Lloyd's method) performs many unnecessary distance calculations. In this paper, we focus on four approaches that use the triangle inequality to avoid unnecessary distance calculations. These approaches are Drake's, Elkan's, Annulus,...
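As an illustration of how the triangle inequality can rule out a distance calculation, here is a minimal sketch of one well-known pruning test: if d(c_a, c_j) >= 2·d(x, c_a), then d(x, c_j) >= d(x, c_a), so centroid c_j cannot be closer to x than c_a. This is not the authors' implementation and not any of the four algorithms named above, which additionally maintain distance bounds across iterations; the function names `assign_pruned` and `assign_brute` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_pruned(points, centroids):
    """Assign each point to its nearest centroid, skipping distance
    calculations that the triangle inequality proves unnecessary.
    Returns (labels, number_of_skipped_distance_calculations)."""
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per assignment step.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for x in points:
        best, best_d = 0, dist(x, centroids[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best),
            # so c_j cannot beat the current best and its distance is not needed.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(x, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute(points, centroids):
    """Reference assignment that computes every point-to-centroid distance."""
    return [min(range(len(centroids)), key=lambda j: dist(x, centroids[j]))
            for x in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.2), (10.0, 10.0), (10.2, 9.9), (-10.0, 5.0)]
    cen = [(0.0, 0.0), (10.0, 10.0), (-10.0, 5.0)]
    labels, skipped = assign_pruned(pts, cen)
    print(labels, "skipped distance calculations:", skipped)
```

The pruned assignment returns the same labels as the brute-force one; the saving grows when clusters are well separated, because more centroid pairs satisfy the test.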
Main Authors: | Wojciech Kwedlo, Pawel J. Czochanski |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2019-01-01 |
Series: | IEEE Access |
Subjects: | Clustering, K-means, triangle inequality, MPI, OpenMP, hybrid parallelization |
Online Access: | https://ieeexplore.ieee.org/document/8681032/ |
id |
doaj-43473507a6874376abcb3ba13cc05dea |
---|---|
record_format |
Article |
spelling |
doaj-43473507a6874376abcb3ba13cc05dea; 2021-03-29T22:46:10Z; eng; IEEE; IEEE Access; 2169-3536; 2019-01-01; 7; 42280; 42297; 10.1109/ACCESS.2019.2907885; 8681032; Wojciech Kwedlo (https://orcid.org/0000-0002-5040-2302), Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland; Pawel J. Czochanski, Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Wojciech Kwedlo; Pawel J. Czochanski |
author_sort |
Wojciech Kwedlo |
title |
A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2019-01-01 |
description |
The standard formulation of K-means clustering (Lloyd's method) performs many unnecessary distance calculations. In this paper, we focus on four approaches that use the triangle inequality to avoid unnecessary distance calculations. These approaches are Drake's, Elkan's, Annulus, and Yinyang algorithms. We propose a hybrid MPI/OpenMP parallelization of these algorithms in which the dataset and the corresponding data structures storing bounds on distances are evenly divided among MPI processes. Then, in the assignment step of a K-means iteration, each MPI process computes the assignment of its portion of data using OpenMP threads. In the update step of the iteration, the cluster centroids are computed using a hierarchical all-reduce operation. In the computational experiments, we compared the strong scalability of these four algorithms with the scalability of Lloyd's algorithm, parallelized using the same approach. The results indicate that all four algorithms maintain an advantage in computing time over Lloyd's algorithm. A comparison in the same computing environment with two software packages whose sources are publicly available shows that our implementations are more efficient. |
topic |
Clustering; K-means; triangle inequality; MPI; OpenMP; hybrid parallelization |
url |
https://ieeexplore.ieee.org/document/8681032/ |
_version_ |
1724190898103779328 |
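
The data decomposition described in the abstract can be illustrated with a small single-process simulation. The sketch below is an assumption-laden stand-in, not the authors' code: it uses no MPI or OpenMP, the "ranks" are plain list slices, the reduction is a flat element-wise sum rather than the hierarchical all-reduce the abstract mentions, and all function names (`local_sums`, `all_reduce_sum`, `kmeans_iteration`) are hypothetical. In a real hybrid program the loop inside `local_sums` is the part the threads of one process would share.

```python
def nearest(x, centroids):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])))


def local_sums(chunk, centroids):
    """Work of one simulated rank: assign its points and accumulate
    per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x in chunk:
        j = nearest(x, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return sums, counts


def all_reduce_sum(partials):
    """Element-wise sum of the per-rank partial results; stands in for
    an all-reduce, after which every rank would hold the same totals."""
    total_sums = [row[:] for row in partials[0][0]]
    total_counts = partials[0][1][:]
    for sums, counts in partials[1:]:
        for j, row in enumerate(sums):
            total_counts[j] += counts[j]
            for d, v in enumerate(row):
                total_sums[j][d] += v
    return total_sums, total_counts


def kmeans_iteration(points, centroids, n_ranks):
    """One K-means iteration with the dataset split into n_ranks
    contiguous, near-equal blocks. Empty clusters keep their old centroid."""
    n = len(points)
    chunks = [points[r * n // n_ranks:(r + 1) * n // n_ranks]
              for r in range(n_ranks)]
    partials = [local_sums(c, centroids) for c in chunks]
    sums, counts = all_reduce_sum(partials)
    return [tuple(s / counts[j] for s in sums[j]) if counts[j]
            else tuple(centroids[j])
            for j in range(len(centroids))]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
           (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
    cen = [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_iteration(pts, cen, n_ranks=3))
```

Because only per-cluster sums and counts cross the rank boundary, the amount of data exchanged per iteration depends on the number of clusters and dimensions, not on the number of points.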