A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality

The standard formulation of the K-means clustering (Lloyd's method) performs many unnecessary distance calculations. In this paper, we focus on four approaches that use the triangle inequality to avoid unnecessary distance calculations. These approaches are Drake's, Elkan's, Annulus,...


Bibliographic Details
Main Authors: Wojciech Kwedlo, Pawel J. Czochanski
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects:
MPI
Online Access: https://ieeexplore.ieee.org/document/8681032/
id doaj-43473507a6874376abcb3ba13cc05dea
record_format Article
spelling doaj-43473507a6874376abcb3ba13cc05dea; 2021-03-29T22:46:10Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2019-01-01; vol. 7; pp. 42280-42297; DOI 10.1109/ACCESS.2019.2907885; 8681032; A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality; Wojciech Kwedlo (https://orcid.org/0000-0002-5040-2302), Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland; Pawel J. Czochanski, Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland; https://ieeexplore.ieee.org/document/8681032/; Clustering; K-means; triangle inequality; MPI; OpenMP; hybrid parallelization
collection DOAJ
language English
format Article
sources DOAJ
author Wojciech Kwedlo
Pawel J. Czochanski
author_facet Wojciech Kwedlo
Pawel J. Czochanski
author_sort Wojciech Kwedlo
title A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description The standard formulation of the K-means clustering (Lloyd's method) performs many unnecessary distance calculations. In this paper, we focus on four approaches that use the triangle inequality to avoid unnecessary distance calculations. These approaches are Drake's, Elkan's, Annulus, and Yinyang algorithms. We propose a hybrid MPI/OpenMP parallelization of these algorithms in which the dataset and the corresponding data structures storing bounds on distances are evenly divided among MPI processes. Then, in the assignment step of a K-means iteration, each MPI process computes the assignment of its portion of data using OpenMP threads. In the update step of the iteration, the cluster centroids are computed using a hierarchical all-reduce operation. In the computational experiments, we compared the strong scalability of these four algorithms with the scalability of Lloyd's algorithm, parallelized using the same approach. The results indicate that all four algorithms maintain an advantage in computing time over Lloyd's algorithm. A comparison with two software packages, whose sources are publicly available, in the same computing environment, shows that our implementations are more efficient.
topic Clustering
K-means
triangle inequality
MPI
OpenMP
hybrid parallelization
url https://ieeexplore.ieee.org/document/8681032/
_version_ 1724190898103779328
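
The description field of this record outlines a scheme in which the data are split among processes, each process assigns its own points while using the triangle inequality to skip distance calculations, and the centroids are then rebuilt with an all-reduce. The following is a minimal single-process sketch of that idea in Python, not the authors' implementation. It applies only the basic pruning test (if half the distance between the current best centroid and a candidate centroid is at least the point's distance to the current best, the candidate cannot be closer) and keeps none of the per-point bounds that Drake's, Elkan's, Annulus, or Yinyang algorithms maintain. The function names `assign_chunk` and `kmeans_step` are hypothetical, each chunk stands in for the portion of data held by one MPI process, and a plain sum over chunks stands in for the all-reduce; no MPI or OpenMP calls are made.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_chunk(points, centroids, half_cc):
    """Assignment step for one chunk of data (assumed: one process's share).

    half_cc[i][j] holds 0.5 * d(centroid_i, centroid_j).
    Returns labels, per-cluster partial sums, per-cluster counts,
    and the number of distance calculations that were skipped.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    labels = []
    skipped = 0
    for p in points:
        best, best_d = 0, dist(p, centroids[0])
        for j in range(1, k):
            # Triangle inequality: d(c_best, c_j) >= 2 * d(p, c_best)
            # implies d(p, c_j) >= d(p, c_best), so c_j cannot win.
            if half_cc[best][j] >= best_d:
                skipped += 1
                continue
            d = dist(p, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
        counts[best] += 1
        for t in range(dim):
            sums[best][t] += p[t]
    return labels, sums, counts, skipped


def kmeans_step(chunks, centroids):
    """One K-means iteration over data that is already split into chunks."""
    k, dim = len(centroids), len(centroids[0])
    half_cc = [[0.5 * dist(a, b) for b in centroids] for a in centroids]
    results = [assign_chunk(c, centroids, half_cc) for c in chunks]
    # Stand-in for the all-reduce: add up the partial sums and counts.
    total_sums = [[sum(r[1][j][t] for r in results) for t in range(dim)]
                  for j in range(k)]
    total_counts = [sum(r[2][j] for r in results) for j in range(k)]
    new_centroids = [
        [s / total_counts[j] for s in total_sums[j]]
        if total_counts[j] else list(centroids[j])  # keep empty clusters
        for j in range(k)
    ]
    labels = [r[0] for r in results]
    skipped = sum(r[3] for r in results)
    return new_centroids, labels, skipped


if __name__ == "__main__":
    chunks = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans_step(chunks, [(0.0, 0.0), (10.0, 10.0)]))
```

Because the test only skips centroids that provably cannot be strictly closer, the labels match those of an unpruned Lloyd assignment with the same tie-breaking; only the number of distance evaluations changes.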