Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server

Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data parallel dense linear algebra and other applications having a similar communication pattern on mo...


Bibliographic Details
Main Authors: Tania Malik, Alexey Lastovetsky
Format: Article
Language: English
Published: IEEE 2021-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9328411/
id doaj-dbb08281d65a4f3bacb674e4f7900f5c
record_format Article
spelling doaj-dbb08281d65a4f3bacb674e4f7900f5c
2021-03-30T15:25:36Z
eng
IEEE
IEEE Access
2169-3536
2021-01-01
9
17229
17244
10.1109/ACCESS.2021.3052976
9328411
Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server
Tania Malik 0 https://orcid.org/0000-0002-4461-7120
Alexey Lastovetsky 1 https://orcid.org/0000-0001-9460-3897
School of Computer Science, University College Dublin, Dublin 4, Ireland
School of Computer Science, University College Dublin, Dublin 4, Ireland
Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data parallel dense linear algebra and other applications having a similar communication pattern on modern hybrid servers. Although a solution has been found for two processors, the cases of three or more processors are still open. The state-of-the-art solution for three processors uses an approximate communication cost function that fails to account accurately for the total amount of data moved between processors, thus leaving the question of its global optimality unanswered. In this work, we formulate and solve the mathematical problem of optimally partitioning a real-valued square over three heterogeneous processors using a new cost function that accurately accounts for the total amount of data communicated between processors. We also develop an original method for accurate experimental evaluation of the communication time of data movement between the memories of the compute devices in the hybrid platform during the execution of data parallel applications. We successfully use this method in the experimental validation of our mathematical results. Finally, we propose a communication energy model predicting the dynamic energy consumption of data movement between processors and experimentally validate its accuracy. This model predicts, and the experiments confirm, that the performance-optimal partition is not necessarily energy-optimal.
https://ieeexplore.ieee.org/document/9328411/
Data partitioning
communication optimization
non-rectangular partitioning
matrix multiplication
heterogeneous computing
performance model
collection DOAJ
language English
format Article
sources DOAJ
author Tania Malik
Alexey Lastovetsky
title Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2021-01-01
description Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data parallel dense linear algebra and other applications having a similar communication pattern on modern hybrid servers. Although a solution has been found for two processors, the cases of three or more processors are still open. The state-of-the-art solution for three processors uses an approximate communication cost function that fails to account accurately for the total amount of data moved between processors, thus leaving the question of its global optimality unanswered. In this work, we formulate and solve the mathematical problem of optimally partitioning a real-valued square over three heterogeneous processors using a new cost function that accurately accounts for the total amount of data communicated between processors. We also develop an original method for accurate experimental evaluation of the communication time of data movement between the memories of the compute devices in the hybrid platform during the execution of data parallel applications. We successfully use this method in the experimental validation of our mathematical results. Finally, we propose a communication energy model predicting the dynamic energy consumption of data movement between processors and experimentally validate its accuracy. This model predicts, and the experiments confirm, that the performance-optimal partition is not necessarily energy-optimal.
topic Data partitioning
communication optimization
non-rectangular partitioning
matrix multiplication
heterogeneous computing
performance model
url https://ieeexplore.ieee.org/document/9328411/