Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server
Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data parallel dense linear algebra and other applications with a similar communication pattern on mo...
Main Authors: | Tania Malik, Alexey Lastovetsky |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2021-01-01 |
Series: | IEEE Access |
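Subjects: | Data partitioning; communication optimization; non-rectangular partitioning; matrix multiplication; heterogeneous computing; performance model |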
Online Access: | https://ieeexplore.ieee.org/document/9328411/ |
id |
doaj-dbb08281d65a4f3bacb674e4f7900f5c |
---|---|
record_format |
Article |
spelling |
doaj-dbb08281d65a4f3bacb674e4f7900f5c; 2021-03-30T15:25:36Z; eng; IEEE; IEEE Access; 2169-3536; 2021-01-01; 9; 17229; 17244; 10.1109/ACCESS.2021.3052976; 9328411; Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server; Tania Malik (0) https://orcid.org/0000-0002-4461-7120; Alexey Lastovetsky (1) https://orcid.org/0000-0001-9460-3897; School of Computer Science, University College Dublin, Dublin 4, Ireland; School of Computer Science, University College Dublin, Dublin 4, Ireland; Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data parallel dense linear algebra and other applications with a similar communication pattern on modern hybrid servers. Although a solution has been found for two processors, the cases of three and more processors are still open. The state-of-the-art solution for three processors uses an approximate communication cost function that fails to accurately account for the total amount of data moved between processors, thus leaving the question of its global optimality unanswered. In this work, we formulate and solve a mathematical problem of optimally partitioning a real-valued square over three heterogeneous processors using a new cost function, which accurately accounts for the total amount of data communicated between processors. We also develop an original method for accurate experimental evaluation of the communication time of data movement between memories of the compute devices in the hybrid platform during the execution of data parallel applications. We successfully use this method in the experimental validation of our mathematical results. Finally, we propose a communication energy model predicting the dynamic energy consumption of data movement between processors and experimentally validate its accuracy. This model predicts, and the experiments confirm, that the performance-optimal partition is not necessarily energy-optimal.; https://ieeexplore.ieee.org/document/9328411/; Data partitioning; communication optimization; non-rectangular partitioning; matrix multiplication; heterogeneous computing; performance model |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Tania Malik; Alexey Lastovetsky |
spellingShingle |
Tania Malik Alexey Lastovetsky Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server IEEE Access Data partitioning communication optimization non-rectangular partitioning matrix multiplication heterogeneous computing performance model |
author_facet |
Tania Malik; Alexey Lastovetsky |
author_sort |
Tania Malik |
title |
Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server |
title_short |
Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server |
title_full |
Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server |
title_fullStr |
Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server |
title_full_unstemmed |
Towards Optimal Matrix Partitioning for Data Parallel Computing on a Hybrid Heterogeneous Server |
title_sort |
towards optimal matrix partitioning for data parallel computing on a hybrid heterogeneous server |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2021-01-01 |
description |
Optimal partitioning of a square computational domain over several heterogeneous processors, balancing the load of the processors and minimizing the inter-processor communication cost, is crucial for data parallel dense linear algebra and other applications with a similar communication pattern on modern hybrid servers. Although a solution has been found for two processors, the cases of three and more processors are still open. The state-of-the-art solution for three processors uses an approximate communication cost function that fails to accurately account for the total amount of data moved between processors, thus leaving the question of its global optimality unanswered. In this work, we formulate and solve a mathematical problem of optimally partitioning a real-valued square over three heterogeneous processors using a new cost function, which accurately accounts for the total amount of data communicated between processors. We also develop an original method for accurate experimental evaluation of the communication time of data movement between memories of the compute devices in the hybrid platform during the execution of data parallel applications. We successfully use this method in the experimental validation of our mathematical results. Finally, we propose a communication energy model predicting the dynamic energy consumption of data movement between processors and experimentally validate its accuracy. This model predicts, and the experiments confirm, that the performance-optimal partition is not necessarily energy-optimal. |
topic |
Data partitioning; communication optimization; non-rectangular partitioning; matrix multiplication; heterogeneous computing; performance model |
url |
https://ieeexplore.ieee.org/document/9328411/ |
work_keys_str_mv |
AT taniamalik towardsoptimalmatrixpartitioningfordataparallelcomputingonahybridheterogeneousserver AT alexeylastovetsky towardsoptimalmatrixpartitioningfordataparallelcomputingonahybridheterogeneousserver |
_version_ |
1724179588087545856 |