Topology Aware Algorithm for Two-Phase I/O in Clusters With Tapered Hierarchical Networks
Main Authors: | Weifeng Liu, Linping Wu, Xiaowen Xu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | MPI-IO, two-phase I/O, communication topology, communication optimization |
Online Access: | https://ieeexplore.ieee.org/document/9057562/ |
id | doaj-8c4759233a1b4d41bfef4e64a2b4cdd6 |
---|---|
record_format | Article |
spelling | doaj-8c4759233a1b4d41bfef4e64a2b4cdd6; 2021-03-30T03:16:12Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8, pp. 66917-66930; DOI 10.1109/ACCESS.2020.2985928; IEEE document 9057562; Topology Aware Algorithm for Two-Phase I/O in Clusters With Tapered Hierarchical Networks; Weifeng Liu (https://orcid.org/0000-0002-1160-6687), Linping Wu (https://orcid.org/0000-0002-0146-2747), Xiaowen Xu (https://orcid.org/0000-0001-6032-454X); all three authors: Institute of Applied Physics and Computational Mathematics, Beijing, China; https://ieeexplore.ieee.org/document/9057562/; MPI-IO, two-phase I/O, communication topology, communication optimization |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Weifeng Liu, Linping Wu, Xiaowen Xu |
title | Topology Aware Algorithm for Two-Phase I/O in Clusters With Tapered Hierarchical Networks |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2020-01-01 |
description | It is important for large-scale scientific simulations to read and write data from parallel file systems efficiently. To relieve the I/O bottleneck of scientific simulations, many middleware solutions have been developed, and the two-phase scheme is a well-known I/O algorithm designed for collective I/O operations. In two-phase I/O, a subset of processes is selected to aggregate non-contiguous pieces of data in the shuffle phase before performing collective reads/writes in the I/O phase. Meanwhile, tapered hierarchical networks have long been proposed to reduce procurement and power costs; the lower levels of such a network provide higher bandwidth and lower latency. In this paper, we present a new implementation of the two-phase I/O algorithm that takes into account the communication pattern and the topology of the tapered hierarchical network when scheduling inter-process communication during the shuffle phase. We validated the new algorithm on our high-performance computers and collected experimental data on the I/O kernels of some simulations. Compared with previous two-phase I/O implementations, the new algorithm achieved a significant improvement in shuffle-phase performance. |
topic | MPI-IO, two-phase I/O, communication topology, communication optimization |
url | https://ieeexplore.ieee.org/document/9057562/ |
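The record describes the shuffle phase of two-phase I/O only in outline. As a rough illustration, and not the authors' algorithm, the Python sketch below makes several assumptions. Each aggregator receives an equal, contiguous file domain, similar in spirit to ROMIO's default partitioning. The sketch computes how many bytes each process must send to each aggregator for a collective write. It then classifies that traffic by the level of a hypothetical two-level tapered network that it crosses. The helper names (`file_domains`, `shuffle_matrix`, `network_level`, `traffic_by_level`, `example`), the 8-rank layout (2 ranks per node, 2 nodes per leaf switch), the strided access pattern, and the choice of ranks 0 and 4 as aggregators are all hypothetical.

```python
from collections import defaultdict


def file_domains(start, end, n_aggr):
    """Split the aggregate access range [start, end) into n_aggr contiguous
    file domains of (nearly) equal size, one per aggregator (assumed policy)."""
    size = -(-(end - start) // n_aggr)  # ceiling division
    return [(start + i * size, min(start + (i + 1) * size, end))
            for i in range(n_aggr)]


def shuffle_matrix(requests, aggregators, domains):
    """Bytes each rank must ship to each aggregator in the shuffle phase of a
    collective write. `requests` maps rank -> list of (offset, length) pieces."""
    traffic = defaultdict(int)
    for rank, pieces in requests.items():
        for offset, length in pieces:
            for aggr, (lo, hi) in zip(aggregators, domains):
                # Part of this piece that falls inside the aggregator's domain.
                overlap = min(offset + length, hi) - max(offset, lo)
                if overlap > 0:
                    traffic[(rank, aggr)] += overlap
    return dict(traffic)


def network_level(src, dst, node_of, switch_of):
    """Highest network level a message from src to dst has to reach."""
    if src == dst:
        return "self"  # local copy, no network traffic
    if node_of[src] == node_of[dst]:
        return "intra-node"
    if switch_of[node_of[src]] == switch_of[node_of[dst]]:
        return "intra-switch"
    return "cross-switch"  # uses the tapered upper level of the network


def traffic_by_level(traffic, node_of, switch_of):
    """Total shuffle bytes per network level."""
    totals = defaultdict(int)
    for (src, dst), nbytes in traffic.items():
        totals[network_level(src, dst, node_of, switch_of)] += nbytes
    return dict(totals)


def example():
    """Hypothetical layout: 8 ranks, 2 ranks per node, 2 nodes per leaf switch."""
    node_of = [rank // 2 for rank in range(8)]
    switch_of = [node // 2 for node in range(4)]
    # Interleaved (strided) pattern: rank r writes 4-byte blocks at r*4 + k*32.
    requests = {r: [(r * 4 + k * 32, 4) for k in range(4)] for r in range(8)}
    aggregators = [0, 4]  # one aggregator per leaf switch (an assumption)
    lo = min(off for pieces in requests.values() for off, _ in pieces)
    hi = max(off + n for pieces in requests.values() for off, n in pieces)
    domains = file_domains(lo, hi, len(aggregators))
    traffic = shuffle_matrix(requests, aggregators, domains)
    return traffic_by_level(traffic, node_of, switch_of)


if __name__ == "__main__":
    levels = example()
    for level in ("self", "intra-node", "intra-switch", "cross-switch"):
        print(f"{level:>13}: {levels.get(level, 0)} bytes")
```

In this layout half of the 128 shuffled bytes (64) must cross the leaf-switch level, where a tapered network offers the least bandwidth per node. This cross-level traffic is what a topology-aware shuffle schedule, such as the one the abstract describes, has to account for.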