Summary: | Large-scale scientific simulations need to read and write data on parallel file systems efficiently. To reduce the I/O bottleneck of scientific simulations, many middleware solutions have been developed, and the two-phase scheme is a well-known I/O algorithm designed for collective I/O operations. In two-phase I/O, a subset of processes is selected to aggregate non-contiguous pieces of data in the shuffle phase before performing collective reads or writes in the I/O phase. Meanwhile, tapered hierarchical networks have long been proposed as a way to reduce procurement and power costs. The lower levels of a tapered hierarchical network provide higher bandwidth and lower latency than the upper levels. In this paper, we present a new implementation of the two-phase I/O algorithm that takes into account the communication pattern and the topology of the tapered hierarchical network when scheduling inter-process communication during the shuffle phase. We validated the new algorithm on our high-performance computers and collected experimental data on the I/O kernels of several simulations. Compared with previous two-phase I/O implementations, the new algorithm significantly improved shuffle-phase performance.
|