Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InfiniBand HCA
Main Author: | |
---|---|
Other Authors: | |
Format: | Dissertation |
Language: | English |
Published: | Universitätsbibliothek Chemnitz, 2009 |
Subjects: | |
Online Access: | http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901053 http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/index.html http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/diploma_thesis_mittenzwey.pdf http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/sources.tar.gz http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/20090105.txt |
Summary: | This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture
and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload
architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the
Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in
various test scenarios. The benchmarks showed that sending messages with multiple
threads in parallel can greatly increase the bandwidth, while bi-directional sends cut
the effective bandwidth for one HCA by up to 30%.
Different all-reduce algorithms were evaluated and compared with the help of the
LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI in different scenarios.
The thesis also demonstrated that multicast algorithms for InfiniBand can be implemented
easily using the RDMA-CM API. |
---|