Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA

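This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated that one can implement multicast algorithms for InfiniBand easily by using the RDMA-CM API.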


Bibliographic Details
Main Author: Mittenzwey, Nico
Other Authors: TU Chemnitz, Fakultät für Informatik
Format: Dissertation
Language: English
Published: Universitätsbibliothek Chemnitz 2009
Subjects:
PSM
Online Access: http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901053
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/index.html
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/diploma_thesis_mittenzwey.pdf
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/sources.tar.gz
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/20090105.txt
id ndltd-DRESDEN-oai-qucosa.de-bsz-ch1-200901053
record_format oai_dc
spelling ndltd-DRESDEN-oai-qucosa.de-bsz-ch1-200901053 2013-01-07T19:57:47Z Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA Mittenzwey, Nico InfiniBand MPI_Allreduce Network OFED Open MPI PSM RDMA-CM ddc:004 High-performance computing Parallel computers This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated that one can implement multicast algorithms for InfiniBand easily by using the RDMA-CM API. Universitätsbibliothek Chemnitz TU Chemnitz, Fakultät für Informatik Diplom Informatiker Frank Mietke Professor Doktor Wolfgang Rehm 2009-06-30 doc-type:masterThesis text/html application/pdf application/x-gzip text/plain application/zip http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901053 urn:nbn:de:bsz:ch1-200901053 http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/index.html http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/diploma_thesis_mittenzwey.pdf http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/sources.tar.gz http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/20090105.txt eng
collection NDLTD
language English
format Dissertation
sources NDLTD
topic InfiniBand
MPI_Allreduce
Network
OFED
Open MPI
PSM
RDMA-CM
ddc:004
High-performance computing
Parallel computers
description This thesis analysed the QLogic InfiniPath QLE7140 HCA and its onload architecture and compared the results to the Mellanox InfiniHost III Lx HCA, which uses an offload architecture. As expected, the QLogic InfiniPath QLE7140 HCA can outperform the Mellanox InfiniHost III Lx HCA in terms of latency and bandwidth on our test system in various test scenarios. The benchmarks showed that sending messages with multiple threads in parallel can increase the bandwidth greatly, while bi-directional sends cut the effective bandwidth for one HCA by up to 30%. Different all-reduce algorithms were evaluated and compared with the help of the LogGP model. The comparison showed that new all-reduce algorithms can outperform the ones already implemented in Open MPI for different scenarios. The thesis also demonstrated that one can implement multicast algorithms for InfiniBand easily by using the RDMA-CM API.
author2 TU Chemnitz, Fakultät für Informatik
author_facet TU Chemnitz, Fakultät für Informatik
Mittenzwey, Nico
author Mittenzwey, Nico
author_sort Mittenzwey, Nico
title Evaluating and Improving the Performance of MPI-Allreduce on QLogic HTX/PCIe InifiniBand HCA
publisher Universitätsbibliothek Chemnitz
publishDate 2009
url http://nbn-resolving.de/urn:nbn:de:bsz:ch1-200901053
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/index.html
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/diploma_thesis_mittenzwey.pdf
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/data/sources.tar.gz
http://www.qucosa.de/fileadmin/data/qucosa/documents/5827/20090105.txt