Summary: | We have implemented MPI-NP II, an MPI-specific messaging system
for Myrinet System Area Networks (SANs). It consists of a low-level
message manager executing on the LANai processor of the Myrinet Network
Interface Card (NIC), a thin host interface layer, and LAM-MPI, a public domain
version of MPI.
MPI-NP II is a re-design of MPI-NP that simplifies and improves the performance
of the original implementation. MPI-NP differs from other low-level messaging
systems in that it off-loads some of the MPI-specific communication tasks onto
the network processor. In particular, it manages MPI message envelopes and can
progress messages asynchronously from the host. It realizes three of the goals stated
in the MPI standard, namely zero-copy messaging, overlap of communication and
computation, and off-loading tasks to a communication co-processor. In addition, it
greatly simplifies and reduces host/NIC interaction and makes it possible to support
broadcasting on the NIC.
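To illustrate what managing MPI message envelopes involves, the following is a minimal sketch in C of envelope matching with wildcards. It is not the MPI-NP II implementation: the type envelope_t, the functions envelope_matches and match_posted, and the wildcard constants ANY_SOURCE and ANY_TAG are hypothetical names chosen for this example (real MPI uses MPI_ANY_SOURCE and MPI_ANY_TAG), and the linear scan is only one plausible way to search the posted receives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wildcard values for this sketch only. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

/* Envelope fields used for matching: sender rank, user tag, and
 * communicator context. Names are assumptions made for this sketch. */
typedef struct {
    int source;
    int tag;
    int context;
} envelope_t;

/* Return 1 if a posted receive matches an incoming message envelope.
 * The context must match exactly; source and tag may be wildcards. */
static int envelope_matches(const envelope_t *posted, const envelope_t *incoming)
{
    if (posted->context != incoming->context)
        return 0;
    if (posted->source != ANY_SOURCE && posted->source != incoming->source)
        return 0;
    if (posted->tag != ANY_TAG && posted->tag != incoming->tag)
        return 0;
    return 1;
}

/* Scan the posted receives in posting order and return the index of the
 * first one that matches the incoming envelope, or -1 if none matches. */
static int match_posted(const envelope_t *posted, size_t n, const envelope_t *incoming)
{
    assert(incoming != NULL);
    for (size_t i = 0; i < n; i++) {
        if (envelope_matches(&posted[i], incoming))
            return (int)i;
    }
    return -1;
}
```

If a search of this kind runs on the network processor, the host does not have to take part when a message arrives, which is one way the off-loading described above could be realized.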
The design of MPI-NP II introduces the concept of a microchannel, which is
analogous to an independent thread on the NIC whose task is to deliver a specific
message. The message manager allows for multiple outstanding send/receive requests
and guarantees message delivery based on the available envelope resources,
independent of the message size.
We achieve these design goals without unduly burdening the slow network
processor. MPI-NP II has a minimum message latency of 22 microseconds and a
maximum bandwidth of 92 MB/s. These values are comparable to those of other
low-level messaging systems, with the added benefit of being able to overlap
communication and computation.
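To put the two figures in perspective, the sketch below applies a simple linear cost model, time = latency + size / bandwidth, to the reported values. The model is an assumption made for illustration, not a measurement from this work, and it takes 1 MB to mean 10^6 bytes. The function names transfer_time_s, effective_bandwidth_bps, and half_bandwidth_bytes are hypothetical.

```c
#include <assert.h>

#define LATENCY_S     22e-6  /* minimum message latency from the text, in seconds */
#define BANDWIDTH_BPS 92e6   /* maximum bandwidth from the text, assuming 1 MB = 10^6 bytes */

/* Estimated one-way transfer time under the assumed linear model. */
static double transfer_time_s(double bytes)
{
    assert(bytes >= 0.0);
    return LATENCY_S + bytes / BANDWIDTH_BPS;
}

/* Effective bandwidth seen by a message of the given size, in bytes per second. */
static double effective_bandwidth_bps(double bytes)
{
    return bytes / transfer_time_s(bytes);
}

/* Message size at which the model reaches half of the maximum bandwidth:
 * solving bytes / (L + bytes / B) = B / 2 gives bytes = L * B. */
static double half_bandwidth_bytes(void)
{
    return LATENCY_S * BANDWIDTH_BPS;
}
```

Under this model the half-bandwidth point is 22e-6 s x 92e6 bytes/s = 2024 bytes, so messages of about 2 KB would be expected to see roughly 46 MB/s. Actual behavior may differ, since the minimum latency and the maximum bandwidth need not combine linearly.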
|