Gigabit Ethernet MPI Performance Tests

We have run some tests on lv3 (GigE interconnect), aiming to replace the supplied SCore MPI: we require a low-latency MPI, but SCore is broken on the quad-core nodes and we're unable to get support for it.

Address any queries about this to Dave Love.

nVidia NICs

All these tests were run on a pair of Sun x2200M2 Shanghai nodes with the IMB benchmark and Open MPI 1.3.1 on CentOS 5.2, varying the networking. (Thanks to Sun for upgrading the nodes to Shanghai for testing.)

The main aim was to investigate Open-MX as a low-latency transport. Here is a selection of results; see the header of each set for more details. The headline figure is a basic latency using Open-MX of <9µs without a switch—a 6µs improvement on TCP.
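For orientation, the latency and bandwidth figures quoted here come from simple ping-pong measurements (IMB's PingPong test over MPI, or omx_perf below). The following minimal MPI sketch, which is not the IMB code and uses arbitrary illustrative values for the message size and repetition count, shows the basic idea: time a send/receive round trip between two ranks, take half of it as the latency, and divide the message size by that time for the bandwidth.

    /* Minimal ping-pong sketch, for orientation only -- not the IMB code.
     * Rank 0 sends SIZE bytes to rank 1, which sends them straight back;
     * half the average round-trip time is the usual "latency" figure, and
     * SIZE divided by that time gives the bandwidth for large messages. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPS 1000
    #define SIZE 1     /* 1 byte approximates the small-message latency;
                          use e.g. 4 MB to approach the peak bandwidth */

    int main(int argc, char **argv)
    {
        int rank;
        char *buf = malloc(SIZE);
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double half_rtt = (MPI_Wtime() - t0) / (2.0 * REPS);
        if (rank == 0)
            printf("latency %.2f us, bandwidth %.1f MB/s\n",
                   half_rtt * 1e6, SIZE / half_rtt / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }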

Simply plugging both of each x2200's nVidia NICs into the switch essentially doubles the ping-pong bandwidth, but at the time there was a problem with such multi-rail setups (since fixed by changes in Open-MX).

NIC Comparison

The x2200s' nVidia NICs provided the lowest latency. We also looked at the alternative Broadcom NICs in the x2200s as above, and at the Intel NICs in our x4100s (Opteron 275s at 2200 MHz). These data are ping-pong results from the Open-MX omx_perf program (for ease of testing), not from MPI. Interrupt coalescence was turned off on the NICs for the lowest latency, and they were connected through a cheap unmanaged switch. Latency through that switch was ∼1µs lower than through the Procurve, but it couldn't use jumbo frames.

NIC                Min. latency (µs)   Max. bandwidth (MB/s)
nVidia MCP55       9.8                 119.5
Broadcom BCM5715   14.2                119.4
Intel 82546EB      17.5                118.1

With default coalescence parameters, the Broadcom latency is 35µs under the same conditions.

An additional measurement between Broadcom BCM5704s (on 2400 MHz Opteron 280 Supermicro H8DSL boards) through a Nortel Baystack switch showed a minimum latency of 23.6µs and maximum bandwidth of 118.2 MB/s.

For orientation, our Myrinet 2000 system (lv2) achieves 7.2µs and 230 MB/s. Also, on the same x4100 hardware as above (Intel NICs), but over a Baystack switch, the SCore low-latency MPI gave 16.1µs on PMB, compared with 19.0µs from Open-MX over the Procurve; SCore uses hacked Ethernet drivers and our vendor failed to make it work properly on our Barcelona systems.
