[Beowulf] How Would You Test Infiniband in New Cluster?

Jon Forrest jlforrest at berkeley.edu
Tue Nov 17 16:26:29 PST 2009


For what it's worth, I'm using 10 nodes, each with 12 cores. I'm also
running Rocks with the Mellanox roll.
My HCA is a Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
(rev 20)
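
For reference, the HCA and link state can be checked with the standard
OFED diagnostics (assuming the Mellanox roll installed them; the exact
output varies with the OFED version):

$ ibstat         # port State should be Active, Physical state LinkUp
$ ibv_devinfo    # reports HCA model, firmware level, and active MTU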

> mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 1
> mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 1024
> mpirun -np <number of nodes> -machinefile <list of nodes> ./relay 8192
> 
> You should see something like:
> c0-8 c0-22
> size=     1,  16384 hops,  2 nodes in   0.75 sec ( 45.97 us/hop)     85 KB/sec
> c0-8 c0-22
> size=  1024,  16384 hops,  2 nodes in   2.00 sec (121.94 us/hop)  32803 KB/sec
> c0-8 c0-22
> size=  8192,  16384 hops,  2 nodes in   6.21 sec (379.05 us/hop)  84421 KB/sec
> 
> So basically on a tiny message that's about 45us of latency (normal for
> GigE), and on a large message about 84MB/sec (also normal for GigE).
> 
> I'd start with 2 nodes, then if you are happy try it with all nodes.

Since there are 10 nodes, I did the following, with the results shown
(I removed the node names):

$ mpirun -np 10  -machinefile hosts ./relay 1
size=     1,  16384 hops, 10 nodes in   0.20 sec ( 12.44 us/hop)    314 KB/sec
$ mpirun -np 10  -machinefile hosts ./relay 1024
size=  1024,  16384 hops, 10 nodes in   0.33 sec ( 20.40 us/hop) 196074 KB/sec
$ mpirun -np 10  -machinefile hosts ./relay 8192
size=  8192,  16384 hops, 10 nodes in   0.97 sec ( 59.51 us/hop) 537734 KB/sec

I believe these runs went over IB, though I haven't verified which
transport MPI actually picked.
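
If the MPI here is Open MPI (an assumption -- I haven't checked which
stack the Mellanox roll set up), the IB transport could be requested
explicitly to confirm that, e.g.:

$ mpirun --mca btl openib,self,sm -np 10 -machinefile hosts ./relay 8192

which should abort rather than silently fall back if the openib btl
can't actually be used between the nodes.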

> So once you get what you expect I'd suggest something a bit more
> comprehensive.  Something like:
> mpirun -np <number of nodes> -machinefile <list of nodes> ./mpi_nxnlatbw
> 
> I'd expect some difference in latency and bandwidth between nodes, but not any
> big differences.  Something like:
> [0<->1]		1.85us		1398.825264 (MillionBytes/sec)

I did the following, with the results shown:

$ mpirun -np 2  -machinefile hosts ./mpi_nxnlatbw
[0<->1]         3.67us          1289.409397 (MillionBytes/sec)
[1<->0]         3.67us          1276.377689 (MillionBytes/sec)

I also ran this with more nodes, but the point-to-point
times were about the same.

Does this look right? Based on your numbers, it looks like my
IB is slower than yours. Because of the strange way OFED
was installed, I can't easily run over just Ethernet for comparison.
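
One check that doesn't depend on Ethernet or on the MPI stack at all:
the raw link between two nodes can be measured with the perftest
utilities that ship with OFED. Start the server side on one node and
point the client at it (the hostnames below are just placeholders):

node1$ ib_write_lat          # latency server on the first node
node2$ ib_write_lat node1    # client side, verbs-level latency
node1$ ib_write_bw           # bandwidth server on the first node
node2$ ib_write_bw node1     # client side, verbs-level bandwidth

That should show what the fabric itself can do, which would help tell
whether the gap is in the hardware or in how OFED/MPI was set up.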

Thanks for your help


-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest at berkeley.edu


