[Beowulf] How Would You Test Infiniband in New Cluster?
bill at cse.ucdavis.edu
Tue Nov 17 10:33:17 PST 2009
Jon Forrest wrote:
> Let's say you have a brand new cluster with
> brand new Infiniband hardware, and that
> you've installed OFED 1.4 and the
> appropriate drivers for your IB
> HCAs (i.e. you see ib0 devices
> on the frontend and all compute nodes).
> The cluster appears to be working
> fine but you're not sure about IB.
> How would you test your IB network
> to make sure all is well?
My first suggested sanity test would be to measure latency and bandwidth to ensure
you are getting IB numbers. Roughly 80-100 MB/sec and 30-60 us for a small packet
would imply GigE, while 6-8 times that bandwidth would certainly imply SDR or
better. Latency varies quite a bit among implementations; I'd try to get
within 30-40% of the advertised latency numbers.
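The rule of thumb above can be written down as a small check to run against whatever numbers your latency/bandwidth test reports. This is only a sketch. The thresholds come straight from the paragraph above, while the function names and the example advertised latency are hypothetical and should be adjusted for your own hardware:

```python
# Hypothetical helpers encoding the rule of thumb above; thresholds are rough.
GIGE_MAX_BW_MB_S = 100.0   # GigE tops out around 80-100 MB/s
IB_MIN_BW_MB_S = 6 * 80.0  # 6-8x GigE bandwidth suggests SDR or better


def classify_link(bandwidth_mb_s: float) -> str:
    """Guess which interconnect a measured point-to-point bandwidth reflects."""
    if bandwidth_mb_s >= IB_MIN_BW_MB_S:
        return "IB (SDR or better)"
    if bandwidth_mb_s <= GIGE_MAX_BW_MB_S:
        return "GigE-class (traffic is probably not going over IB)"
    return "unclear: faster than GigE but slower than expected for IB"


def latency_ok(measured_us: float, advertised_us: float,
               tolerance: float = 0.4) -> bool:
    """True if measured small-message latency is at most `tolerance`
    (e.g. 0.3-0.4, i.e. 30-40%) above the vendor's advertised figure."""
    return measured_us <= advertised_us * (1.0 + tolerance)


if __name__ == "__main__":
    print(classify_link(90.0))    # looks like GigE
    print(classify_link(900.0))   # looks like IB
    print(latency_ok(2.5, 2.0))   # assumed 2.0 us advertised; 2.5 us is within 40%
```

A result in the "unclear" band is worth chasing on its own, since it often means traffic is going over IPoIB or a degraded link rather than native verbs.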
Then I'd try a workload that keeps all nodes busy with something communication
intensive. Pathscale has an mpi_nxnlatbw benchmark that works reasonably well for
identifying ports/nodes that are slower than expected.
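Once you have all-pairs results, the slow ports/nodes tend to show up as rows and columns that fall well below everyone else. The sketch below assumes you have already parsed the output of an all-pairs test into an n x n matrix of bandwidths in MB/s. It does not assume any particular mpi_nxnlatbw output format, and the 0.8 cutoff and the helper names are hypothetical. It flags pairs below a fraction of the median and reports the node(s) involved in the most slow pairs:

```python
from statistics import median


def slow_pairs(bw, fraction=0.8):
    """Return (i, j) pairs whose bandwidth is below `fraction` of the
    median of all off-diagonal entries of the n x n matrix `bw`."""
    n = len(bw)
    values = [bw[i][j] for i in range(n) for j in range(n) if i != j]
    cutoff = fraction * median(values)
    return [(i, j) for i in range(n) for j in range(n)
            if i != j and bw[i][j] < cutoff]


def suspect_nodes(bw, fraction=0.8):
    """Nodes that appear in the largest number of slow pairs; a bad cable,
    port, or HCA usually makes one node show up in many of them."""
    counts = {}
    for i, j in slow_pairs(bw, fraction):
        counts[i] = counts.get(i, 0) + 1
        counts[j] = counts.get(j, 0) + 1
    if not counts:
        return []
    worst = max(counts.values())
    return sorted(node for node, c in counts.items() if c == worst)


if __name__ == "__main__":
    # Made-up example: node 2 only gets about a third of the bandwidth.
    bw = [[0, 900, 300, 900],
          [900, 0, 300, 900],
          [300, 300, 0, 300],
          [900, 900, 300, 0]]
    print(suspect_nodes(bw))  # [2]
```

Using the median rather than the maximum as the baseline keeps one unusually fast pair from making every other link look slow.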
Once that works, I'd suggest running a production MPI workload with a known answer.