[Beowulf] Network benchmark - MPI

Fri Jul 2 03:33:37 PDT 2004

On Thu, 1 Jul 2004, Jonathan Michael Nowacki wrote:

> Just wondering if there is a benchmark or program that I can use to test
> the maximum bandwidth of a cluster from node to node, or all nodes
> communicating to the master node.  I'm thinking any kind of bandwidth test
> will do, but the purpose is to see how much traffic a 40 node cluster can
> take when running MrBayes (an MPI program).

Wow, a tough one.  There are several ways of doing it, but most of them
would take a fair bit of work on your part.  The final answer would also
depend >>strongly<< on the organization of MrBayes itself, so the
standard answer is to use MrBayes to do the benchmark.  You >>might<< be
successful in making lots of bisection bandwidth measurements according
to some communications pattern(s) through your switch and with your
nodes and be able to predict MrBayes' performance, if you are really
good and work very hard.  However, it is pretty easy to measure MrBayes'
performance directly:

  date; /usr/bin/time MrBayes [testproblem args]; date

will give you a decent system time/wall clock time measure.
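For repeated runs it can be handy to wrap that measurement in a small script.
What follows is only a minimal sketch of the same idea in Python, not anything
that ships with MrBayes: the function name time_command is made up for
illustration, and the command it times here is a harmless stand-in that you
would replace with your own MrBayes launch line.  Note that the CPU times
cover only child processes on the local machine, so for an MPI job spread
over many nodes the wall clock figure is the one that matters.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once; return (wall, user, system) seconds for the child."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    # children_user/children_system accumulate CPU time of local children
    # that have been waited for (always zero on Windows).
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return wall, user, system


# Stand-in workload (assumption): swap in your real launch line, e.g. a list
# such as ["mpirun", "-np", "40", ...] with whatever arguments your site uses.
demo = [sys.executable, "-c", "sum(range(10**6))"]
wall, user, system = time_command(demo)
print(f"wall {wall:.2f}s  user {user:.2f}s  sys {system:.2f}s")
```

Running the same test problem at several node counts and comparing the wall
clock times gives a direct picture of how the program scales on your network.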

To get basic microscopic rates, look for NetPIPE.  Here is a paper on
its design:

  http://www.scl.ameslab.gov/netpipe/paper/full.html

and here is the main project page:

  http://www.scl.ameslab.gov/Projects/NetPIPE/

NetPIPE is probably the "best" extant network performance
microbenchmark, although there are useful components in lmbench as well,
and there is an older benchmark called netperf that was very good in its
day until it was abandoned, unloved, on the side of the open source
road.  NetPIPE has the advantage of being directly integrated with MPI
and PVM and giving transparent and direct tests of a variety of native
drivers, so you can test e.g. Myrinet or InfiniBand "directly" and not
just in a communications library context.  And of course you can
directly test TCP.

So if you want to go the microscopic route (which is kind of fun, so why
not?) get NetPIPE and measure your point-to-point bandwidth and latency
under a variety of conditions.  This still won't give you a really
accurate idea of either master/slave performance or fully distributed
node-to-node performance, because that depends VERY strongly on program
organization and synchronization, but it does give you the basic
building blocks you need to understand it.
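To make "point-to-point bandwidth and latency" concrete, here is a rough
ping-pong sketch over TCP in Python.  It is written in the spirit of what a
tool like NetPIPE measures but is NOT NetPIPE and will not match its numbers;
the names recv_exact, echo_server and ping_pong are invented for this
example.  It also runs both ends on the loopback interface so that it works
anywhere, which means it measures the local TCP stack rather than your
switch.  To test a real link you would have to run the echo side on a second
node.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, or raise if the peer closes early."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(min(remaining, 1 << 16))
        if not chunk:
            raise ConnectionError("peer closed connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(listener, size, reps):
    """Accept one connection and bounce each message straight back."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(reps):
            conn.sendall(recv_exact(conn, size))


def ping_pong(size, reps=200, host="127.0.0.1"):
    """Return (one-way latency in s, bandwidth in MB/s) for one message size."""
    listener = socket.create_server((host, 0))  # port 0: let the OS pick
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener, size, reps))
    server.start()
    payload = b"x" * size
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(reps):
            sock.sendall(payload)      # ping
            recv_exact(sock, size)     # pong
        elapsed = time.perf_counter() - start
    server.join()
    listener.close()
    one_way = elapsed / (2 * reps)     # each rep is one round trip
    return one_way, size / one_way / 1e6


for size in (1, 1024, 65536, 1 << 20):
    latency, mb_per_s = ping_pong(size)
    print(f"{size:>8} bytes  {latency * 1e6:10.1f} us  {mb_per_s:10.1f} MB/s")
```

Small messages are dominated by latency and large ones by bandwidth, which is
why the sweep over message sizes matters more than any single number.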

  HTH,

    rgb

> 
> thanks in advance,
>    Jon

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu