[Beowulf] MPI performance on clusters of SMP
Robert G. Brown
rgb at phy.duke.edu
Fri Aug 27 06:25:51 PDT 2004
On Thu, 26 Aug 2004, Kozin, I (Igor) wrote:
> Philippe,
> many thanks for your response(s).
>
> I see. So all the cases I've seen must have the network
> bandwidth saturated (i.e. between a node and the switch).
> Should be possible to profile...
There are a number of tools out there that will let you monitor
network load, per interface, per node: xmlsysd/wulfstat for one, but
also ganglia, various X apps, and a plain command line, e.g.
netstat --interface=eth0 5
which is nearly equivalent to:
#!/bin/sh
# Print the /proc/net/dev header, then sample the eth0 counter line
# every 5 seconds, re-printing the header after every 10 samples.
while true
do
  head -2 /proc/net/dev
  COUNT=10
  while [ "$COUNT" -gt 0 ]
  do
    COUNT=`expr $COUNT - 1`
    grep eth0 /proc/net/dev
    sleep 5
  done
done
The only problem with these last two tools is that they display
absolute packet/byte counts. It is left as an exercise for the student
to convert this into e.g. perl and add code to extract deltas, divide by
the time, and form a rate.
Or use one of the tools that does it for you, of course...
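Just for completeness, a minimal sketch of that exercise in plain sh
plus awk rather than perl, assuming eth0, a 5 second interval, and the
usual 2.4 /proc/net/dev layout where receive bytes are the first
counter and transmit bytes the ninth after the "eth0:" label, might
look like:

#!/bin/sh
# Sketch: print rx/tx byte rates for one interface by sampling the
# /proc/net/dev counters, taking deltas, and dividing by the interval.
IFACE=eth0
INTERVAL=5

read_counters() {
  # Strip everything up to the colon so "eth0:12345" and "eth0: 12345"
  # parse the same way, then emit the rx and tx byte counters.
  grep "$IFACE" /proc/net/dev | sed 's/.*://' | awk '{ print $1, $9 }'
}

PREV=`read_counters`
while true
do
  sleep $INTERVAL
  CURR=`read_counters`
  echo "$PREV $CURR" | awk -v dt=$INTERVAL \
    '{ printf "rx %d B/s  tx %d B/s\n", ($3-$1)/dt, ($4-$2)/dt }'
  PREV=$CURR
done

Run it in one window while the job is going and you get a crude
per-interface rate display.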
rgb
> Thus using both cpus on a node creates even higher load on the
> connection. Hypothetically, when memory bandwidth and
> the switch are not a problem, then an N x 2 configuration
> with 2 network cards per node should always be superior to
> a 2*N x 1 config with 1 network card per node
> (same number of cards and cpus!).
>
> Best,
> Igor
>
> PS As for my experiment with the Tiger box, it is perfectly
> reproducible and does not depend on the state of the system.
> I know that the chipset is not perfect and that's why I tried
> to fit everything into cache.
>
> >
> > Hi Igor,
> >
> > the situation is rather complex. You compare an N nodes x 2 cpus
> > machine with a 2*N nodes x 1 cpu machine, but you forget the number
> > of network interfaces. In the first case the 2 cpus share the network
> > interface, and they share the memory too. And of course, in the first
> > case you save money because you have fewer network cards to buy...
> > that's why clusters of 2 cpu boxes are so common. And the 2 cpu
> > boxes can be SMP (Intel) or ccNUMA (Opteron).
> > Then, it's difficult to predict whether an N nodes x 2 cpus machine
> > performs better than the 2*N nodes x 1 cpu solution for a given
> > program. The best way is to do some tests! For example, an
> > MPI_Alltoall communication pattern should be more effective on a
> > 2*N nodes x 1 cpu machine, but it could be the inverse for an
> > intensive MPI_Isend / MPI_Irecv pattern...
> >
> > For your Tiger box problem: first, you should know that the Intel
> > chipset is not very good; second, are you sure that no other program
> > (or system activity) has interfered with your measurements?
> >
> > regards,
> >
> > Philippe Blaise
> >
> >
> > Kozin, I (Igor) wrote:
> >
> > >Nowadays clusters are typically built from SMP boxes.
> > >Dual cpu nodes are common, but quad and larger are available too.
> > >Nevertheless, I have never seen a parallel program run quicker
> > >on N nodes x 2 cpus than on 2*N nodes x 1 cpu,
> > >even if its local memory bandwidth requirements are very modest.
> > >The appearance is that shared memory communication always
> > >comes at an extra cost rather than as an advantage, although
> > >both MPICH and LAM-MPI have support for shared memory.
> > >
> > >Any comments? Is this an MPICH/LAM issue or a Linux issue?
> > >
> > >At least in one case I observed a hint that it is Linux.
> > >I ran several instances of a small program on a 4-way Itanium2 Tiger
> > >box with a 2.4 kernel. The program is basically a loop over an array
> > >which fits into L1 cache. Up to 3 instances finish virtually
> > >simultaneously. If 4 instances are launched, then 3 finish first and
> > >the 4th later, the overall time being about 40% longer.
> > >
> > >Igor
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu