[Beowulf] MPI performance on clusters of SMP
    Philippe Blaise 
    philippe.blaise at cea.fr
       
    Thu Aug 26 09:18:40 PDT 2004
    
    
  
Hi Igor,
the situation is rather complex. You compare an N nodes x 2 cpus machine
with a 2*N nodes x 1 cpu machine, but you forget the number of network
interfaces. In the first case the 2 cpus share the network interface,
and they share the memory too. And of course, in the first case you save
money because you have fewer network cards to buy; that is why clusters
of 2-cpu boxes are so common.
The 2-cpu boxes can be SMP (Intel) or ccNUMA (Opteron).
So it is difficult to predict whether an N nodes x 2 cpus machine will
perform better than the 2*N nodes x 1 cpu solution for a given program.
The best way is to run some tests!
For example, an MPI_Alltoall communication pattern should be more
efficient on a 2*N nodes x 1 cpu machine, but it could be the reverse
for an intensive MPI_Isend / MPI_Irecv pattern...
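
To make the MPI_Alltoall point concrete, here is a rough count of how many
messages each network interface has to send in one all-to-all exchange. It
is only a sketch under simple assumptions: every rank sends one message of
equal size to every other rank, messages between ranks on the same node go
through shared memory and never touch the network interface, and the
collective algorithm actually used by the MPI library is ignored. The
function name alltoall_nic_load is made up for this illustration.

```python
def alltoall_nic_load(total_ranks, ranks_per_node):
    """Count the sends per node in one all-to-all (simplified model).

    Assumption: one message from every rank to every other rank, and
    on-node messages use shared memory instead of the network card.
    """
    if total_ranks % ranks_per_node != 0:
        raise ValueError("total_ranks must be a multiple of ranks_per_node")
    off_node_peers = total_ranks - ranks_per_node  # ranks living on other nodes
    on_node_peers = ranks_per_node - 1             # the other ranks in the same box
    return {
        "nodes": total_ranks // ranks_per_node,
        "nic_sends_per_node": ranks_per_node * off_node_peers,
        "shared_memory_sends_per_node": ranks_per_node * on_node_peers,
    }


if __name__ == "__main__":
    # 16 ranks: 16 single-cpu nodes versus 8 dual-cpu nodes.
    print(alltoall_nic_load(16, 1))
    print(alltoall_nic_load(16, 2))
```

With 16 ranks, each single-cpu node pushes 15 messages through its own
interface, while each dual-cpu node pushes 28 through the one interface
its two cpus share and keeps only 2 in shared memory. In this model the
shared interface carries almost twice the traffic, which is why the
pattern is expected to favour the 2*N nodes x 1 cpu layout.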
For your Tiger box problem, first you should know that the Intel chipset
is not very good. Second, are you sure that no other program (such as
system activity) interfered with your measurements?
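
One way to check for interference is to repeat the experiment several
times for each number of instances and see whether the slow instance shows
up every time or only occasionally. The script below is a minimal sketch of
such a harness, not your original program: the worker is a Python stand-in
(a loop over a short list) that will not reproduce L1 cache behaviour, and
the names WORKER, run_instances and spread are made up for this example. In
practice you would launch your real binary in place of the inline worker.

```python
import subprocess
import sys

# Stand-in for the small program: repeated passes over a short array.
# Each copy times its own loop and prints the elapsed seconds.
WORKER = """
import time
data = list(range(1024))
t0 = time.perf_counter()
total = 0
for _ in range({passes}):
    for x in data:
        total += x
print(time.perf_counter() - t0)
"""


def run_instances(count, passes=2000):
    """Launch `count` copies at once; return their loop times, sorted."""
    code = WORKER.format(passes=passes)
    procs = [
        subprocess.Popen([sys.executable, "-c", code],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    # communicate() waits for each copy and returns what it printed.
    return sorted(float(p.communicate()[0]) for p in procs)


def spread(times):
    """Slowest time divided by fastest; near 1.0 means they finished together."""
    return max(times) / min(times)


if __name__ == "__main__":
    for count in (1, 2, 3, 4):
        for trial in range(3):
            times = run_instances(count)
            print(count, trial, [round(t, 3) for t in times],
                  round(spread(times), 2))
```

If the spread for 4 instances is large on every trial, background activity
is an unlikely explanation; if it comes and goes, something else on the
box was probably competing for a cpu.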
regards,
Philippe Blaise
Kozin, I (Igor) wrote:
>Nowadays clusters are typically built from SMP boxes.
>Dual cpu nodes are common but quad and larger are available too.
>Nevertheless I have never seen a parallel program run quicker 
>on N nodes x 2 cpus than on 2*N nodes x 1 cpu
>even if local memory bandwidth requirements are very modest.
>It appears that shared memory communication always
>comes at an extra cost rather than as an advantage although
>both MPICH and LAM-MPI have support for shared memory.
>
>Any comments? Is this MPICH/LAM or Linux issue?
>
>At least in one case I observed a hint towards Linux.
>I ran several instances of a small program on a 4-way Itanium2 Tiger box
>with 2.4 kernel. The program is basically 
>a loop over an array which fits into L1 cache.
>Up to 3 instances finish virtually simultaneously.
>If 4 instances are launched then 3 finish first and the 4th later
>the overall time being about 40% longer.
>
>Igor