[Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster

Egan Ford egan at sense.net
Mon Jan 24 11:35:00 PST 2005


I should have added that xhpl.x86_64 and xhpl.ia64 are native 64-bit
binaries for each platform, built against the native 64-bit Goto libraries.

> -----Original Message-----
> From: beowulf-bounces at beowulf.org 
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Egan Ford
> Sent: Monday, January 24, 2005 11:53 AM
> To: 'Sean Dilda'; cflau at clc.cuhk.edu.hk
> Cc: beowulf at beowulf.org
> Subject: RE: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
> 
> 
> I have tried it (i.e. i686 + x86_64) and it did not work, although I did
> not spend a lot of time trying to figure out why.  I know that this method
> is sound: it works great with hybrid ia64 and x86_64 clusters.
> 
> Below is a .pbs script that automates running xhpl on multiple
> architectures.  Each xhpl binary must have a .$(uname -m) suffix.  This
> was done with Myrinet.
> 
> The resulting pgfile will look like this.  node14 really has 2 procs, but
> since mpirun was started from node14, it already has one processor
> assigned to rank 0, so the pgfile only needs to describe the rest of the
> processors.
> 
> node14 1 /home/egan/bench/hpl/bin/xhpl.x86_64
> node10 2 /home/egan/bench/hpl/bin/xhpl.ia64
> node13 2 /home/egan/bench/hpl/bin/xhpl.x86_64
> node9 2 /home/egan/bench/hpl/bin/xhpl.ia64
> 
> Script:
> 
> #PBS -l nodes=4:compute:ppn=2,walltime=10:00:00 
> #PBS -N xhpl
> 
> # prog name
> PROG=xhpl.$(uname -m)
> PROGARGS=""
> 
> NODES=$PBS_NODEFILE
> 
> # How many proc do I have?
> NP=$(wc -l $NODES | awk '{print $1}')
> 
> # create pgfile with rank 0 node with one less
> # process because it gets one by default
> ME=$(hostname -s)
> N=$(egrep "^$ME\$" $NODES | wc -l | awk '{print $1}')
> N=$(($N - 1))
> if [ "$N" = "0" ]
> then
>         >pgfile
> else
>         echo "$ME $N $PWD/$PROG" >pgfile
> fi
> 
> # add other nodes to pgfile
> for i in $(cat $NODES | egrep -v "^$ME\$" | sort | uniq)
> do
>         N=$(egrep "^$i\$" $NODES | wc -l | awk '{print $1}')
>         ARCH=$(ssh $i uname -m)
>         echo "$i $N $PWD/xhpl.$ARCH"
> done >>pgfile
> 
> # MPICH path (note that $MPICH already ends in /bin)
> # mpirun is a script, no worries
> MPICH=/usr/local/mpich/1.2.6..13/gm/x86_64/smp/pgi64/ssh/bin
> PATH=$MPICH:$PATH
> 
> export LD_LIBRARY_PATH=/usr/local/goto/lib
> 
> set -x
> 
> # interactive jobs run where they are; batch jobs first cd into the
> # directory where I typed qsub
> if [ "$PBS_ENVIRONMENT" = "PBS_INTERACTIVE" ]
> then
>         mpirun.ch_gm \
>                 -v \
>                 -pg pgfile \
>                 --gm-kill 5 \
>                 --gm-no-shmem \
>                 LD_LIBRARY_PATH=/usr/local/goto/lib \
>                 $PROG $PROGARGS
> else
>         cd $PBS_O_WORKDIR
>         cat $PBS_NODEFILE >hpl.$PBS_JOBID
> 
>         mpirun.ch_gm \
>                 -pg pgfile \
>                 --gm-kill 5 \
>                 --gm-no-shmem \
>                 LD_LIBRARY_PATH=/usr/local/goto/lib \
>                 $PROG $PROGARGS >>hpl.$PBS_JOBID
> fi
> 
> exit 0
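
For anyone who wants to check the pgfile logic without a cluster, here is a minimal sketch of the same counting rule: the rank 0 host gets one process fewer than its node-list entries, and every other host gets one line with its full count. The names make_pgfile and arch_of are hypothetical, and arch_of is a hard-coded stand-in for the "ssh $i uname -m" call in the script above, so treat this as an illustration under those assumptions rather than a replacement for the script.

```shell
#!/bin/bash
# Sketch only: make_pgfile and arch_of are hypothetical helper names.

# arch_of HOST: stand-in for "ssh HOST uname -m", using sample data
# taken from the pgfile shown in the message.
arch_of() {
    case "$1" in
        node9|node10) echo ia64 ;;
        *)            echo x86_64 ;;
    esac
}

# make_pgfile NODEFILE RANK0_HOST BINDIR
# Prints one "host count binary" line per host.
make_pgfile() {
    local nodefile=$1 me=$2 bindir=$3 n count host
    # The rank 0 host already runs one process started by mpirun,
    # so it is listed with one process fewer (or not at all).
    n=$(grep -c -x -F -- "$me" "$nodefile")
    n=$((n - 1))
    if [ "$n" -gt 0 ]; then
        echo "$me $n $bindir/xhpl.$(arch_of "$me")"
    fi
    # Every other host gets its full process count.
    grep -v -x -F -- "$me" "$nodefile" | sort | uniq -c |
    while read -r count host; do
        echo "$host $count $bindir/xhpl.$(arch_of "$host")"
    done
}

# Sample node list: four hosts with two processors each.
nodefile=$(mktemp)
printf '%s\n' node14 node14 node10 node10 node13 node13 node9 node9 >"$nodefile"
make_pgfile "$nodefile" node14 /home/egan/bench/hpl/bin
rm -f "$nodefile"
```

With the sample node list this prints the same four lines as the pgfile shown earlier. Using uniq -c gives the per-host count in one pass instead of one egrep per host.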
> 
> > -----Original Message-----
> > From: beowulf-bounces at beowulf.org 
> > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Sean Dilda
> > Sent: Friday, January 21, 2005 7:42 AM
> > To: cflau at clc.cuhk.edu.hk
> > Cc: beowulf at beowulf.org
> > Subject: Re: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
> > 
> > 
> > John Lau wrote:
> > > Hi,
> > > 
> > > Has anyone tried running MPI programs with MPICH on a heterogeneous
> > > cluster with both i386 and x86_64 machines? Can I use an i386 binary
> > > on the i386 machines while using an x86_64 binary on the x86_64
> > > machines for the same MPI program? I thought they could communicate,
> > > but it seems that I was wrong because I got errors in testing.
> > > 
> > > Has anyone tried that before?
> > 
> > I've not tried it, but I can think of a few good reasons why you'd want
> > to avoid it.  Let's say you want to send some data that's stored in a
> > long from the x86_64 box to the x86 box.  Well, on the x86_64 box, a
> > long takes up 8 bytes.  But on the x86 box, it only takes 4 bytes.  So,
> > chances are some Bad Stuff(tm) is going to happen if you try to span an
> > MPI program across architectures like that.
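
The size difference described above can be checked per host before launching a mixed job. The sketch below assumes getconf supports LONG_BIT (it does on glibc-based Linux); long_bits and same_long_width are hypothetical helper names, and the widths passed in at the end are made-up sample values of the kind one would collect with "ssh HOST getconf LONG_BIT".

```shell
#!/bin/bash
# Sketch only: long_bits and same_long_width are hypothetical helper names.

# long_bits: width of a C long on this host, in bits.
long_bits() {
    getconf LONG_BIT
}

# same_long_width WIDTH...: succeed only if every given width is equal.
same_long_width() {
    local first=$1 w
    for w in "$@"; do
        [ "$w" = "$first" ] || return 1
    done
    return 0
}

bits=$(long_bits)
echo "$(uname -m): long is $bits bits ($((bits / 8)) bytes)"

# Sample widths for an x86_64 host and an x86 host.
if same_long_width 64 32; then
    echo "long widths agree across hosts"
else
    echo "long widths differ across hosts"
fi
```

A mismatch here does not say what MPICH will do with the data; it only flags that the two hosts disagree on the size of a long.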
> > 
> > On the other hand, the x86_64 box will run x86 code without a problem.
> > So I suggest running x86 binaries (and mpich libraries) on all of the
> > boxes.  While I haven't tested it myself, I can't think of any reason
> > why that wouldn't work.
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) 
> > visit http://www.beowulf.org/mailman/listinfo/beowulf
> > 
> 
> 



