[Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster

Egan Ford egan at sense.net
Mon Jan 24 11:35:00 PST 2005


I should have added that xhpl.x86_64 and xhpl.ia64 are native 64 bit
binaries for each platform using the native 64-bit Goto libraries.
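
A quick way to confirm that a binary such as xhpl.x86_64 really is a 64-bit build is to look at the class byte in its ELF header. The sketch below is an illustration only and is not from the original post; elf_class is a hypothetical helper name, and it assumes od is available on the node.

```shell
# elf_class FILE
# Print 32 or 64 according to the EI_CLASS byte of an ELF file
# (byte offset 4: 1 = 32-bit, 2 = 64-bit). Hypothetical helper.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # The fifth byte is EI_CLASS.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example (path is an assumption): elf_class "$PWD/xhpl.$(uname -m)"
```

This only tells 32-bit from 64-bit objects; it does not distinguish x86_64 from ia64, so the .$(uname -m) suffix convention is still needed.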

> -----Original Message-----
> From: beowulf-bounces at beowulf.org 
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Egan Ford
> Sent: Monday, January 24, 2005 11:53 AM
> To: 'Sean Dilda'; cflau at clc.cuhk.edu.hk
> Cc: beowulf at beowulf.org
> Subject: RE: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
> 
> 
> I have tried it and it did not work (i.e. i686 + x86_64).  I also did not
> spend a lot of time trying to figure it out.  I know that this method is
> sound; it works great with hybrid ia64 and x86_64 clusters.
> 
> Below is a .pbs script to automate running xhpl with multiple
> architectures.  Each xhpl binary must have a .$(uname -m) suffix.  This
> was done with Myrinet.
> 
> The resulting pgfile will look like this (node14 really has 2 procs, but
> since mpirun started from node14 it already has one processor assigned to
> rank 0, so the pgfile only needs to describe the rest of the processors).
> 
> node14 1 /home/egan/bench/hpl/bin/xhpl.x86_64
> node10 2 /home/egan/bench/hpl/bin/xhpl.ia64
> node13 2 /home/egan/bench/hpl/bin/xhpl.x86_64
> node9 2 /home/egan/bench/hpl/bin/xhpl.ia64
> 
> Script:
> 
> #PBS -l nodes=4:compute:ppn=2,walltime=10:00:00 
> #PBS -N xhpl
> 
> # prog name
> PROG=xhpl.$(uname -m)
> PROGARGS=""
> 
> NODES=$PBS_NODEFILE
> 
> # How many proc do I have?
> NP=$(wc -l $NODES | awk '{print $1}')
> 
> # create pgfile with rank 0 node with one less
> # process because it gets one by default
> ME=$(hostname -s)
> N=$(egrep "^$ME\$" $NODES | wc -l | awk '{print $1}')
> N=$(($N - 1))
> if [ "$N" = "0" ]
> then
>         >pgfile
> else
>         echo "$ME $N $PWD/$PROG" >pgfile
> fi
> 
> # add other nodes to pgfile
> for i in $(cat $NODES | egrep -v "^$ME\$" | sort | uniq)
> do
>         N=$(egrep "^$i\$" $NODES | wc -l | awk '{print $1}')
>         ARCH=$(ssh $i uname -m)
>         echo "$i $N $PWD/xhpl.$ARCH"
> done >>pgfile
> 
> # MPICH path
> # mpirun is a script, no worries
> MPICH=/usr/local/mpich/1.2.6..13/gm/x86_64/smp/pgi64/ssh/bin
> # $MPICH already ends in /bin
> PATH=$MPICH:$PATH
> 
> export LD_LIBRARY_PATH=/usr/local/goto/lib
> 
> set -x
> 
> if [ "$PBS_ENVIRONMENT" = "PBS_INTERACTIVE" ]
> then
>         # interactive job: run verbosely from the current directory
>         mpirun.ch_gm \
>                 -v \
>                 -pg pgfile \
>                 --gm-kill 5 \
>                 --gm-no-shmem \
>                 LD_LIBRARY_PATH=/usr/local/goto/lib \
>                 $PROG $PROGARGS
> else
>         # batch job: cd into the directory where I typed qsub
>         cd $PBS_O_WORKDIR
>         cat $PBS_NODEFILE >hpl.$PBS_JOBID
> 
>         mpirun.ch_gm \
>                 -pg pgfile \
>                 --gm-kill 5 \
>                 --gm-no-shmem \
>                 LD_LIBRARY_PATH=/usr/local/goto/lib \
>                 $PROG $PROGARGS >>hpl.$PBS_JOBID
> fi
> 
> exit 0
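
The pgfile logic in the script above can be tried without PBS or a cluster. What follows is a minimal sketch under stated assumptions, not the script's exact code: make_pgfile and arch_of are hypothetical names, arch_of stands in for the script's `ssh $i uname -m`, and the nodefile is assumed to hold one hostname per line per processor, as $PBS_NODEFILE does.

```shell
# arch_of HOST: print the machine architecture of HOST.
# Hypothetical stand-in for "ssh $i uname -m"; -n keeps ssh from
# swallowing the stdin of the read loop below.
arch_of() {
    ssh -n "$1" uname -m
}

# make_pgfile NODEFILE RANK0_HOST BINDIR
# Print one "host count binary" line per host. The rank 0 host comes
# first with one process fewer, because mpirun starts rank 0 itself.
make_pgfile() {
    nodefile=$1 me=$2 bindir=$3

    # Count the exact-match lines for the rank 0 host, minus one.
    n=$(grep -c -x -F "$me" "$nodefile")
    n=$((n - 1))
    if [ "$n" -gt 0 ]; then
        echo "$me $n $bindir/xhpl.$(arch_of "$me")"
    fi

    # Every other host gets a line with its full processor count.
    grep -v -x -F "$me" "$nodefile" | sort | uniq -c |
    while read -r count host
    do
        echo "$host $count $bindir/xhpl.$(arch_of "$host")"
    done
}
```

Redefining arch_of to echo a fixed architecture per host lets the function run offline. With the four two-processor nodes from the example and node14 as the rank 0 host, it should reproduce the pgfile shown earlier.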
> 
> > -----Original Message-----
> > From: beowulf-bounces at beowulf.org 
> > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Sean Dilda
> > Sent: Friday, January 21, 2005 7:42 AM
> > To: cflau at clc.cuhk.edu.hk
> > Cc: beowulf at beowulf.org
> > Subject: Re: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
> > 
> > 
> > John Lau wrote:
> > > Hi,
> > > 
> > > Has anyone tried running MPI programs with MPICH on a heterogeneous
> > > cluster with both i386 and x86_64 machines? Can I use an i386 binary
> > > on the i386 machines and an x86_64 binary on the x86_64 machines for
> > > the same MPI program? I thought they could communicate, but it seems
> > > I was wrong, because I got errors in testing.
> > > 
> > > Has anyone tried that before?
> > 
> > I've not tried it, but I can think of a few good reasons why you'd want
> > to avoid it.  Let's say you want to send some data that's stored in a
> > long from the x86_64 box to the x86 box.  Well, on the x86_64 box, a
> > long takes up 8 bytes.  But on the x86 box, it only takes 4 bytes.  So,
> > chances are some Bad Stuff(tm) is going to happen if you try to span an
> > MPI program across architectures like that.
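
The size mismatch described above can be checked per node before launching a mixed job. This is a rough sketch and not part of the thread: check_long_bits, long_bits and remote_long_bits are hypothetical names, and it assumes getconf is installed on every node and that ssh to each node works without a password.

```shell
# Width of a C long, in bits, for the local userland.
long_bits() {
    getconf LONG_BIT
}

# Same query on a remote host (assumes passwordless ssh).
remote_long_bits() {
    ssh -n "$1" getconf LONG_BIT
}

# check_long_bits HOST...
# Report every host whose long width differs from the local one.
# Returns non-zero if any host differs.
check_long_bits() {
    local_bits=$(long_bits)
    status=0
    for host in "$@"
    do
        bits=$(remote_long_bits "$host")
        if [ "$bits" != "$local_bits" ]; then
            echo "$host: long is $bits bits, local long is $local_bits bits"
            status=1
        fi
    done
    return $status
}

# Example: check_long_bits $(sort -u "$PBS_NODEFILE")
```

The value reflects the userland that getconf belongs to, so a 32-bit install on x86_64 hardware would report 32.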
> > 
> > On the other hand, the x86_64 box will run x86 code without a problem.
> > So I suggest running x86 binaries (and mpich libraries) on all of the
> > boxes.  While I haven't tested it myself, I can't think of any reason
> > why that wouldn't work.
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) 
> > visit http://www.beowulf.org/mailman/listinfo/beowulf
> > 