[Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
Egan Ford
egan at sense.net
Mon Jan 24 11:35:00 PST 2005
I should have added that xhpl.x86_64 and xhpl.ia64 are native 64-bit
binaries for each platform, built against the native 64-bit Goto
libraries.
> -----Original Message-----
> From: beowulf-bounces at beowulf.org
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Egan Ford
> Sent: Monday, January 24, 2005 11:53 AM
> To: 'Sean Dilda'; cflau at clc.cuhk.edu.hk
> Cc: beowulf at beowulf.org
> Subject: RE: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
>
>
> I have tried it (i.e., i686 + x86_64) and it did not work, though I
> also did not spend a lot of time trying to figure out why. I know
> that the method is sound; it works great with hybrid ia64 and x86_64
> clusters.
>
> Below is a .pbs script that automates running xhpl across multiple
> architectures. Each xhpl binary must have a .$(uname -m) suffix.
> This was done with Myrinet.
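>
> For example, to produce the suffixed binaries (a minimal sketch;
> assumes each xhpl was built natively on a node of the matching
> architecture):
>
> # run once on one node of each architecture, in the HPL bin directory
> cp xhpl xhpl.$(uname -m)
> # yields xhpl.x86_64 on an Opteron node, xhpl.ia64 on an Itanium node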
>
> The resulting pgfile will look like this (node14 really has 2 procs,
> but since mpirun was started from node14 it already has one process
> assigned to rank 0, so the pgfile only needs to describe the
> remaining processors). Each pgfile line gives a hostname, the number
> of processes to start there, and the path of the binary to run.
>
> node14 1 /home/egan/bench/hpl/bin/xhpl.x86_64
> node10 2 /home/egan/bench/hpl/bin/xhpl.ia64
> node13 2 /home/egan/bench/hpl/bin/xhpl.x86_64
> node9 2 /home/egan/bench/hpl/bin/xhpl.ia64
>
> Script:
>
> #PBS -l nodes=4:compute:ppn=2,walltime=10:00:00
> #PBS -N xhpl
>
> # prog name
> PROG=xhpl.$(uname -m)
> PROGARGS=""
>
> NODES=$PBS_NODEFILE
>
> # How many proc do I have?
> NP=$(wc -l $NODES | awk '{print $1}')
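> # (NP is informational here; with -pg below, mpirun.ch_gm takes its
> # process counts from the pgfile)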
>
> # create the pgfile; the rank 0 node is listed with one less
> # process because mpirun starts one there by default
> ME=$(hostname -s)
> N=$(egrep "^$ME\$" $NODES | wc -l | awk '{print $1}')
> N=$(($N - 1))
> if [ "$N" = "0" ]
> then
>     # no extra ranks on this node; just truncate/create an empty pgfile
>     >pgfile
> else
>     echo "$ME $N $PWD/$PROG" >pgfile
> fi
>
> # add other nodes to pgfile
> for i in $(cat $NODES | egrep -v "^$ME\$" | sort | uniq)
> do
>     N=$(egrep "^$i\$" $NODES | wc -l | awk '{print $1}')
>     ARCH=$(ssh $i uname -m)
>     echo "$i $N $PWD/xhpl.$ARCH"
> done >>pgfile
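> # (the ssh calls above assume passwordless ssh from the rank 0 node
> # to every other node in the job)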
>
> # MPICH path
> # (mpirun.ch_gm is a shell script, so the x86_64 build's copy is
> # safe to run from any node)
> MPICH=/usr/local/mpich/1.2.6..13/gm/x86_64/smp/pgi64/ssh/bin
> PATH=$MPICH:$PATH
>
> export LD_LIBRARY_PATH=/usr/local/goto/lib
>
> set -x
>
> if [ "$PBS_ENVIRONMENT" = "PBS_INTERACTIVE" ]
> then
>     mpirun.ch_gm \
>         -v \
>         -pg pgfile \
>         --gm-kill 5 \
>         --gm-no-shmem \
>         LD_LIBRARY_PATH=/usr/local/goto/lib \
>         $PROG $PROGARGS
> else
>     # batch mode: cd into the directory where I typed qsub
>     cd $PBS_O_WORKDIR
>     cat $PBS_NODEFILE >hpl.$PBS_JOBID
>
>     mpirun.ch_gm \
>         -pg pgfile \
>         --gm-kill 5 \
>         --gm-no-shmem \
>         LD_LIBRARY_PATH=/usr/local/goto/lib \
>         $PROG $PROGARGS >>hpl.$PBS_JOBID
> fi
>
> exit 0
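>
> To submit as a batch job (assuming the script above is saved as
> xhpl.pbs):
>
> qsub xhpl.pbs
>
> # or grab an interactive job and run it by hand, which exercises
> # the PBS_INTERACTIVE branch:
> qsub -I -l nodes=4:compute:ppn=2
> ./xhpl.pbs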
>
> > -----Original Message-----
> > From: beowulf-bounces at beowulf.org
> > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Sean Dilda
> > Sent: Friday, January 21, 2005 7:42 AM
> > To: cflau at clc.cuhk.edu.hk
> > Cc: beowulf at beowulf.org
> > Subject: Re: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
> >
> >
> > John Lau wrote:
> > > Hi,
> > >
> > > Has anyone tried running MPI programs with MPICH on a
> > > heterogeneous cluster with both i386 and x86_64 machines? Can I
> > > use an i386 binary on the i386 machines while using an x86_64
> > > binary on the x86_64 machines for the same MPI program? I thought
> > > they could communicate, but it seems that I was wrong because I
> > > got errors in testing.
> > >
> > > Has anyone tried that before?
> >
> > I've not tried it, but I can think of a few good reasons why you'd
> > want to avoid it. Let's say you want to send some data that's
> > stored in a long from the x86_64 box to the x86 box. Well, on the
> > x86_64 box, a long takes up 8 bytes. But on the x86 box, it only
> > takes 4 bytes. So, chances are some Bad Stuff(tm) is going to
> > happen if you try to span an MPI program across architectures like
> > that.
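> >
> > The size mismatch is easy to see (a minimal sketch; assumes gcc
> > with 32-bit multilib support installed):
> >
> > cat >longsize.c <<'EOF'
> > #include <stdio.h>
> > int main(void) { printf("sizeof(long) = %u\n", (unsigned)sizeof(long)); return 0; }
> > EOF
> > gcc -m64 -o longsize64 longsize.c && ./longsize64   # prints 8 on x86_64
> > gcc -m32 -o longsize32 longsize.c && ./longsize32   # prints 4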
> >
> > On the other hand, the x86_64 box will run x86 code without a
> > problem. So I suggest running x86 binaries (and mpich libraries) on
> > all of the boxes. While I haven't tested it myself, I can't think
> > of any reason why that wouldn't work.
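> >
> > To double-check that a given xhpl binary really is 32-bit before
> > deploying it everywhere (path is just an example):
> >
> > file ./xhpl
> > # a 32-bit build reports something like:
> > #   xhpl: ELF 32-bit LSB executable, Intel 80386, ...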