[Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
Egan Ford
egan at sense.net
Mon Jan 24 10:52:34 PST 2005
I have tried it (i.e. i686 + x86_64) and it did not work, though I did not
spend a lot of time trying to figure out why. I know that the method below is
sound; it works great with hybrid ia64 and x86_64 clusters.
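For what it's worth, the word-size mismatch Sean describes below is easy to
see from the shell: getconf LONG_BIT reports 32 on i686 and 64 on x86_64
(and on ia64, which fits with that combination working). A quick check, with
placeholder hostnames:

ssh node-i686 getconf LONG_BIT      # prints 32
ssh node-x86_64 getconf LONG_BIT    # prints 64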
Below is a .pbs script to automate running xhpl across multiple architectures.
Each xhpl binary must have a .$(uname -m) suffix. This was done with Myrinet
(MPICH-GM). The resulting pgfile will look like this (node14 really has 2
processors, but since mpirun was started from node14 it already has one
process assigned to rank 0, so the pgfile only needs to describe the rest of
the processors):
node14 1 /home/egan/bench/hpl/bin/xhpl.x86_64
node10 2 /home/egan/bench/hpl/bin/xhpl.ia64
node13 2 /home/egan/bench/hpl/bin/xhpl.x86_64
node9 2 /home/egan/bench/hpl/bin/xhpl.ia64
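For reference, with a pgfile like the one above already in place, the
equivalent launch by hand from node14 would look roughly like this (a sketch
only, assuming mpirun.ch_gm is on the PATH and that you start in the
directory holding the binaries; the remote binaries are taken from the full
paths listed in the pgfile):

cd /home/egan/bench/hpl/bin
mpirun.ch_gm -pg pgfile ./xhpl.x86_64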
Script:
#PBS -l nodes=4:compute:ppn=2,walltime=10:00:00
#PBS -N xhpl
# prog name
PROG=xhpl.$(uname -m)
PROGARGS=""
NODES=$PBS_NODEFILE
# How many procs do I have? (NP is informational here; the counts
# mpirun uses come from the pgfile built below)
NP=$(wc -l $NODES | awk '{print $1}')
# create pgfile; the rank 0 node gets one less process
# because mpirun gives it one by default
ME=$(hostname -s)
N=$(egrep "^$ME\$" $NODES | wc -l | awk '{print $1}')
N=$(($N - 1))
if [ "$N" = "0" ]
then
    # nothing to add here: rank 0 uses this node's only slot
    >pgfile
else
    echo "$ME $N $PWD/$PROG" >pgfile
fi
# add other nodes to pgfile
for i in $(cat $NODES | egrep -v "^$ME\$" | sort | uniq)
do
    N=$(egrep "^$i\$" $NODES | wc -l | awk '{print $1}')
    # ask each remote node for its architecture to pick the right binary
    ARCH=$(ssh $i uname -m)
    echo "$i $N $PWD/xhpl.$ARCH"
done >>pgfile
# MPICH path
# mpirun is a script, no worries
MPICH=/usr/local/mpich/1.2.6..13/gm/x86_64/smp/pgi64/ssh/bin
# note: $MPICH already ends in /bin
PATH=$MPICH:$PATH
export LD_LIBRARY_PATH=/usr/local/goto/lib
set -x
if [ "$PBS_ENVIRONMENT" = "PBS_INTERACTIVE" ]
then
    mpirun.ch_gm \
        -v \
        -pg pgfile \
        --gm-kill 5 \
        --gm-no-shmem \
        LD_LIBRARY_PATH=/usr/local/goto/lib \
        $PROG $PROGARGS
else
    # cd into the directory where I typed qsub
    cd $PBS_O_WORKDIR
    cat $PBS_NODEFILE >hpl.$PBS_JOBID
    mpirun.ch_gm \
        -pg pgfile \
        --gm-kill 5 \
        --gm-no-shmem \
        LD_LIBRARY_PATH=/usr/local/goto/lib \
        $PROG $PROGARGS >>hpl.$PBS_JOBID
fi
exit 0
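To use the script, save it (xhpl.pbs is just a placeholder name here) and
submit it with qsub. For the interactive branch, start an interactive job
with matching resources and run the script from the shell it gives you; the
exact flags and node properties will vary by site:

qsub xhpl.pbs

or, interactively:

qsub -I -l nodes=4:compute:ppn=2,walltime=10:00:00
cd ~/bench/hpl/bin
sh ./xhpl.pbs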
> -----Original Message-----
> From: beowulf-bounces at beowulf.org
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Sean Dilda
> Sent: Friday, January 21, 2005 7:42 AM
> To: cflau at clc.cuhk.edu.hk
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
>
>
> John Lau wrote:
> > Hi,
> >
> > Has anyone tried running MPI programs with MPICH on a heterogeneous
> > cluster with both i386 and x86_64 machines? Can I use an i386 binary
> > on the i386 machines and an x86_64 binary on the x86_64 machines for
> > the same MPI program? I thought they could communicate, but it seems
> > that I was wrong, because I got errors in my testing.
> >
> > Has anyone tried that before?
>
> I've not tried it, but I can think of a few good reasons why you'd
> want to avoid it. Let's say you want to send some data that's stored
> in a long from the x86_64 box to the x86 box. Well, on the x86_64
> box, a long takes up 8 bytes. But on the x86 box, it only takes 4
> bytes. So, chances are some Bad Stuff(tm) is going to happen if you
> try to span an MPI program across architectures like that.
>
> On the other hand, the x86_64 box will run x86 code without a
> problem. So I suggest running x86 binaries (and MPICH libraries) on
> all of the boxes. While I haven't tested it myself, I can't think of
> any reason why that wouldn't work.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe)
> visit http://www.beowulf.org/mailman/listinfo/beowulf
>