[Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster

Egan Ford egan at sense.net
Mon Jan 24 10:52:34 PST 2005


I have tried it (i.e. i686 + x86_64) and it did not work, although I did
not spend a lot of time trying to figure out why.  I do know that the
method below is sound: it works great with hybrid ia64 and x86_64
clusters.

Below is a .pbs script that automates running xhpl across multiple
architectures.  Each xhpl binary must have a .$(uname -m) suffix
(e.g. xhpl.x86_64, xhpl.ia64).  This was done with Myrinet (MPICH-GM).
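
For example, after building xhpl on each architecture you might install
the binaries something like this (just a sketch; the HPL build directory
name depends on your Make.<arch> file):

# run once per architecture, on a node of that architecture, from the
# HPL bin/<arch> directory; the destination matches the pgfile below
cp xhpl /home/egan/bench/hpl/bin/xhpl.$(uname -m)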

The resulting pgfile will look like this (node14 really has 2 procs, but
since mpirun was started on node14 it already has one process assigned to
rank 0, so the pgfile only needs to describe the remaining processes):

node14 1 /home/egan/bench/hpl/bin/xhpl.x86_64
node10 2 /home/egan/bench/hpl/bin/xhpl.ia64
node13 2 /home/egan/bench/hpl/bin/xhpl.x86_64
node9 2 /home/egan/bench/hpl/bin/xhpl.ia64
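
Each pgfile line is "host  nprocs  path-to-binary", and the total rank
count is one (the implicit rank 0 on the launch node) plus the sum of
column two (8 ranks for the example above).  A quick sanity check against
P x Q in HPL.dat might look like this (just a sketch):

awk 'BEGIN { n = 1 } { n += $2 } END { print n, "total ranks" }' pgfile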

Script:

#!/bin/sh
#PBS -l nodes=4:compute:ppn=2,walltime=10:00:00
#PBS -N xhpl

# prog name
PROG=xhpl.$(uname -m)
PROGARGS=""

NODES=$PBS_NODEFILE

# How many proc do I have?
NP=$(wc -l $NODES | awk '{print $1}')

# create pgfile with rank 0 node with one less
# process because it gets one by default
ME=$(hostname -s)
N=$(egrep "^$ME\$" $NODES | wc -l | awk '{print $1}')
N=$(($N - 1))
if [ "$N" = "0" ]
then
        >pgfile
else
        echo "$ME $N $PWD/$PROG" >pgfile
fi

# add other nodes to pgfile
for i in $(cat $NODES | egrep -v "^$ME\$" | sort | uniq)
do
        N=$(egrep "^$i\$" $NODES | wc -l | awk '{print $1}')
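        # ask each node for its architecture so it gets the matching binary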
        ARCH=$(ssh $i uname -m)
        echo "$i $N $PWD/xhpl.$ARCH"
done >>pgfile

# MPICH-GM path
# mpirun.ch_gm is a script, so no worries about using the x86_64
# build's path on every node
MPICH=/usr/local/mpich/1.2.6..13/gm/x86_64/smp/pgi64/ssh/bin
PATH=$MPICH:$PATH

export LD_LIBRARY_PATH=/usr/local/goto/lib

set -x

# interactive jobs write to the terminal; batch jobs cd into the
# directory where I typed qsub and log output to a file
if [ "$PBS_ENVIRONMENT" = "PBS_INTERACTIVE" ]
then
        mpirun.ch_gm \
                -v \
                -pg pgfile \
                --gm-kill 5 \
                --gm-no-shmem \
                LD_LIBRARY_PATH=/usr/local/goto/lib \
                $PROG $PROGARGS
else
        cd $PBS_O_WORKDIR
        cat $PBS_NODEFILE >hpl.$PBS_JOBID

        mpirun.ch_gm \
                -pg pgfile \
                --gm-kill 5 \
                --gm-no-shmem \
                LD_LIBRARY_PATH=/usr/local/goto/lib \
                $PROG $PROGARGS >>hpl.$PBS_JOBID
fi

exit 0
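
Submit it with qsub like any other job (the script name here is just an
example):

qsub xhpl.pbs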

> -----Original Message-----
> From: beowulf-bounces at beowulf.org 
> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Sean Dilda
> Sent: Friday, January 21, 2005 7:42 AM
> To: cflau at clc.cuhk.edu.hk
> Cc: beowulf at beowulf.org
> Subject: Re: [Beowulf] MPICH on heterogeneous (i386 + x86_64) cluster
> 
> 
> John Lau wrote:
> > Hi,
> >
> > Has anyone tried running MPI programs with MPICH on a heterogeneous
> > cluster with both i386 and x86_64 machines? Can I use an i386 binary
> > on the i386 machines while using an x86_64 binary on the x86_64
> > machines for the same MPI program? I thought they could communicate,
> > but it seems I was wrong because I got errors in my testing.
> >
> > Has anyone tried that before?
>
> I've not tried it, but I can think of a few good reasons why you'd want
> to avoid it.  Let's say you want to send some data that's stored in a
> long from the x86_64 box to the x86 box.  Well, on the x86_64 box, a
> long takes up 8 bytes.  But on the x86 box, it only takes 4 bytes.  So,
> chances are some Bad Stuff(tm) is going to happen if you try to span an
> MPI program across architectures like that.
>
> On the other hand, the x86_64 box will run x86 code without a problem.
> So I suggest running x86 binaries (and mpich libraries) on all of the
> boxes.  While I haven't tested it myself, I can't think of any reason
> why that wouldn't work.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) 
> visit http://www.beowulf.org/mailman/listinfo/beowulf
> 



