[Beowulf] LAM_MPI problem on PBS
Reuti
reuti at staff.uni-marburg.de
Tue Aug 23 09:39:46 PDT 2005
Hi,
you don't need a hostfile of your own at all, as PBS will select the
nodes for your job. So the question is still: did you compile LAM/MPI
with support for the TM interface of PBS? Please also have a look here:
http://www.lam-mpi.org/faq/category12.php3#question3
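One quick way to check this might be to ask laminfo which boot modules
the installation reports. This is only a sketch: list_boot_modules is a
made-up helper name, and the "SSI boot" pattern is my assumption about
how laminfo labels those lines, so adjust it if your output differs.

```shell
# List the boot modules reported by this LAM build. A "tm" entry would
# suggest that PBS/TM support was compiled in.
# Assumption: laminfo labels these lines with "SSI boot".
list_boot_modules() {
    if command -v laminfo >/dev/null 2>&1; then
        laminfo | grep -i 'ssi boot' || echo "no boot modules listed"
    else
        echo "laminfo not found in PATH"
    fi
}

list_boot_modules
```

If no tm module shows up there, LAM was most likely built without the
PBS libraries being found.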
Onur Destanoğlu wrote:
> Hi,
>
> this is my PBS script;
> #PBS -N firstscp
> #PBS -l nodes=1:ppn=2
> #PBS -l mem=4mb
> #PBS -l walltime=1:00:00
> #PBS -V
> #PBS -m bea
> PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
> export PATH
> lamboot -v
> mpirun -v C first
> lamhalt -v
>
> my system's /home directory is NFS-shared between all nodes, so there
> is only one hosts file in user niyazi's home directory. This is the
> hosts file:
>
> node00
> node01
> node02
> node03
> node04
> node05
>
> node00 is not an execution node; it only runs pbs_server and pbs_sched.
>
> when i run the script i encounter some problems like these;
>
> one error file;
>
> n-1<2289> ssi:boot:base:linear: booting n0 (localhost)
> n-1<2289> ssi:boot:base:linear: finished
>
> one output file:
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> 2294 first running on n0 (o)
> Hello, I am 0 of the nodes : 1
What are you printing with 0 and 1 here? Rank and total number of
ranks?
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> Shutting down LAM
> hreq: received HALT_ACK from n0 (bee01.bee-hive)
> LAM halted
>
> so what is going wrong?
It doesn't look so wrong, but it is executed outside of PBS control. The
boot schema seems to have started just one daemon (n0 on localhost), and
with the option C mpirun then only uses this one.
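To see which nodes PBS actually handed to the job, a small check like
the following could go into the job script before lamboot. It is only a
sketch: show_pbs_nodes is a made-up helper name, and it relies on the
PBS_NODEFILE variable that PBS sets inside a running job.

```shell
# Print each host PBS assigned to this job together with the number of
# slots (lines) it has in the node file.
show_pbs_nodes() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        sort "$PBS_NODEFILE" | uniq -c
    else
        echo "PBS_NODEFILE is not set; not running inside a PBS job?"
    fi
}

show_pbs_nodes
```

With nodes=1:ppn=2 I would expect a single host with a count of 2. If
LAM turns out to have no TM support, handing the same file to lamboot
(lamboot -v $PBS_NODEFILE) is a common workaround.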
-- Reuti