[Beowulf]
William Burke
wburke999 at msn.com
Sat Mar 26 18:31:22 PST 2005
Reuti,
>> I'd suggest moving over to the SGE users list at:
>> http://gridengine.sunsource.net/servlets/ProjectMailingListList
I have, but I do not see my name yet. How long does the verification process take?
>> Although there is a special Myrinet directory, you can also try to use
>> the files in the mpi directory instead.
The mpi directory's mpich.template doesn't use mpirun.ch_gm, so how does it
know which version of mpirun to use? If I use the mpi directory, what changes
do I have to make?
>> Can you please give more details of your queue and PE setup (qconf -sq/sp
>> output)
SEE BELOW
>> Do you have an admin account for SGE? I'd prefer not to do anything in
>> SGE as root.
Yes, it's grid... SEE BELOW
>> Not really an issue, but you have to make a small change to
>> mpirun.ch_gm.pl so that all processes stay in the same process group and
>> get correctly killed in case of a job abort:
I have to double check that in:
http://gridengine.sunsource.net/howto/mpich-integration.html
Here is the new problem I am having with the PE:
My jobs won't run. When I run my script, it goes into pending mode for about
10 seconds (status qw), SGE submits it to N hosts (status t), the jobs hang
in status t, then quickly exit. When I investigated the
Jobscript_name.{pe|po}JobID output, it stated that SGE can't make links in
the /WEMS/grid/tmp/549.1.Production.q/ directory.
It looks like the startmpi.sh script links files into $TMPDIR, and from my
understanding the value of $TMPDIR is derived from the tmpdir parameter in
the queue's configuration. I have set this attribute to '/WEMS/grid/tmp/',
but according to the error log qsub_wrf.sh.pe549 it is
'/WEMS/grid/tmp/549.1.Production.q/'. Possibly the source of the problem is
here, so what created the '549.1.Production.q' suffix? (Judging from the
other runs in CHECK 6, it looks like <job ID>.<task ID>.<queue name>.)
I then checked the permissions of /WEMS/grid/tmp:
[wems at wems grid]$ ls -ltr /WEMS/grid | grep tmp
drwxrwxrwx 2 root root 4096 Mar 26 17:34 tmp
As a sanity check, within startmpi.sh I echo out an ls -ltr of $TMPDIR:
drwxr-xr-x 2 65534 65534 4096 Mar 26 2005 549.1.Production.q
As expected, there is no UID/GID 65534 in my /etc/passwd. Furthermore, only
UID/GID 65534 has write permission, so if it (N1GE) is the only one reading
and writing this directory, what else could be preventing writes into it? I
thought maybe there was a lock file in /WEMS/grid/tmp, so I checked:
[wems at wems tmp]$ ls -al /WEMS/grid/tmp
total 8
drwxrwxrwx 2 root root 4096 Mar 26 17:34 .
drwxr-xr-x 22 grid grid 4096 Mar 26 04:20 ..
To no avail, so I am out of ideas. Is this a known issue when using Myrinet,
MPICH, and tight integration, or am I overlooking something? I am using the
sge_mpirun script instead of the mpirun script. Have you seen a problem
like this before?
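For what it is worth, here is a minimal serial test script I plan to try next
(just a sketch; the file name tmpdir_test.sh is arbitrary) to see whether
writing into the job-specific $TMPDIR fails for every job or only when the PE
start script runs:

#!/bin/sh
#$ -S /bin/sh
#$ -q Production.q
# show the scratch directory SGE created for this job and try to write into it
echo "TMPDIR is: $TMPDIR"
ls -ld "$TMPDIR"
touch "$TMPDIR/write_test" && echo "write OK" || echo "write FAILED"

If even this plain qsub job (no -pe) gets 'Permission denied', the problem is
in the tmpdir handling itself and not in startmpi.sh.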
I also suspect that the editor may be reading the mpich PE configuration's
start_proc_args argument incorrectly, since it wraps the string of arguments
onto the next line (CHECK 5); according to the
/WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs file (CHECK 7), the job
reports "The mpirun command "\" does not exist".
SEE CHECK 5, CHECK 8, then CHECK 7 BELOW.
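If the backslash continuations really are the problem, I assume the workaround
is to keep the whole start_proc_args value on a single line when editing the
PE (sketch only, using the same paths as CHECK 5):

start_proc_args    /WEMS/grid/mpi/myrinet/startmpi.sh -catch_rsh /WEMS/grid/wems-hosts2 /WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm

and then confirm with 'qconf -sp mpich' that the value comes back unwrapped.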
Oh yeah, this may be a silly question, but where does SGE get $pe_hostfile
and $TMPDIR from, and what is the process by which it acquires these
variables? I would like some clarification.
Thanks,
William
Things that I checked
CHECK 0.5
[root at wems wrfprd]# cat qsub_wrf.sh
#!/bin/sh
#$ -S /bin/ksh
#$ -pe mpich 32
#$ -l h_rt=10800
#$ -q Production.q
#
#. /WEMS/wems/external/WRF/wrfsi/etc/setup-mpi.sh
cd /WEMS/wems/data/WRF/wni001a/wrfprd
echo 'This is the job ID '$JOB_ID > /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs
echo 'This is the pe_hostfile '$PE_HOSTFILE >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
echo 'This is the tmpdir '$TMPDIR >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
/WEMS/grid/mpi/myrinet/sge_mpirun /WEMS/wems/external/WRF/wrfsi/../run/wrf.exe >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs 2>&1
CHECK 1
[wems at wems wems]$ qsub -pe mpich 32 -P test -q Production.q /WEMS/wems/data/WRF/wni001a/wrfprd/qsub_wrf.sh
CHECK 2
[wems at wems grid]$ cat qsub_wrf.sh.pe549
ln: creating symbolic link `/WEMS/grid/tmp/549.1.Production.q/mpirun.sge' to `/WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm': Permission denied
/WEMS/grid/mpi/myrinet/startmpi.sh[142]: cannot create /WEMS/grid/tmp/549.1.Production.q/machines: Permission denied
cat: /WEMS/grid/tmp/549.1.Production.q/machines: No such file or directory
ln: creating symbolic link `/WEMS/grid/tmp/549.1.Production.q/rsh' to `/WEMS/grid/mpi/rsh': Permission denied
CHECK 3
[wems at wems grid]$ cat qsub_wrf.sh.po549
-catch_rsh /WEMS/grid/wems-hosts2
/WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm
this is the value of mpirun /WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm
I am doing a ls -ltr on $TMPDIR
total 4
drwxr-xr-x 2 65534 65534 4096 Mar 26 2005 549.1.Production.q
Machine file is /WEMS/grid/tmp/549.1.Production.q/machines
CHECK 4
[wems at wems grid]$ cat Queue-config
qname Production.q
hostlist @Parallel
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors 2
qtype BATCH
ckpt_list NONE
pe_list mpich
rerun FALSE
slots 2
tmpdir /WEMS/grid/tmp
shell /bin/ksh
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists Test_A
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects test
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
CHECK 5
[wems at wems grid]$ cat mpich-PE-config
pe_name mpich
slots 78
user_lists Test_A
xuser_lists NONE
start_proc_args /WEMS/grid/mpi/myrinet/startmpi.sh -catch_rsh \
/WEMS/grid/wems-hosts2 \
/WEMS/pkgs/mpich-gm-1.2.6.14a/bin/mpirun.ch_gm
stop_proc_args /WEMS/grid/mpi/myrinet/stopmpi.sh
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
CHECK 6
[wems at wems wems]# cat /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
This is the pe_hostfile
/WEMS/grid/default/spool/wems18/active_jobs/388.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp/388.1.Production.q
This is the pe_hostfile
/WEMS/grid/default/spool/wems07/active_jobs/389.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp//389.1.Production.q
This is the pe_hostfile
/WEMS/grid/default/spool/wems24/active_jobs/390.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp/398.1.Production.q
This is the pe_hostfile
/WEMS/grid/default/spool/wems22/active_jobs/549.1/pe_hostfile
This is the tmpdir /WEMS/grid/tmp/549.1.Production.q
This is the pe_hostfile
This is the tmpdir
CHECK 7
[wems at wems wems]$ cat /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs
This is the job ID 549
The mpirun command "\" does not exist
There must be a problem with the mpich parallel environment
CHECK 8
[root at wems wrfprd]# cat qsub_wrf.sh
#!/bin/sh
#$ -S /bin/ksh
#$ -pe mpich 32
#$ -l h_rt=10800
#$ -q Production.q
#
#. /WEMS/wems/external/WRF/wrfsi/etc/setup-mpi.sh
cd /WEMS/wems/data/WRF/wni001a/wrfprd
echo 'This is the job ID '$JOB_ID > /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs
echo 'This is the pe_hostfile '$PE_HOSTFILE >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
echo 'This is the tmpdir '$TMPDIR >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.ps
/WEMS/grid/mpi/myrinet/sge_mpirun /WEMS/wems/external/WRF/wrfsi/../run/wrf.exe >> /WEMS/wems/data/WRF/wni001a/log/050811200.wrf.pbs 2>&1
exit
-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Wednesday, March 23, 2005 6:26 PM
To: William Burke
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf]
Hi,
I'd suggest moving over to the SGE users list at:
http://gridengine.sunsource.net/servlets/ProjectMailingListList
But anyway, let's sort the things out:
Quoting William Burke <wburke999 at msn.com>:
> I can't get the PE to work on a 50-node class II Beowulf. It has a front-end
> Sunfire v40 (qmaster host) and 49 Sunfire v20s (execution hosts) running
> Linux, configured to communicate data over Myrinet using MPICH-GM version
> 1.2.6.14a.
Although there is a special Myrinet directory, you can also try to use the
files in the mpi directory instead.
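From memory (so treat this only as a rough sketch, with the path prefix
adjusted to wherever your mpi directory lives), the generic template wires up
the PE start/stop scripts like this, letting startmpi.sh build the machines
file from the hostfile SGE passes in:

start_proc_args    /WEMS/grid/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args     /WEMS/grid/mpi/stopmpi.sh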
> These are the requirements the N1GE environment has to handle:
>
> 1. Serial-type jobs for pre-processing the data - average runtime 15
> minutes.
> 2. Output is pipelined into parallel processing jobs - runtime range of
> 1-6 hours.
> 3. Post-processing serial jobs that run concurrently.
>
> I have set up a Parallel Environment called mpich-gm and a straightforward
> FIFO scheduling scheme for testing. When I submit parallel jobs, they hang
> in limbo in a 'qw' state pending submission. I am not sure why the scheduler
> does not see the jobs that I submit.
>
>
>
> I used the Myrinet mpich template located in the $SGE_ROOT/<sge_cell>/mpi/myrinet
> directory to configure the PE (parallel environment), plus I copied the
> sge_mpirun script to the $SGE_ROOT/<sge_cell>/bin directory. I configured
> a Production.q queue that runs only parallel jobs. As a last sanity check I
> ran a trace on the scheduler, submitted a simple parallel job, and these are
> the results that I got from the logs:
Can you please give more details of your queue and PE setup (qconf -sq/sp
output).
> JOB RUN Window
>
> [wems at wems examples]$ qsub -now y -pe mpich-gm 1-4 -b y hello++
>
> Your job 277 ("hello++") has been submitted.
>
> Waiting for immediate job to be scheduled.
>
>
>
> Your qsub request could not be scheduled, try again later.
>
> [wems at wems examples]$ qsub -pe mpich-gm 1-4 -b y hello++
>
> Your job 278 ("hello++") has been submitted.
>
> [wems at wems examples]$ qsub -pe mpich-gm 1-4 -b y hello++
>
> Your job 279 ("hello++") has been submitted.
You can't start a parallel job this way, as no mpirun is used. When you used
the script you mentioned, did you get the same behavior (and did it use
mpirun -np $NSLOTS ...)?
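Just as an illustration (untested, and the script and executable names are
only placeholders; the machines file path is my assumption of the usual tight
integration setup), a job script for such a PE would look roughly like:

#!/bin/sh
#$ -S /bin/sh
#$ -pe mpich-gm 1-4
# NSLOTS and the machines file under $TMPDIR are prepared by SGE and startmpi.sh
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./hello++

so that the MPI startup really uses the slots and hosts the scheduler granted.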
> This is the 2nd window SCHEDULER LOG
>
> [root at wems bin]# qconf -tsm
>
> [root at wems bin]# qconf -tsm
>
> [root at wems bin]# cat /WEMS/grid/default/common/schedd_runlog
>
> Wed Mar 23 06:08:55 2005|-------------START-SCHEDULER-RUN-------------
>
> Wed Mar 23 06:08:55 2005|queue instance "all.q at wems10.grid.wni.com"
dropped
> because it is temporarily not available
>
> Wed Mar 23 06:08:55 2005|queue instance "Production.q at wems10.grid.wni.com"
> dropped because it is temporarily not available
>
> Wed Mar 23 06:08:55 2005|queues dropped because they are temporarily not
> available: all.q at wems10.grid.wni.com Production.q at wems10.grid.wni.com
>
> Wed Mar 23 06:08:55 2005|no pending jobs to perform scheduling on
>
> Wed Mar 23 06:08:55 2005|--------------STOP-SCHEDULER-RUN-------------
>
> Wed Mar 23 06:11:37 2005|-------------START-SCHEDULER-RUN-------------
>
> Wed Mar 23 06:11:37 2005|queue instance "all.q at wems10.grid.wni.com"
dropped
> because it is temporarily not available
>
> Wed Mar 23 06:11:37 2005|queue instance "Production.q at wems10.grid.wni.com"
> dropped because it is temporarily not available
>
> Wed Mar 23 06:11:37 2005|queues dropped because they are temporarily not
> available: all.q at wems10.grid.wni.com Production.q at wems10.grid.wni.com
>
> Wed Mar 23 06:11:37 2005|no pending jobs to perform scheduling on
>
> Wed Mar 23 06:11:37 2005|--------------STOP-SCHEDULER-RUN-------------
>
> [root at wems bin]# qstat
>
> job-ID prior name user state submit/start at queue
> slots ja-task-ID
>
>
----------------------------------------------------------------------------
> -------------------------------------
>
> 279 0.55500 hello++ wems qw 03/23/2005 06:11:43
> 1
>
> [root at wems bin]#
Do you have an admin account for SGE? I'd prefer not to do anything in SGE
as root.
> BTW that node wems10.grid.wni.com has connectivity issues and I have not
> removed it from the cluster queue.
>
>
>
> What causes this type of problem in N1GE, returning "no pending jobs to
> perform scheduling on" in the schedd_runlog even though there are available
> slots ready to take jobs?
>
> I had no problem submitting serial jobs; only the parallel jobs behaved this
> way. Are there N1GE/Myrinet issues that I am not aware of? FYI, the same
> binary (hello++) runs with no problems from the command line.
If you just start hello++, it will not run in parallel I think.
Not really an issue, but you have to make a small change to mpirun.ch_gm.pl
so that all processes stay in the same process group and get correctly killed
in case of a job abort:
http://gridengine.sunsource.net/howto/mpich-integration.html
> Since I generally run scripts from qsub instead of binaries, I created a
> script to run the mpich executable, but that yielded the same result.
>
>
>
> I have an additional question regarding setting a queue.conf parameter
> called "subordinate_list". How is it read from the result of qconf -mq
> <queue_name>?
>
> Example
>
> i.e., subordinate_list low_pri.q=5,small.q.
The queue "low_pri.q" will be suspended, when 5 or more slots of
"<queue_name>"
are filled. The "small.q" will be suspened, if all slots of "<queue_name>"
are
filled.
Cheers - Reuti