[Beowulf] the solution for qdel fail.....

Jerry Xu jerry at oban.biosc.lsu.edu
Thu Jan 6 12:33:39 PST 2005


Hey, Huang:

  I found a solution that works for me. Maybe you can try it and see
whether it works for you.

In your PBS script, try adding the "--gm-kill 5" option between the
processor count and your program, like this:

mpirun -machinefile $PBS_NODEFILE -np $NPROCS --gm-kill 5 myprogram

it works for me.
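
For context, here is a minimal sketch of a whole job script with the
option in place. As far as I understand it, "--gm-kill 5" tells the
MPICH-GM mpirun to kill the remaining processes 5 seconds after the
first one exits. The resource request, counting NPROCS from the node
file, the "./myprogram" path, and the dry-run fallback are my own
assumptions for illustration, so adjust them for your cluster.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=2   # assumed resource request, adjust as needed
#PBS -q dque            # the queue named in the original question

# Inside a PBS job these variables are set by the server. Outside one
# they are unset, so fall back to harmless defaults (assumption: this
# fallback exists only so the script can be checked without a cluster).
cd "${PBS_O_WORKDIR:-.}" || exit 1
NODEFILE="${PBS_NODEFILE:-/dev/null}"

# One line per allocated processor in the node file.
NPROCS=$(wc -l < "$NODEFILE" | tr -d ' ')

# --gm-kill goes between the processor count and the program name.
CMD="mpirun -machinefile $NODEFILE -np $NPROCS --gm-kill 5 ./myprogram"

if [ -n "$PBS_NODEFILE" ]; then
    # Relies on word splitting, so paths must not contain spaces.
    $CMD
else
    echo "dry run: $CMD"
fi
```

Running the script by hand outside of PBS only prints the command line,
which is a quick way to check that the option ended up in the right
place before submitting with qsub.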

Jerry.

/**********************************************************
Hi,

We have a new system set up. The vendor set up the PBS for us. For
administration reasons, we created a new queue "dque" (set to default)
using the "qmgr" command:

create queue dque queue_type=e
s q dque enabled=true, started=true

I was able to submit jobs using the "qsub" command to queue "dque".
However, when I use "qdel" to kill a job, the job disappears from the
job list shown by "qstat -a", but the executable is still running on
the compute nodes. Every time, I have to log in to the corresponding
compute node and kill the running job.

I am wondering if I missed something in setting up the queue so that I
am unable to kill the job completely using "qdel".

Thanks.



