[Beowulf] the solution for qdel fail.....

Glen Beane beaneg at umcs.maine.edu
Thu Jan 6 16:31:10 PST 2005


A definite solution is to use mpiexec from www.osc.edu/~pw/mpiexec 
instead of mpirun.  This mpiexec is a TM-based replacement for mpirun 
(TM is the PBS task-manager protocol).  When TM is used to spawn all the 
processes instead of ssh/rsh, PBS is aware of all the processes that 
belong to the job, and therefore it will properly kill them all in the 
event of a qdel or hitting the walltime limit.  When you use mpirun, PBS 
is only aware of the initial mpirun process, since PBS itself does not 
spawn any of the other processes.
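
As a rough sketch of what the change looks like in a job script (the
job name, resource request, and ./myprogram are placeholders I made up,
not taken from the original poster's setup), the mpirun line is replaced
by a plain mpiexec call.  No machine file or process count is passed,
on the assumption that this mpiexec picks both up from the PBS
allocation by default:

```shell
#!/bin/sh
#PBS -N myjob
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00

# Hypothetical program name; replace with your own MPI binary.
PROGRAM=./myprogram

# TM-based launch: no -machinefile or -np is given here, assuming
# mpiexec takes the node list and process count from the PBS allocation.
LAUNCH="mpiexec $PROGRAM"

if [ -n "${PBS_JOBID:-}" ]; then
    # Inside a PBS job: start from the submission directory and launch.
    cd "$PBS_O_WORKDIR" || exit 1
    $LAUNCH
else
    # Outside PBS there is no task manager to talk to, so just report.
    echo "would run under PBS: $LAUNCH"
fi
```

Submit it with qsub as usual.  A qdel on the job should then take down
the ranks on every node, not just the launcher on the first one.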


Glen Beane
Advanced Computing Research Lab
University of Maine

Jerry Xu wrote:
> Hey, Huang:
> 
>   I found one solution that works for me, maybe you can try it and see
> whether it works for you.
> 
> In your PBS script, try adding the "--gm-kill 5" option between the
> processor number and your program
> 
> like this 
> 
> mpirun -machinefile $PBS_NODEFILE -np $NPROCS --gm-kill 5 myprogram
> 
> it works for me.
> 
> Jerry.
> 
> /**********************************************************
> Hi,
> 
> We have a new system set up. The vendor set up the PBS for us. For
> administration reasons, we created a new queue "dque" (set to default)
> using the "qmgr" command:
> 
> create queue dque queue_type=e
> s q dque enabled=true, started=true
> 
> I was able to submit jobs using the "qsub" command to queue "dque".
> However, when I use "qdel" to kill a job, the job disappears from the
> job list shown by "qstat -a", but the executable is still running on
> the compute nodes. Every time, I have to log in to the corresponding
> compute node and kill the running job.
> 
> I am wondering if I missed something in setting up the queue so that I
> am unable to kill the job completely using "qdel".
> 
> Thanks.
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 


