[Beowulf] the solution for qdel fail.....


Jerry Xu jerry at oban.biosc.lsu.edu
Mon Jan 10 07:49:12 PST 2005


Hi William, thanks for the information. In case somebody still needs
it for an OpenPBS configuration, here is my epilogue file. It should be
placed in $pbshome/mom_priv/ on each node, and it must be executable
and owned by root. Others may have better epilogue scripts...


/*****************************************************/
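#!/bin/sh
# PBS epilogue: pbs_mom passes the job ID as $1 and the job owner's
# user name as $2.
echo '------------clean up------------'
echo running pbs epilogue script

# set key variables
USER=$2
NODEFILE=/var/spool/pbs/aux/$1

echo
echo "killing processes of user $USER on the batch nodes"
# The node file lists a host once per allocated CPU, so visit each host once.
for node in $(sort -u "$NODEFILE")
do
       echo "Doing node $node"
       # Run ssh as the job owner so the user's own ssh keys are used;
       # skill then sends SIGKILL to every process that user owns there.
       su "$USER" -c "ssh $node skill -KILL -u $USER"
done
echo Done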

/****************************************************/
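Both messages in this thread say the epilogue must be owned by root, and William adds that it should be readable, writable and executable only by root. A minimal sketch of a pre-deployment check for that rule, assuming GNU coreutils `stat` (its `-c` format option); `check_pbs_script` is a hypothetical helper name, not a PBS command:

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): verify that a prologue/epilogue
# file is owned by the expected UID (root, 0, by default) and that only
# the owner has permissions on it, including at least read and execute.
# Assumes GNU coreutils stat (-c format option).
check_pbs_script() {
    file=$1
    want_uid=${2:-0}

    [ -f "$file" ] || { echo "$file: not a regular file" >&2; return 1; }

    # %u = numeric owner UID, %a = permission bits in octal
    owner=$(stat -c '%u' "$file") || return 1
    mode=$(stat -c '%a' "$file") || return 1

    if [ "$owner" -ne "$want_uid" ]; then
        echo "$file: owned by UID $owner, expected $want_uid" >&2
        return 1
    fi
    # Group and other must have no permissions at all.
    if [ $((0$mode & 077)) -ne 0 ]; then
        echo "$file: mode $mode grants group/other access" >&2
        return 1
    fi
    # The owner needs read (4) and execute (1).
    if [ $((0$mode & 0500)) -ne $((0500)) ]; then
        echo "$file: mode $mode lacks owner read/execute" >&2
        return 1
    fi
    return 0
}
```

For example, after `chown root /var/spool/pbs/mom_priv/epilogue` and `chmod 700 /var/spool/pbs/mom_priv/epilogue`, running `check_pbs_script /var/spool/pbs/mom_priv/epilogue` on each node should return success.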




On Thu, 2005-01-06 at 17:56, William Scullin wrote:
> Howdy,
> 
> 	The --gm-kill option is specific to clusters using Myrinet; it mostly
> exists to ensure that slave processes using Myrinet's MPI exit when the
> master process finishes. The number after --gm-kill is a timeout in
> seconds: once the first process exits, mpirun kills the rest after that delay.
> 
> 	I am not sure which version, type, or member of the PBS family you are
> using. If you are using PBS Pro (this is probably also true for Torque
> and OpenPBS), you should be able to place two scripts in
> /var/spool/PBS/mom_priv/ called prologue and epilogue on every compute
> node. They must be owned by root and be executable / readable / writable
> only by root. The prologue script will run before every job and the
> epilogue script will run after every job. In the epilogue and prologue
> scripts we use, we clean the nodes of all lingering user processes and
> do some basic checking of node health.
> 
> 	Even if an epilogue script misses a process (or a user launches a
> process outside of the queuing system), the prologue will still catch
> it before the next job starts to run.
> 
> 	Best,
> 	William
>  
> On Thu, 2005-01-06 at 14:33, Jerry Xu wrote:
> > Hey, Huang:
> > 
> >   I found a solution that works for me; maybe you can try it and see
> > whether it works for you.
> > 
> > In your PBS script, try adding the "--gm-kill 5" option between the
> > process count and your program name, like this:
> > 
> > mpirun -machinefile $PBS_NODEFILE -np $NPROCS --gm-kill 5 myprogram
> > 
> > It works for me.
> > 
> > Jerry.
> > 
> > /**********************************************************
> > Hi,
> > 
> > We have a newly installed system, and the vendor set up PBS for us.
> > For administrative reasons, we created a new queue "dque" (set as the
> > default) using the "qmgr" command:
> > 
> > create queue dque queue_type=e
> > s q dque enabled=true, started=true
> > 
> > I was able to submit jobs to queue "dque" using the "qsub" command.
> > However, when I use "qdel" to kill a job, the job disappears from the
> > job list shown by "qstat -a", but the executable keeps running on
> > the compute nodes. Every time, I have to log in to the corresponding
> > compute node and kill the running job by hand.
> > 
> > I am wondering whether I missed something when setting up the queue
> > that prevents "qdel" from killing the job completely.
> > 
> > Thanks.
> > 
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> William Scullin
> System Administrator
> Center for Computation and Technology
> 342 Johnston Hall
> Louisiana State University
> Baton Rouge, Louisiana 70803
> voice:	225 578 6888
> fax:	225 578 5362
> aim:	WilliamAtLSU
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
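William's point is that a prologue catches whatever the previous job's epilogue missed. A minimal sketch of the process-selection step, assuming that "lingering user processes" means processes owned by regular accounts with a UID at or above a site-chosen threshold (1000 here is an assumption; many older Red Hat systems started at 500) and skipping the `nobody` UID 65534. The function name `lingering_pids` is hypothetical. It only reads `ps -eo uid=,pid=` output on standard input and prints PIDs, so it can be checked without killing anything:

```shell
#!/bin/sh
# Hypothetical prologue helper (not part of PBS): read "UID PID" pairs,
# as printed by `ps -eo uid=,pid=`, and print the PIDs owned by regular
# user accounts, i.e. UID >= $1 (default 1000, an assumed site policy).
# UID 65534 (nobody) is skipped because system daemons often run as it.
lingering_pids() {
    min_uid=${1:-1000}
    awk -v min="$min_uid" '
        BEGIN { min += 0 }                      # force a numeric threshold
        $1 >= min && $1 != 65534 { print $2 }   # UID in $1, PID in $2
    '
}

# Example use inside a prologue (GNU xargs -r skips kill when no PIDs):
#   ps -eo uid=,pid= | lingering_pids 1000 | xargs -r kill -KILL
```

Because the prologue runs as root before the new job starts, its own processes are not selected. On nodes shared by several jobs at once this would also kill other jobs' processes, so it only suits a one-job-per-node setup.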



