[Beowulf] the solution for qdel fail.....

Angel de Vicente angelv at iac.es
Tue Jan 18 00:14:10 PST 2005


Hi Chris,

Chris Samuel writes:
 > On Tue, 11 Jan 2005 02:49 am, Jerry Xu wrote:
 > 
 > > Hi, William, Thanks for your information. Just in case somebody still
 > > needs it for openPBS configuration, here is my epilogue file. It should be
 > > located in $pbshome/mom_priv/ on each node, and it needs to be set as
 > > executable and owned by root. Others may have better epilogue
 > > scripts...
 > 
 > Hmm, the only thing that worries me about that is that for those of us with
 > SMP clusters it is possible for a user to have two different jobs running on
 > the same node, one on each CPU, so an epilogue script that kills all of a
 > user's processes on a node would accidentally kill an innocent job.


We have an SMP cluster, and to avoid the death of innocent processes we use the
script in the section "Cleanup of MPICH/PBS jobs" at
http://bellatrix.pcl.ox.ac.uk/%7Eben/pbs/

It doesn't always work, and some jobs are occasionally left lingering, but at
least it doesn't kill innocents (some day I hope to have the time to look into
it and find out why).
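
To illustrate the general idea of killing only what belongs to the finished
job, here is a minimal sketch in Python. It is not the script from the page
above. It assumes a Linux node with /proc, and that the epilogue receives the
job's session ID as its fifth argument, as OpenPBS epilogues normally do. The
function names are made up for this example.

```python
import os
import signal
import sys


def pids_in_session(processes, session_id, own_pid):
    """Return PIDs whose session ID matches the job's session.

    `processes` is an iterable of (pid, sid) pairs. The epilogue's own
    PID is skipped so the script never signals itself.
    """
    return [pid for pid, sid in processes
            if sid == session_id and pid != own_pid]


def list_processes():
    """Yield (pid, sid) for every process visible in /proc (Linux only)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            yield pid, os.getsid(pid)
        except (ProcessLookupError, PermissionError):
            # The process exited, or we may not inspect it; skip it.
            continue


def main(argv):
    # Assumption: argv[5] is the session ID of the job that just ended.
    session_id = int(argv[5])
    for pid in pids_in_session(list_processes(), session_id, os.getpid()):
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # already gone


if __name__ == "__main__" and len(sys.argv) > 5:
    main(sys.argv)
```

Because it matches on the session rather than on the user name, a second job
of the same user on the other CPU is left alone. On the other hand, anything
that started its own session (for example, processes launched through a remote
shell) would be missed by a sketch like this.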

Hope it helps. Cheers,
Angel de Vicente
-- 
----------------------------------
http://www.iac.es/galeria/angelv/

PostDoc Software Support
Instituto de Astrofisica de Canarias



