[Beowulf] disabling bad nodes

John Bushnell bushnell at chem.ucsb.edu
Mon Mar 27 14:22:42 PST 2006


   You can talk to your admin about becoming a Torque manager, which
is configurable (check the docs).  Then you can run something like
'pbsnodes -o nodexxx'.  This takes node "nodexxx" offline so that
no jobs are placed on it until its status is cleared with
'pbsnodes -c nodexxx' (presumably after fixing the node in question).
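
   As a rough sketch of that offline/clear cycle, the two commands can be
wrapped in small shell helpers.  The function names and the PBSNODES
override variable are my own inventions for illustration (they are not
part of Torque), and this assumes your account already has the
privileges needed to run 'pbsnodes -o'.

```shell
#!/bin/sh
# Hypothetical helpers around the two pbsnodes calls mentioned above.
# Assumption: PBSNODES can be overridden (e.g. PBSNODES=echo) for a dry run.
PBSNODES="${PBSNODES:-pbsnodes}"

# Mark a node offline so that no new jobs are placed on it.
offline_node() {
    "$PBSNODES" -o "$1"
}

# Clear the offline status once the node has been fixed.
clear_node() {
    "$PBSNODES" -c "$1"
}
```

Typical use would be 'offline_node node015' when the node goes bad and
'clear_node node015' after it has been rebooted.  Running 'pbsnodes -l'
in between should list the nodes currently marked down or offline.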

   Also, you can request nodes by name in the queue submission script
like so:

#PBS -l nodes=node010:ppn=2+node002:ppn=1

This would request two processors on node "node010" and one on "node002".
Cumbersome, but useful in a bind.  I don't know offhand of a way to
request any node _except_ a particular one.
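
Since typing those node lists by hand gets tedious, here is a minimal
sketch of a shell function that builds the resource string from
name:ppn pairs.  The function name 'nodes_spec' is made up for this
example, and it does nothing clever: you still have to know which nodes
are good and list them yourself.

```shell
#!/bin/sh
# Hypothetical helper: build a "nodes=..." resource string from
# name:ppn pairs, e.g.  nodes_spec node010:2 node002:1
nodes_spec() {
    spec=""
    for pair in "$@"; do
        name="${pair%%:*}"   # text before the first colon (node name)
        ppn="${pair#*:}"     # text after the first colon (processor count)
        # Join entries with '+', the separator used in the directive above.
        spec="${spec:+$spec+}${name}:ppn=${ppn}"
    done
    printf 'nodes=%s\n' "$spec"
}
```

The result can then be passed on the command line instead of being
written into the script, e.g.
'qsub -l "$(nodes_spec node010:2 node002:1)" job.sh', where job.sh
stands in for your own submission script.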

      - John

On Sun, 26 Mar 2006, James Rustad wrote:

> Guys
> This is a strange question, but
> Is there any way to disable a bad node in PBS without being the system
> administrator?
> I am lining up about 50 jobs in the queue and they fail sequentially when
> they hit the bad node.  This often seems to happen on the weekends when
> nobody is around to reboot the node.
>
> Can I specify within PBS "don't use node015" or something like that?
> Thanks
> Jim Rustad
> ps
> I may be using TORQUE rather than PBS, by the way


