[Beowulf] disabling bad nodes

John Bushnell bushnell at chem.ucsb.edu
Mon Mar 27 14:22:42 PST 2006


   You can ask your admin to make you a Torque manager; manager
privileges are configurable (check the docs).  Then you can do something
like 'pbsnodes -o nodexxx'.  This marks node "nodexxx" offline so that
no new jobs are scheduled on it until its status is cleared with
'pbsnodes -c nodexxx' (presumably after the node has been fixed).
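For reference, the offline/clear cycle above can be wrapped in a few shell functions. This is a minimal sketch that assumes Torque's `pbsnodes` and that you already hold manager rights. The `PBSNODES` variable, the function names, and the `user@host` in the `qmgr` comment are hypothetical, added only so the commands can be dry-run with `PBSNODES=echo`. In the Torque versions I know of, `pbsnodes -l` with no state argument lists nodes that are down, offline, or unknown.

```shell
#!/bin/sh
# Sketch of the offline/clear cycle with Torque's pbsnodes.
# Assumes you already have Torque manager rights. An administrator can
# typically grant them with something like (user@host is hypothetical):
#   qmgr -c "set server managers += jim@headnode"
# PBSNODES defaults to the real command; set PBSNODES=echo for a dry run.
PBSNODES=${PBSNODES:-pbsnodes}

# Mark a node offline so the scheduler stops placing new jobs on it.
node_offline() {
    "$PBSNODES" -o "$1"
}

# Clear the OFFLINE state once the node has been repaired.
node_clear() {
    "$PBSNODES" -c "$1"
}

# List nodes currently marked down, offline, or unknown.
node_list_bad() {
    "$PBSNODES" -l
}
```

A Friday-evening use might be `node_offline node015`, then `node_list_bad` to confirm it shows up, and `node_clear node015` after the reboot.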

   Also, you can request nodes by name in the queue submission script
like so:

#PBS -l nodes=node010:ppn=2+node002:ppn=1

This would request two processors on node "node010" and one on "node002".
Cumbersome, but useful in a bind.  Offhand, I don't know of a way to
request any node _except_ a particular one.
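One workaround for the missing "any node except" option is to generate the explicit by-name list yourself, leaving out the bad node. This is a minimal sketch, not a Torque feature: `build_nodes_spec` is a hypothetical helper, the node names are placeholders for your own hosts, and it requests the same `ppn` on every node (unlike the mixed example above).

```shell
#!/bin/sh
# Build a Torque "nodes=" resource string from explicit node names,
# skipping any node named in EXCLUDE. Node names are hypothetical.
#
# Usage: build_nodes_spec PPN EXCLUDE NODE...
#   PPN      processors requested on each node
#   EXCLUDE  space-separated node names to leave out (may be empty)
build_nodes_spec() {
    ppn=$1
    exclude=$2
    shift 2
    spec=""
    for node in "$@"; do
        # Skip this node if it appears as a whole word in EXCLUDE.
        case " $exclude " in
            *" $node "*) continue ;;
        esac
        # Join successive node requests with "+".
        if [ -n "$spec" ]; then
            spec="$spec+"
        fi
        spec="$spec$node:ppn=$ppn"
    done
    printf 'nodes=%s\n' "$spec"
}
```

For example, `build_nodes_spec 2 node015 node010 node014 node015 node016` prints `nodes=node010:ppn=2+node014:ppn=2+node016:ppn=2`, which could be passed as `qsub -l "$(build_nodes_spec 2 node015 node010 node014 node015 node016)" myjob.sh` (`myjob.sh` is a placeholder). The trade-off is the one noted above: a job pinned to named nodes waits for exactly those nodes, even if others are free.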

      - John

On Sun, 26 Mar 2006, James Rustad wrote:

> Guys
> This is a strange question, but
> Is there any way to disable a bad node in PBS without being the system 
> administrator?
> I am lining up about 50 jobs in the queue and they fail sequentially
> when they hit the bad node.  This often seems to happen on the weekends
> when nobody is around to reboot the node.
>
> Can I specify within PBS "don't use node015" or something like that?
> Thanks
> Jim Rustad
> ps
> I may be using TORQUE rather than PBS, by the way
