[Beowulf] disabling bad nodes
bushnell at chem.ucsb.edu
Mon Mar 27 14:22:42 PST 2006
You can ask your admin about being made a Torque manager; who counts
as a manager is configurable (check the docs). Then you can run something
like 'pbsnodes -o nodexxx'. This takes node "nodexxx" offline so that
it is not used until its status is cleared with 'pbsnodes -c nodexxx'
(presumably after fixing the node in question).
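
As a rough sketch of that sequence (assuming you have been granted manager
rights; "node015" is just the node name from the question, and the run()
wrapper and DRY_RUN variable are hypothetical helpers added here so the
commands can be previewed before actually running them):

```shell
# Hypothetical node name; substitute the node that is failing.
BAD_NODE=node015

# Hypothetical helper: with DRY_RUN=1 (the default here) each command is
# only printed; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run pbsnodes -o "$BAD_NODE"   # mark the node offline so no jobs land on it
run pbsnodes -l               # list nodes currently down or offline
run pbsnodes -c "$BAD_NODE"   # clear the offline flag once the node is fixed
```

Without manager rights the -o and -c calls should simply be refused, so
trying this ought to be harmless.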
Also, you can request nodes by name in the queue submission script:
#PBS -l nodes=node010:ppn=2+node002:ppn=1
This would request two processors on node "node010" and one on "node002".
Cumbersome, but useful in a bind. Offhand, I don't know of a way to
request any node _except_ a particular node.
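
One way to make the by-name request less cumbersome is to generate the
nodes= string from a list of node names and skip the bad one. The sketch
below is only an illustration: build_nodes_spec is a hypothetical helper,
the node names are made up, and it assumes the same ppn on every node.

```shell
# Hypothetical helper: build a "nodes=" value from named nodes,
# leaving out one bad node.
# Usage: build_nodes_spec PPN BAD_NODE NODE...
build_nodes_spec() {
    ppn=$1
    bad=$2
    shift 2
    spec=""
    for node in "$@"; do
        # Skip the node we want to avoid.
        [ "$node" = "$bad" ] && continue
        # Join entries with "+", as in the #PBS line above.
        spec="${spec:+$spec+}$node:ppn=$ppn"
    done
    echo "$spec"
}

# Made-up node names; node015 is the one to avoid.
echo "#PBS -l nodes=$(build_nodes_spec 2 node015 node010 node015 node002)"
```

Note that this still requests every node that is listed, so the job waits
until all of them are free; it is not a true "anything but node015" request.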
On Sun, 26 Mar 2006, James Rustad wrote:
> This is a strange question, but
> Is there any way to disable a bad node in PBS without being the system
> I am lining up about 50 jobs in the queue and they fail sequentially when
> they hit the bad node. This often seems to happen on the weekends when
> nobody is around to reboot the node.
> Can I specify within PBS "don't use node015" or something like that.
> Jim Rustad
> I may be using TORQUE rather than PBS, by the way