[Beowulf] disabling bad nodes
John Bushnell bushnell at chem.ucsb.edu
Mon Mar 27 14:22:42 PST 2006
You can ask your admin about making you a Torque manager, which
is configurable (check the docs). Then you can run something like
'pbsnodes -o nodexxx'. This takes node "nodexxx" offline so that
it is not used until its status is cleared with 'pbsnodes -c nodexxx'
(presumably after fixing the node in question).
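
A minimal sketch of that offline/clear cycle, wrapped so it can be
rehearsed without touching the cluster. The function names and the
DRY_RUN variable are made up for illustration; only 'pbsnodes -o' and
'pbsnodes -c' come from the commands above, and they only work once you
have manager (or similar) rights.

```shell
#!/bin/sh
# Hypothetical wrapper: when DRY_RUN is set, print the command
# instead of running it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark a node offline so the scheduler stops placing jobs on it.
mark_offline() {
    run pbsnodes -o "$1"
}

# Clear the offline flag once the node has been fixed.
clear_offline() {
    run pbsnodes -c "$1"
}
```

For example, 'DRY_RUN=1; mark_offline node015' only prints the pbsnodes
command; unset DRY_RUN to actually run it.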
Also, you can request nodes by name in the queue submission script
like so:
#PBS -l nodes=node010:ppn=2+node002:ppn=1
This would request two processors on node "node010" and one on "node002".
Cumbersome, but useful in a bind. Offhand, I don't know of a way to
request any node _except_ a particular node.
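
One way to make the by-name request less cumbersome is to generate it.
The sketch below is not a real "exclude" feature: it just builds a
nodes= string from a list of nodes you supply, dropping one bad node.
The helper name and the node names are hypothetical, and the resulting
request asks for every listed node at once, so pass only the nodes the
job actually needs.

```shell
#!/bin/sh
# Hypothetical helper: build a "nodes=..." resource string naming every
# node in the argument list except one known-bad node.
# Usage: build_nodes_spec PPN BAD_NODE NODE...
build_nodes_spec() {
    ppn=$1
    bad=$2
    shift 2
    spec=""
    for node in "$@"; do
        # Skip the node we want to avoid.
        if [ "$node" = "$bad" ]; then
            continue
        fi
        # Join entries with "+", as in the #PBS -l line above.
        spec="${spec:+$spec+}$node:ppn=$ppn"
    done
    printf 'nodes=%s\n' "$spec"
}
```

It could then be used at submission time instead of a #PBS line, e.g.
qsub -l "$(build_nodes_spec 2 node015 node010 node015 node002)" job.sh
where job.sh stands in for your own script.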
- John
On Sun, 26 Mar 2006, James Rustad wrote:
> Guys
> This is a strange question, but
> Is there any way to disable a bad node in PBS without being the system
> administrator?
> I am lining up about 50 jobs in the queue and they fail sequentially when
> they hit
> the bad node. This often seems to happen on the weekends when nobody
> is around to reboot the node.
>
> Can I specify within PBS "don't use node015" or something like that.
> Thanks
> Jim Rustad
> ps
> I may be using TORQUE rather than PBS, by the way
