[Beowulf] When is compute-node load-average "high" in the HPC context? Setting correct thresholds on a warning script.

Reuti reuti at Staff.Uni-Marburg.DE
Tue Aug 31 08:58:37 PDT 2010


On 31.08.2010 at 16:51, Rahul Nabar wrote:

> My scheduler, Torque, flags compute-nodes as "busy" when the load gets
> above a threshold "ideal load". My settings on 8-core compute nodes
> have this ideal_load set to 8 but I am wondering if this is
> appropriate or not?
> 
> $max_load 9.0
> $ideal_load 8.0
> 
> I do understand the "ideal load = # of cores" heuristic but in at least

Yep.


> 30% of our jobs (if not more) I find the load average greater than
> 8. Sometimes even in the 9-10 range. But does this mean there is
> something wrong or do I take this to be the "happy" scenario for HPC:
> i.e. not only are all CPUs busy but the pipeline of processes waiting
> for their CPU slice is also relatively full. After all, an
> "under-loaded" HPC node is a waste of an expensive resource?

With recent kernels, processes (including kernel threads) in D state (uninterruptible sleep, usually waiting on I/O) also count toward the load average. Hence the load can appear higher than the number of running processes alone would imply.

-- Reuti
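
[Editor's sketch, not part of the original reply.] One way to see how much of a node's load comes from D-state processes rather than runnable ones is to compare /proc/loadavg with a count of process states taken from /proc/<pid>/stat. The following is a minimal Python sketch under stated assumptions: a Linux /proc filesystem, and the function names (read_loadavg, parse_state, count_states) are hypothetical helpers, not Torque or kernel APIs. It inspects one state per process rather than per thread, so it only approximates what the kernel counts.

```python
import os


def read_loadavg(path="/proc/loadavg"):
    """Return the 1-, 5- and 15-minute load averages as floats."""
    with open(path) as f:
        fields = f.read().split()
    return tuple(float(x) for x in fields[:3])


def parse_state(stat_line):
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so split after the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes currently in R (runnable) and D (uninterruptible)."""
    counts = {"R": 0, "D": 0}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = parse_state(f.read())
        except (OSError, IndexError):
            continue  # process exited while scanning, or unreadable entry
        if state in counts:
            counts[state] += 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    one, five, fifteen = read_loadavg()
    states = count_states()
    cores = os.cpu_count() or 1
    print(f"load averages: {one} {five} {fifteen} on {cores} cores")
    print(f"runnable (R): {states['R']}, uninterruptible (D): {states['D']}")
```

If the D count is a noticeable share of the total, a load above the core count may reflect I/O waits rather than CPU oversubscription. The count is an instantaneous snapshot while the load average is smoothed over time, so the two numbers will not match exactly.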


> On the other hand, if there truly were something wrong with a node[*]
> and I was to use a high load average as one of the signs of
> impending trouble what would be a good threshold? Above what
> load-average on a compute node do people get actually worried? It
> makes sense to set PBS's default "busy" warning to that limit instead
> of just "8".
> 
> I'm ignoring the 1/5/15 min load average distinction. I'm assuming
> Torque is using the most appropriate one!
> 
> *e.g. runaway process, infinite loop in user code, multiple jobs
> accidentally assigned to some node etc.
> 
> -- 
> Rahul




