HZ and HPC

Ken Chase math at velocet.ca
Wed Jan 29 05:55:11 PST 2003


On Tue, Jan 28, 2003 at 10:33:54PM -0800, Greg Lindahl's all...
> On Tue, Jan 28, 2003 at 08:24:28PM -0500, Jean-Christophe Ducom wrote:
> 
> > 	I dont' know if this topic has been discussed previously. I was 
> > wondering how good/bad is the new value for timeslices HZ=512 for RH8.0 
> > (other distributions?) and HZ=1000 for kernel2.5.
> 
> I've lived on Alphas for ages, and it seemed to work OK.
> 
> > I understand how good it is for desktop responsiveness but won't it
> > be bad if two 'intensive' jobs are running causing a lot of cache
> > misses at every timeslice,
> 
> Most HPC clusters only run 1 intensive process per cpu. Yes, loading
> the entire cache from a cold start takes around 1 millisecond.
> Fortunately, not that many jobs are sensitive to the exact cache size
> -- they tend to need more or less. If less, it won't thrash. If more,
> it always thrashes.

With large caches today you can tune a class of jobs for your cluster
that fit nicely below the threshold size, and run those jobs differently
from the others (using different numbers of nodes, possibly even different
interconnects, as thrashing will obviously destroy scaling).

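Furthermore, if you can get your HZ << 1/(cache reload time after a
context switch), so that each timeslice is much longer than the reload,
then you can mitigate the impact of thrashing. With the ~1 ms reload
quoted above, that means HZ well under 1000.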
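
To put rough numbers on that, here is a back-of-envelope sketch. It
assumes the ~1 ms cold-cache reload quoted above, that a timeslice is
exactly one tick (1/HZ), and that every switch finds the cache fully
cold. All three are simplifications (real schedulers usually hand out
timeslices of several ticks), and reload_fraction is just a name made
up for this example, so treat the output as an upper bound on the
overhead, not a measurement.

```python
# Worst-case share of each timeslice spent refilling the cache when two
# CPU-bound jobs alternate on one CPU.
# Assumptions: ~1 ms cold-cache reload (figure quoted in this thread),
# one timeslice == one tick == 1/HZ, cache fully cold after every switch.
CACHE_RELOAD_S = 1e-3


def reload_fraction(hz, reload_s=CACHE_RELOAD_S):
    """Fraction of a 1/hz timeslice lost to a cold-cache refill (capped at 1)."""
    timeslice_s = 1.0 / hz
    return min(1.0, reload_s / timeslice_s)


if __name__ == "__main__":
    for hz in (10, 100, 512, 1000):
        print(f"HZ={hz:5d}  timeslice={1000.0 / hz:7.2f} ms  "
              f"refill overhead <= {100 * reload_fraction(hz):5.1f}%")
```

Under these assumptions the refill cost only becomes a small fraction
of the timeslice once HZ is far below 1/(reload time).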

Someone mentioned that idle-time priority scheduling a la FreeBSD has been
possible on Linux for some time now, but I have had little luck finding
out how to do it properly.

With FreeBSD I have observed Gaussian 98 running at 100.00% CPU for over 10
seconds at a time while another job sits fully idle, so any thrashing is
minimized. But that's particular to this case, because g98 will spend a long
time touching nothing but the CPU (up to an hour or more at times, I
think, but it's hard to verify) and relinquishes CPU to idle-priority
jobs only when it is waiting on disk or other blocking resources.

If anyone can point out to me how to achieve something like HZ ~ 10, such
that cache thrashing is a non-issue, I'd love to hear it; I need some help.
I'd think that HZ ~ 10 would lead to some very interesting interrupt
handling problems for devices with little or no buffer, among other things.
Definitely a nontrivial kernel design problem -- you need to avoid blocking
on priority inversion, among other things.

Thanks.

/kc


> 
> -- greg

-- 
Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 


