[Beowulf] Re: SGI to offer Windos on clusters ---> Skew/Jitter paper
Ashley Pittman
ashley at quadrics.com
Mon Jan 22 07:39:29 PST 2007
On Thu, 2007-01-18 at 23:12 -0500, Mark Hahn wrote:
> >> "The Case of the Missing Supercomputing Performance"
> >
> > I wondered if you were talking about that paper, but it's from LANL, not Sandia. It should be essential reading for everyone working with large clusters.
>
> I love this paper, but it's critical to realize that it's all about
> very large, very tightly coupled, frequent-global-collective-using
> applications. You could easily have a 2k-node cluster (I'd call it large)
> dedicated to 1-to-100-core jobs and gleefully ignore jitter, or be running
> an 8k-core Monte Carlo that never needs any global synchronization, etc.
>
> I'd actually love to see data on whether jitter affects apps
> other than ah, "stockpile stewardship" ;)
In my experience, yes. Clearly some apps are more susceptible than
others. At one extreme, even embarrassingly parallel apps can suffer
from noise if the job is only considered complete when the last result
is returned.
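To see why, here's a toy Python model (my own sketch, with an assumed
exponential noise distribution, not measured data): each task needs 100
seconds of compute plus a random amount of noise, and the job finishes
only when the last task does, so the expected completion time grows with
the node count even though the tasks never communicate.

    import random

    # Toy model: N independent tasks, each needing 100s of compute plus
    # a random amount of OS noise (exponential with mean 1s; an
    # assumption).  The job is complete only when the slowest finishes.
    def completion(nodes, trials=2000):
        total = 0.0
        for _ in range(trials):
            total += max(100.0 + random.expovariate(1.0)
                         for _ in range(nodes))
        return total / trials

    for n in (1, 64, 1024):
        print("%5d nodes: %.1f s" % (n, completion(n)))

The expected maximum of the noise terms grows roughly logarithmically
with the node count, so for this class of app the effect is real but
mild.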
Any app that performs synchronisation between nodes (even implicitly,
via point-to-point comms) will allow delays caused by noise to
propagate across the cluster, and unfortunately, because of the way
these delays combine, the effect becomes quite pronounced at scale.
Consider, for example, a 64-node cluster with one CPU per node, on
which a daemon wakes up once a minute, spins for a second and goes back
to sleep. Running a single-process job you can expect to see 59 of
every 60 elapsed seconds used by the job; you probably don't worry
about this. Now assume that you have a 64-way job which performs a
global barrier every two seconds. In any given two-second window each
node has a 2-in-60 chance of being hit, so across 64 nodes you can
statistically expect *at least one* node to be affected by noise (about
two, in fact), and the compute time for the process on that node is two
seconds for the application plus one for the daemon. Each timestep now
takes three seconds to achieve two seconds' worth of compute time;
that's 33% of your wall-clock time down the drain. In reality the
figures I've given here are pessimistic, since Linux doesn't have *that
much* jitter, so smaller clusters are by-and-large unaffected; however,
it's a fairly common problem on 1024+ way clusters.
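Here's a minimal Python sketch of that barrier model (my own toy
simulation; it assumes each node's daemon fires independently with
probability 2/60 in any given timestep, which slightly understates the
worst case):

    import random

    NODES, STEPS = 64, 10000
    PERIOD, SPIN, STEP = 60.0, 1.0, 2.0  # daemon period, spin time, compute per step
    p_hit = STEP / PERIOD                # chance a daemon fires during one step

    elapsed = 0.0
    for _ in range(STEPS):
        # The barrier means the step finishes only when the slowest node does.
        slowest = max(STEP + (SPIN if random.random() < p_hit else 0.0)
                      for _ in range(NODES))
        elapsed += slowest

    ideal = STEPS * STEP
    print("%.0f%% of wall time lost to noise" % (100.0 * (elapsed - ideal) / elapsed))

That should come out around 30%, in line with the back-of-the-envelope
figure above; drop NODES to 4 and the loss shrinks to a few percent,
which is the small-cluster case.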
In answer to a previous post about using extra CPUs/cores to alleviate
this problem: it's not a new idea; IIRC PSC were doing this six or
seven years ago. I'd be interested to see if hyperthreading helps the
situation. It's almost always turned off on any cluster over 32 CPUs,
but it might be advantageous to enable it and use something like
cpusets to bind the application to the real CPUs whilst letting the
resource manager/Ganglia/sendmail twiddle its thumbs on the other
virtual CPU (worth perhaps 20% of a real one).
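For what it's worth, here's a minimal sketch of the binding half of
that idea in Python, using Linux's sched_setaffinity rather than
cpusets proper, and with a made-up topology (logical CPUs 0-3 physical,
4-7 their hyperthreaded siblings; the real mapping is in
/sys/devices/system/cpu/cpu*/topology/thread_siblings_list):

    import os

    # Hypothetical layout: logical CPUs 0-3 are physical cores, 4-7 are
    # their hyperthreaded siblings.  Check the sysfs topology files for
    # the real numbering on your machine.
    PHYSICAL_CORES = {0, 1, 2, 3}

    # Pin this process (and anything it forks) to the physical cores,
    # leaving the virtual siblings free for daemons to spin on.
    os.sched_setaffinity(0, PHYSICAL_CORES)
    print("bound to CPUs:", sorted(os.sched_getaffinity(0)))

The daemons would get the complementary binding, so their wake-ups
steal cycles from the 20% sibling rather than from the application's
timestep.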
Ashley,