[Beowulf] Re: computing on Altix? (Andrew Piskorski)

Mon Sep 12 13:19:30 PDT 2005

> >Google for "superlinear speedup".  Most likely, as you split up your
> >fixed problem size among more processors, more and more of it fits
> >into the processor cache, where it runs much faster due to fewer main
> >memory accesses.

also google for "strong scaling" and contrast to "weak scaling".

the former assumes a fixed total problem size run over a range of ncpus;
the latter assumes a fixed problem size *per* cpu.  I suspect you'll have 
a hard time showing superlinear speedup under weak scaling ;)
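
to make the distinction concrete, here's a toy sketch (not a measurement of any real machine). the only number taken from this thread is the "something like 9 MB" cache size; the per-byte costs T_CACHE and T_MEM, the 64 MB problem size and the function names are all made up for illustration, and communication cost is ignored entirely.

```python
# Toy model: a cpu's run time depends only on whether its working set fits in cache.
CACHE_BYTES = 9 * 2**20   # ~9 MB per cpu, the figure quoted in this thread
T_CACHE = 1.0             # assumed cost per byte when in-cache (arbitrary units)
T_MEM = 4.0               # assumed cost per byte when missing to main memory


def run_time(bytes_per_cpu):
    """Time for one cpu to process its share of the data."""
    cost = T_CACHE if bytes_per_cpu <= CACHE_BYTES else T_MEM
    return bytes_per_cpu * cost


def strong_speedup(total_bytes, ncpus):
    """Strong scaling: fixed total problem, split evenly over ncpus."""
    return run_time(total_bytes) / run_time(total_bytes / ncpus)


def weak_efficiency(bytes_per_cpu, ncpus):
    """Weak scaling: fixed problem per cpu, so the per-cpu working set
    (and, in this toy model, the run time) does not change with ncpus."""
    return run_time(bytes_per_cpu) / run_time(bytes_per_cpu)


if __name__ == "__main__":
    total = 64 * 2**20  # hypothetical 64 MB problem
    for n in (1, 2, 4, 8, 16):
        print(n, strong_speedup(total, n), weak_efficiency(total, n))
```

in this model the strong-scaling speedup is exactly linear until the per-cpu share drops under the cache size, then jumps above ncpus. under weak scaling the per-cpu working set never shrinks, so there is no cache transition to exploit and efficiency can't exceed 1.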

> This cache effect is quite profound on Altix since some of these have 
> something like 9 MB cache per processor. You can see this result on 

that's the irony: the it2 works really well when the data is all in-cache,
or can somehow be prefetched+streamed so that cache misses don't happen.
once you start missing, performance becomes unexceptional - you can 
easily see this by looking at SPECfp results.  there, the it2's excellent
scores are mainly due to extremely high results in the 2-3 very smallest
benchmark components.

around here, it's mainly serial monte-carlo jobs that are so small that 
they're always in-cache.  so the "high-end" (and expensive) it2 is best
suited for the lowest-end jobs...