[Beowulf] Picking a processor

Mon Dec 13 11:38:38 PST 2004

> I'm currently designing a beowulf style parallel processor and am trying to
> decide which processor to use for the nodes. My project requires my final
> design for the parallel processor to be able to provide a sustained throughput
> of 0.25 TFlops.

by what measure?

> My research tells me that in general the flop rate scales up linearly.

well, there are factors which can cause sublinearity.

> My trouble is that I'm having trouble finding estimates for the flop rates
> of the processors I'm looking at.

www.top500.org

basically, top500 is a ranking of the fastest 500 computers in the world,
when running a benchmark which is FP-intensive.  it's a real code, but not
a very real real code ;)

the critical numbers are Rpeak and Rmax:

	Rpeak = ncpus*clock*flops-per-cycle

it's the peak theoretical aggregate flops of the machine/cluster.
interestingly, you can get a pretty decent approximation of Rmax
(the actual HPL score) using:

	Rmax ~= Rpeak * interconnect-efficiency

with:

interconnect    rmax/rpeak
quadrics        .75
myrinet         .7
infiniband      .7
gigabit         .6

this is not too surprising - it would be strange if gigabit were not
less efficient, and quadrics is pretty much the premium interconnect
(unless you count numaflex/etc).  there are undoubtedly other factors
which might be conflated here - for instance, I'd expect HPL scaling
to depend on memory-size-per-cpu as well as memory-bandwidth-per-cpu.
and for a slower interconnect, you can probably get higher efficiency
by maximizing on-node work (minimizing interconnect dependency.)
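
as a rough sketch of the arithmetic (not a sizing tool), here is the same estimate in python. it follows the top500 convention, where Rpeak is the theoretical peak and Rmax is the measured HPL result, and it uses the efficiency ratios from the table above. the function names and the example cpu (2.0 GHz, 2 flops per cycle) are assumptions made up for illustration, not measurements of any particular chip.

```python
import math

# Rmax/Rpeak ratios from the table above: rough observations from
# top500 entries, not guarantees for any particular cluster.
EFFICIENCY = {
    "quadrics": 0.75,
    "myrinet": 0.70,
    "infiniband": 0.70,
    "gigabit": 0.60,
}


def rpeak_gflops(ncpus, clock_ghz, flops_per_cycle):
    """Theoretical peak: ncpus * clock * flops-per-cycle, in GFlops."""
    return ncpus * clock_ghz * flops_per_cycle


def estimated_rmax_gflops(ncpus, clock_ghz, flops_per_cycle, interconnect):
    """Approximate HPL score: Rpeak scaled by the interconnect ratio."""
    return rpeak_gflops(ncpus, clock_ghz, flops_per_cycle) * EFFICIENCY[interconnect]


def cpus_needed(target_gflops, clock_ghz, flops_per_cycle, interconnect):
    """Smallest cpu count whose estimated Rmax reaches the target."""
    per_cpu = clock_ghz * flops_per_cycle * EFFICIENCY[interconnect]
    return math.ceil(target_gflops / per_cpu)


if __name__ == "__main__":
    # hypothetical cpu: 2.0 GHz, 2 flops per cycle; target 0.25 TFlops = 250 GFlops
    for name in EFFICIENCY:
        print(name, cpus_needed(250, 2.0, 2, name))
```

this only estimates an HPL-style number; a real application will usually sustain less, so treat the cpu counts as a lower bound.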

needless to say, real and useful apps are probably going to achieve 
lower useful flops than HPL.  note also that HPL strongly rewards 
chips which have fused multiply-add, which can be entirely irrelevant
to real codes...

regards, mark hahn.