[Beowulf] The Walmart Compute Node?

Fri Nov 9 10:20:48 PST 2007

Quoting Larry Stewart <larry.stewart at sicortex.com>, on Fri 09 Nov 2007  
06:42:54 AM PST:

> Robert G. Brown wrote:
>
>> On Thu, 8 Nov 2007, Jim Lux wrote:
>>
>>> In general, a N GHz processor will be poorer in a flops/Watt sense  
>>>  than a 2N GHz processor.
>>
> Well that just isn't so.  It seems pretty clear from IBM's BlueGene/L,
> as well as the SiCortex processors, that the
> opposite is true.  The Green 500 list is brand new, and there's not
> much on it yet, but the BG/L is delivering 190 MFLOPS/Watt
> on HPL, whereas the machines made out of Intel and AMD chips are half
> that at best.

Perhaps I should qualify it a bit and say that an N GHz processor (made
on a process where that is the top speed) will be poorer than a 2N GHz
processor (made on a process where *that* is the top speed)...

If you make the two processors using the same fab process, running at
the same voltages, with the same functionality, then, yes, the N GHz
chip will be basically 50% of the 2N GHz chip.  But that's more of a
"running the 2N GHz processor at N GHz" sort of situation, isn't it?

And, of course, if you change the architecture, all comparative bets  
are off, because a GHz doesn't compare to a FLOP, which isn't a  
standardized measure by any means.

>
>>>
>>> The power draw is a combination of a fixed load plus a frequency   
>>> dependent load, so for the SAME processor, running it at N/2 GHz   
>>> consumes more than 50% of the power of running it at N GHz.
>>
> This probably IS true, but high performance cores have a lot more logic
> in them to try to achieve performance: out of order
> execution, complex branch prediction, register renaming, etc. etc.

Which is why I said *same* processor.
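
The "fixed load plus frequency-dependent load" argument quoted above can be sketched as a toy linear model, P(f) = P_fixed + k*f, for one fixed processor design at one fixed core voltage. The numbers below (20 W fixed, 30 W/GHz, 2 FLOPs/cycle) are hypothetical placeholders, not measurements of any real chip, and the model deliberately ignores voltage scaling, which changes the picture for chips that lower their voltage along with their clock.

```python
from dataclasses import dataclass


@dataclass
class LinearPowerModel:
    """Hypothetical model for ONE processor design at a fixed voltage."""
    p_fixed_w: float        # frequency-independent draw (leakage etc.), watts
    k_w_per_ghz: float      # frequency-dependent draw, watts per GHz
    flops_per_cycle: float  # assumed constant for this core

    def power_w(self, f_ghz: float) -> float:
        # Total draw = fixed part + part proportional to clock frequency.
        return self.p_fixed_w + self.k_w_per_ghz * f_ghz

    def gflops(self, f_ghz: float) -> float:
        # Throughput assumed to scale linearly with clock.
        return self.flops_per_cycle * f_ghz

    def gflops_per_watt(self, f_ghz: float) -> float:
        return self.gflops(f_ghz) / self.power_w(f_ghz)


# Illustrative numbers only.
cpu = LinearPowerModel(p_fixed_w=20.0, k_w_per_ghz=30.0, flops_per_cycle=2.0)

full, half = 2.0, 1.0
ratio = cpu.power_w(half) / cpu.power_w(full)
print(f"P({half} GHz) / P({full} GHz) = {ratio:.3f}")
print(f"GFLOPS/W at {full} GHz: {cpu.gflops_per_watt(full):.3f}")
print(f"GFLOPS/W at {half} GHz: {cpu.gflops_per_watt(half):.3f}")
```

With these made-up figures, halving the clock drops power only from 80 W to 50 W (62.5%, not 50%), so FLOPS/Watt falls from 0.050 to 0.040 at the lower clock. Any nonzero fixed load produces the same qualitative result.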

>  A
> slower core can be a lot simpler with the same silicon process,
> so a decent lower-clock design will be more power efficient than a fast
> clock design.
>
Indeedy yes... of such design decisions do Intel and VIA make low-power
processors, but one can also argue that the simpler architecture then
gets fewer ops/clock.

As with any generalized statement, it breaks down when you start to
look at the details.

I was intending more to illustrate that Generation N+1 processors
running at higher speeds are probably better on a FLOPS/Watt basis
than Generation N processors running at lower clock speeds.

>>>
>>> If you go to a faster processor design, the frequency-dependent
>>> load gets smaller (smaller feature sizes = smaller capacitance to
>>> charge and discharge on each transition).  The core voltage is
>>> also usually lower on higher-speed processors, which also
>>> reduces the power dissipation (fewer joules to change
>>> the voltage from zero to one or vice versa).  So, in general, a 2N
>>> GHz processor consumes less than twice the power of an N GHz
>>> processor.
>>
> The flaw in this argument is that a slower clock design can use the
> same small transistors and the same current state of the art processes
> and it will use many fewer transistors to get its work done, thus using
> very much less power.

Yes, if you're optimizing for power.  I would venture to say, though,
that until very recently, processors intended for the consumer desktop
market have basically been driven to run as fast as the underlying
process will allow. So, to go faster, feature sizes get smaller,
reducing the energy per operation. That is, they do more work for the
same power (which, as rgb noted, seems to be about 100 W).

You can, of course, use that same advance in technology to do the same  
work with less energy or a smaller die or both.
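
The shrink argument above (smaller capacitance, lower core voltage) follows from the usual switching-power relation P = C_eff * V^2 * f, where C_eff is the effective switched capacitance with the activity factor folded in. The sketch below compares two hypothetical chips with the same microarchitecture; the 30% capacitance cut and the 1.5 V to 1.3 V voltage drop are assumed values chosen for illustration, and leakage (which the simple formula ignores) is left out entirely.

```python
def dynamic_power_w(c_eff_nf: float, v_core: float, f_ghz: float) -> float:
    """Switching power P = C_eff * V^2 * f.

    C_eff in nF and f in GHz: the nano and giga prefixes cancel,
    so the result is in watts. Leakage/static power is NOT included.
    """
    return c_eff_nf * v_core ** 2 * f_ghz


def energy_per_cycle_nj(c_eff_nf: float, v_core: float) -> float:
    """Energy switched per clock cycle, C_eff * V^2, in nJ (independent of f)."""
    return c_eff_nf * v_core ** 2


# Hypothetical chips: same design, next-generation process assumed to
# cut C_eff by 30% and core voltage from 1.5 V to 1.3 V.
gen_n = {"c_eff_nf": 20.0, "v_core": 1.5, "f_ghz": 1.0}
gen_n1 = {"c_eff_nf": 14.0, "v_core": 1.3, "f_ghz": 2.0}

for name, chip in (("gen N", gen_n), ("gen N+1", gen_n1)):
    p = dynamic_power_w(**chip)
    e = energy_per_cycle_nj(chip["c_eff_nf"], chip["v_core"])
    print(f"{name}: {chip['f_ghz']} GHz, {p:.2f} W, {e:.2f} nJ/cycle")
```

Under these assumptions the Generation N+1 part runs at twice the clock for about 5% more switching power (45.00 W vs 47.32 W), which is roughly half the energy per cycle. That is the "more work for the same power" effect; spending the same shrink on a slower, smaller chip instead would cut power outright.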

>>
>> In ADDITION to this is the fact that the processor has to live in a
>> house of some sort, and the house itself adds per processor overhead.
>> This overhead is significant -- typically a minimum of 10-20 W,
>> sometimes as much as 30-40 (depending on how many disks you have, how
>
> This factor does not scale this way!  With low power processors, you
> can pack them together, without the endless support chips, you
> can use low power inter-chip signalling, you can use high efficiency
> power supplies with their economies of scale.  If you look inside
> a PC there are two blocks doing useful work - memory and CPUs, and a
> whole board full of useless crap.  Look inside a machine designed
> to be a cluster and there should be nothing there but cpus and memory.

But part of the whole Beowulf concept (as opposed to massively
multiprocessor high-performance computing) is that you are building
things from commodity, consumer off-the-shelf parts to achieve low
capital cost.  Consumer CPUs are made in enormous quantities, so a
manufacturer can spend a fair amount on design and non-recurring
engineering.  Custom HPC chips don't necessarily have this economy of
scale.

Certainly, for some applications, it's worth it.  (Priced any
rad-tolerant, space-qualified FPGAs recently?)

And I will readily concede that the "cluster computing" world has
strayed significantly from its humble consumer commodity origins
(what with sheet metal designed for cluster nodes, really high
performance interconnects, etc.)
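
The per-node "house" overhead rgb mentions, and Larry's point that it need not be paid once per CPU, can be put into a simple whole-node efficiency figure: compute divided by (CPU power + per-node overhead). Every number below is a hypothetical placeholder; the 20-30 W overheads were picked from inside rgb's 10-40 W range purely for illustration.

```python
def gflops_per_watt(cpus: int, gflops_per_cpu: float,
                    watts_per_cpu: float, node_overhead_w: float) -> float:
    """Whole-node efficiency: compute / (CPU power + per-node 'house' power)."""
    total_w = cpus * watts_per_cpu + node_overhead_w
    return cpus * gflops_per_cpu / total_w


# Hypothetical scenarios: (cpus, GFLOPS per CPU, W per CPU, node overhead W)
scenarios = {
    "1 fast CPU per box, 30 W overhead": (1, 8.0, 100.0, 30.0),
    "1 low-power CPU per box, 20 W overhead": (1, 1.0, 5.0, 20.0),
    "16 low-power CPUs share 20 W overhead": (16, 1.0, 5.0, 20.0),
}

for label, args in scenarios.items():
    print(f"{label}: {gflops_per_watt(*args):.3f} GFLOPS/W")
```

With these made-up figures, a low-power CPU in its own commodity box (0.040 GFLOPS/W) loses to the fast commodity node (about 0.062), while sixteen of them sharing one set of support hardware (0.160) win. Which regime a real cluster lands in depends on measured numbers, not on this toy model.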

Jim Lux
Flight Communications Equipment Section
Jet Propulsion Lab
Pasadena CA 91109