[Beowulf] Larrabee - Mark Hahn's personal attack
diep at xs4all.nl
Fri Jan 27 08:12:35 PST 2012
On Jan 27, 2012, at 4:37 PM, Mark Hahn wrote:
>>>> Larrabee indeed resembles itanium to some extent, but not quite.
>>> wow, that has to be your most loosely-tethered-to-reality statement
>>> it's true that Larrabee and Itanium are very close
>>> in the number of letters in their name.
>> Your personal attack seems to indicate you disagree with my
>> assessment that the entire Larrabee line
>> makes any real sense in the long run.
> not surprisingly, no: I disagree that Larrabee and Itanium resemble
> each other in any but really silly ways.
> Itanium is a custom, VLIW architecture; Larrabee is an on-chip
> cluster of non-VLIW, commodity x86_64 cores.
> none of the distinctive features of Itanium (multi-instruction
> dependency on compile-time scheduling, intended market,
> success limited to predictable, high-bandwidth situations,
> inter-node cache coherency) are anything close to the features of
> Larrabee (standard x86_64 ISA, no special compiler needed, on-chip
> message network, suitability for complex/dynamic/unpredictable loads,
> possibly not even cache-coherent across one chip).
> my guess is that you were thinking about how ia64 chips tended to
> run at low clock rates, and thinking about how gpus (probably
> larrabee) also tend to be low-clocked.
And both seem failures from the user's viewpoint; maybe not from Intel's,
but certainly measured against Intel's aim to replace and/or create a new
long-lasting architecture that can even *remotely* compete with other
manufacturers, not to mention the far too high price points for such CPUs.
>> Instead of throwing mud, mind explaining why Larrabee,
>> an architecture far away from the mainstream, stands any chance of
>> competing in HPC
>> with the existing architectural concepts in the long run?
> as far as I know, larrabee will be a mesh of conventional x86_64 cores
> that will run today's x86_64 code. I don't know whether Intel has said
> (or even decided) whether the cores will have full or partial cache
> coherency, or whether they'll really be an MPI-like shared-nothing design.
Assuming you weren't born completely stupid, I assume you will realize
that in order to run most existing x64 codes it needs to have cache
coherency, and that it has always been presented as having exactly that.
Which is one of the reasons why the architecture doesn't scale, of course.
Well, you can forget about running your x64 Fortran codes on it at any
great speed. You need to totally rewrite your code to be able to use its
wide vectors. This is in contrast to GPUs, where you can address each PE
or each 'compute core' indirectly through arrays (in the case of AMD-ATI,
a compute core is 4 PEs that can execute a double per cycle). Such
indirect lookups are a disaster on Larrabee, costing around 7 cycles
each, so you really need to use the vector unit.
Now I bet the majority of your old x64 code doesn't use such huge
vectors, so to get even remotely decent performance out of it, a total
rewrite of most code is needed, if it can work at all.
We can then also see that GPUs are totally superior to Larrabee in most
areas, and most importantly in multiplication-heavy codes.
As you might know, GPUs are the world champions at doing multiplications
and CPUs are not.
Multiplication happens to be something that is of major importance
for the majority of HPC codes.
By majority I really mean approaching 90% of the public supercomputers.
> if you want to compare Larrabee to Fermi or AMD GCN, that might be
> interesting. or to mainstream multicore - like bulldozer, with 32c
> per package vs larrabee with ">=50".
> but not ia64. it's best we all just forget about it.