[Beowulf] Opinions of Hyper-threading?
landman at scalableinformatics.com
Tue Feb 26 16:13:32 PST 2008
Mark Hahn wrote:
>>> And today memory access can stall up to hundreds of cycles, so any
>>> processor can hide this latency by switching to another thread.
>> My gosh ... we have re-invented the Tera MTA. ...
> I think the reason we both know what that name means is that they had
> (have?) a nugget of truth. after all, a multiplier unit on a chip
> doesn't really care on which thread's behalf it's doing work. MTA is
> perhaps a bit far towards the pure gatling-gun approach, but I think we
had ... morphed into "The New Cray". Burton Smith is long gone, now at
It was very interesting when I first heard about it at an SC9x
conference. Spoke to Burton for a few minutes on it.
OoO was a (very weak) version of something like this. SMT is a little
stronger. Register renaming and all those fancy OoO optimizations were
in there to break those dependencies down and enable better IPC
... which is the name of the game in the end anyway ...
You are absolutely right, in that the functional units don't (and
shouldn't) care which thread they are doing work for.
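
A rough sketch of the latency-hiding argument, as a toy model rather than a description of any real processor: assume each thread computes for a fixed number of cycles and then stalls on memory, and that switching threads is free. The function names and the 20/180 cycle figures are made up for illustration; only "hundreds of cycles" comes from the thread.

```python
import math

def utilization(n_threads, compute_cycles, stall_cycles):
    """Fraction of cycles the functional units stay busy in the toy model.

    Each thread alternates compute_cycles of work with stall_cycles of
    waiting on memory; while one waits, another can run at no switch cost.
    """
    busy = n_threads * compute_cycles
    period = compute_cycles + stall_cycles
    return min(1.0, busy / period)

def threads_to_hide(compute_cycles, stall_cycles):
    """Smallest thread count that keeps the units busy all the time."""
    return math.ceil((compute_cycles + stall_cycles) / compute_cycles)

# Hypothetical numbers: 20 cycles of work per 180-cycle memory stall.
for n in (1, 2, 4, 10):
    print(n, "threads ->", utilization(n, 20, 180))
print("threads needed:", threads_to_hide(20, 180))
```

In this model two threads per core only double a small number, while an MTA-style design needs enough threads to cover the whole stall. It ignores caches and the extra memory traffic that the additional threads generate.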
> can all agree that ultimately
> any program is just a big hairy dataflow graph.
I would like to use that as a quote ... :)
>>> But then you have to make sure the processor has enough cache and
>>> memory bandwidth to handle the increased memory traffic (like Sun
>> The problem with many (cores|threads) is that memory bandwidth wall.
>> A fixed size (B) pipe to memory, with N requesters on that pipe ...
> I think that's why almost everyone agrees with the elegance of AMD's
> system architecture - memory attached to and thus scaling with ncpus.
> and yes, there's a lot of work already going on regarding making caches
> more intelligent - predicting the multireference or sharing properties
> of a cache block, for instance, to choose when to move it and between
> which caches in a big system.
I seem to remember hearing about the processors-in-core idea many moons
ago. It seemed hard to program. But compare that to the big honking
pile-o-ram, with many processors, few pipes, and bandwidth limits ...
The AMD model is elegant. As you expand the number of cores you expand
the number of memory connections. This is part of the reason the 2350s
at 2GHz give some 3+ GHz Intel 5472s a run for the money on a number of
real world tests. Sort of like the Alpha ... you can make the CPU
"infinitely fast", but there is the little matter of the rest of the
system to worry about.
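
The "fixed pipe with N requesters" point versus memory that scales with sockets can be put in back-of-the-envelope form. This is a minimal sketch under stated assumptions: the 10 GB/s figure, the four cores per socket, and the function names are all hypothetical, not measurements of any Opteron or Xeon system, and it assumes every core streams from its own socket's memory.

```python
def per_core_shared_pipe(pipe_gbs, n_cores):
    """One fixed-size pipe (B) to memory, split evenly among N requesters."""
    return pipe_gbs / n_cores

def per_core_per_socket_memory(socket_gbs, n_sockets, cores_per_socket):
    """Each socket brings its own memory controller, so the aggregate
    bandwidth grows with the socket count (local accesses only)."""
    aggregate = socket_gbs * n_sockets
    return aggregate / (n_sockets * cores_per_socket)

# Hypothetical: 10 GB/s per pipe or per socket, 4 cores per socket.
for sockets in (1, 2, 4):
    cores = 4 * sockets
    shared = per_core_shared_pipe(10.0, cores)
    local = per_core_per_socket_memory(10.0, sockets, 4)
    print(f"{cores:2d} cores: shared pipe {shared:.3f} GB/s/core, "
          f"per-socket memory {local:.3f} GB/s/core")
```

With the shared pipe the per-core share falls as 1/N. With memory attached to each socket it stays flat in this idealized case; remote accesses across sockets would reduce it.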
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615