[Beowulf] Opinions of Hyper-threading?
bill at cse.ucdavis.edu
Wed Feb 27 23:30:35 PST 2008
> The problem with many (cores|threads) is that memory bandwidth wall. A
> fixed size (B) pipe to memory, with N requesters on that pipe ...
What wall? Bandwidth is easy, it just costs money, and not much at that.
Want 50GB/sec? Buy a $170 video card. Want 100GB/sec? Buy a better video
card. Want 200GB/sec? Buy 2. Sure, video cards have minimal memory
(512-768MB), no double precision on the normal cards (although I'm not sure
if the now shipping 9600GT fixed that), and are harder to program (CUDA vs
the normal compilers). Has anyone programmed CUDA and the IBM Cell chip who
could comment on how hard it is to do something useful? In any case, the
reality and market
acceptance of this approach seem to be aggressively closing. Thus machines
with 16-32 threads/cores are becoming rather common (Sun T1000/T2000, quad
socket quad core Intel, and hopefully RSN 4-8 socket 4 core AMDs).
Seems like additional cores|threads are an excellent way to make use of tons
of memory bandwidth in a latency tolerant fashion to get reasonable real world
performance on applications that people actually care about (read that as
willing to pay for). All the while utilizing more commodity technology than
the vector machines of yesteryear.
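To put a rough number on "latency tolerant": by Little's Law, the data that
must be in flight equals bandwidth times latency. The sketch below is only a
back-of-the-envelope calculation; the 100 ns latency, the 64-byte cache line,
and the 32-thread count are assumed round figures, not measurements, and the
function name is made up for illustration.

```python
import math

def outstanding_lines(bandwidth_gb_s, latency_ns, line_bytes=64):
    """Cache-line requests that must be in flight to sustain a bandwidth.

    Little's Law: bytes in flight = bandwidth * latency.
    GB/s * ns = bytes when GB means 1e9 bytes.
    """
    bytes_in_flight = bandwidth_gb_s * latency_ns
    return math.ceil(bytes_in_flight / line_bytes)

# Assumed figures: 50 GB/s target, 100 ns memory latency, 64-byte lines.
total = outstanding_lines(50, 100)
per_thread = total / 32  # spread over an assumed 32 hardware threads
print(total, per_thread)
```

With these assumed figures, 79 lines have to be outstanding at once, or about
2.5 per thread on a 32-thread machine. Many threads that each tolerate a miss
or two can keep the pipe full where a single thread could not.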
Latency on the other hand (especially when measured in clock cycles) is a
wall, extremely hard to fix, and those nasty laws of physics keep getting in
the way.
I don't see any particular reason why memory bandwidth can't go through a few
doublings in the near future if there was a market for it, last I checked
nvidia was doing pretty well ;-)
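The "measured in clock cycles" point is simple arithmetic: a fixed latency in
nanoseconds costs more cycles as clocks get faster. A small illustration,
where the 100 ns latency and the clock rates are assumed values chosen only
to show the trend:

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Cycles a core waits on one memory access (ns * cycles per ns)."""
    return latency_ns * clock_ghz

# Assumed: memory latency stays near 100 ns while the clock rate rises.
for clock_ghz in (1.0, 2.0, 3.0):
    print(clock_ghz, "GHz:", latency_in_cycles(100, clock_ghz), "cycles")
```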
 Sorry to use marketing bandwidth, I've not seen stream numbers for CUDA
yet. I hope to work on one though. If anyone has numbers please speak up.
 The nvidia 8600/8800 are single precision AFAIK, no idea if the 9600GT
is one of the new generation DP capable chips.
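For reference, the STREAM numbers asked about above come from kernels like
the triad, with bandwidth counted as two reads plus one write per element.
The following is a minimal sketch of that counting, not the STREAM benchmark
itself and not CUDA code; the function names are made up for illustration,
and pure Python lists will mostly measure interpreter overhead rather than
memory bandwidth.

```python
import time

def triad(b, c, q):
    """STREAM triad kernel: a[i] = b[i] + q * c[i]."""
    return [bi + q * ci for bi, ci in zip(b, c)]

def triad_bandwidth_gb_s(n, seconds, bytes_per_element=8):
    """Bandwidth by STREAM's convention: two reads and one write per element."""
    return 3 * n * bytes_per_element / seconds / 1e9

def measure(n=1_000_000, q=3.0):
    """Time one triad pass and report GB/s (GB = 1e9 bytes)."""
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    a = triad(b, c, q)
    elapsed = time.perf_counter() - start
    assert a[0] == 1.0 + q * 2.0  # sanity check on the result
    return triad_bandwidth_gb_s(n, elapsed)

if __name__ == "__main__":
    print(measure(), "GB/s (pure Python, far below hardware limits)")
```

On a single precision card the same accounting applies with
bytes_per_element=4, so a measured figure can be compared directly against
the marketing bandwidth.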