[Beowulf] Opinions of Hyper-threading?

Bill Broadley bill at cse.ucdavis.edu
Wed Feb 27 23:30:35 PST 2008

> The problem with many (cores|threads) is that memory bandwidth wall.  A 
> fixed size (B) pipe to memory, with N requesters on that pipe ...

What wall?  Bandwidth is easy, it just costs money, and not much at that.
Want 50GB/sec[1]?  Buy a $170 video card.  Want 100GB/sec?  Buy a better video
card.  Want 200GB/sec?  Buy 2.  Sure, video cards have minimal memory
(512-768MB), no double precision on the normal cards [2], and are harder to
program (CUDA vs the normal compilers).  Has anyone programmed CUDA or the IBM
Cell chip who could comment on how hard it is to do something useful?  In any
case, the reality and market
acceptance of this approach seem to be aggressively closing.  Thus machines
with 16-32 threads/cores are becoming rather common (Sun T1000/T2000, quad
socket quad core Intel, and hopefully RSN 4-8 socket 4 core AMDs).

Seems like additional cores|threads are an excellent way to make use of tons 
of memory bandwidth in a latency tolerant fashion to get reasonable real world 
performance on applications that people actually care about (read that as 
willing to pay for).  All the while utilizing more commodity technology than
the vector machines of yesteryear.
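
A rough sketch of the latency-tolerance argument, using Little's law (data in
flight = bandwidth x latency).  The 50GB/sec figure is the marketing number
from above; the 100 ns latency, 64-byte request size, and 32 threads are
assumptions picked for illustration, not measurements, and the function name
is made up for this example.

```python
import math

def requests_in_flight(bandwidth_gb_per_s, latency_ns, request_bytes=64):
    """Outstanding memory requests needed to keep the pipe full (Little's law)."""
    # GB/s * ns = bytes (1e9 * 1e-9), so this product is bytes in flight.
    bytes_in_flight = bandwidth_gb_per_s * latency_ns
    # Round up: a partial request still occupies a slot.
    return math.ceil(bytes_in_flight / request_bytes)

# Assumed figures: 50 GB/s, 100 ns latency, 64-byte requests, 32 threads.
total = requests_in_flight(50, 100)
per_thread = total / 32
print(total, per_thread)
```

Under those assumptions roughly 79 requests have to be outstanding at once.
A single thread rarely manages that, but 32 threads with two or three misses
each in flight can.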

Latency on the other hand (especially when measured in clock cycles) is a 
wall, extremely hard to fix, and those nasty laws of physics keep getting in 
the way.
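
To put "measured in clock cycles" in numbers, here is a minimal sketch.  The
fixed 100 ns memory latency and the clock rates are assumed, illustrative
values, and the function name is hypothetical.

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Convert a memory latency in nanoseconds to core clock cycles."""
    # ns * GHz = cycles (1e-9 s * 1e9 cycles/s).
    return latency_ns * clock_ghz

# Same assumed 100 ns latency seen from cores at different clock rates.
for ghz in (1.0, 2.0, 3.0):
    print(ghz, latency_in_cycles(100, ghz))
```

The same 100 ns costs three times as many cycles at 3 GHz as at 1 GHz, so a
faster core waits longer in its own terms even if the memory gets no slower.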

I don't see any particular reason why memory bandwidth can't go through a few
doublings in the near future if there were a market for it.  Last I checked,
nvidia was doing pretty well ;-)

[1] Sorry to use marketing bandwidth, I've not seen stream numbers for CUDA
     yet.  I hope to work on one though.  If anyone has numbers please speak
     up.
[2] The nvidia 8600/8800 are single precision AFAIK, no idea if the 9600GT
     is one of the new generation DP capable chips.
