[Beowulf] Opinions of Hyper-threading?
Jim Lux
james.p.lux at jpl.nasa.gov
Thu Feb 28 06:50:27 PST 2008
Quoting Joe Landman <landman at scalableinformatics.com>, on Thu 28 Feb
2008 05:20:01 AM PST:
> Bill Broadley wrote:
>>> The problem with many (cores|threads) is that memory bandwidth
>>> wall. A fixed size (B) pipe to memory, with N requesters on that
>>> pipe ...
>>
>> What wall? Bandwidth is easy, it just costs money, and not much at
>> that. Want 50GB/sec[1] buy a $170 video card. Want 100GB/sec...
>> buy a
>
> Heh... if it were that easy, we would spend extra on more bandwidth for
> Harpertown and Barcelona ...
>
> The point is that the design determines your hard/fixed per socket
> limits, and no programming technique is going to get you around that
> limit per socket. You need to change your programming technique to go
> many socket. That limit is the bandwidth wall.
>
And this is much the same as the earlier discussions on this list,
back when folks were building 8- and 16-processor clusters. There, the
bandwidth wall was the 10 Mbps Ethernet interconnect, first through a
hub, then a switch, and so on.
This is basically why any programming technique for speedup that relies
on tight coupling (e.g. shared memory) can't scale infinitely. At
some point, the speed of light and physical size conspire to do you in.
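To put rough numbers on that (purely illustrative, no particular
machine in mind), a few lines of Python show the latency floor that
physics imposes on a round trip, before you count any switching or
protocol overhead:

# Back-of-the-envelope: speed-of-light floor on a request/response
# across a box or a machine room. Distances are illustrative.
C = 3.0e8              # speed of light in vacuum, m/s
SIGNAL_FRACTION = 0.7  # signals in copper/fiber run at roughly 0.6-0.8 c

for meters in (0.3, 3.0, 30.0):
    one_way_ns = meters / (C * SIGNAL_FRACTION) * 1e9
    round_trip_ns = 2 * one_way_ns
    print(f"{meters:5.1f} m: >= {round_trip_ns:6.1f} ns round trip, "
          f"~{round_trip_ns * 3:5.0f} cycles at 3 GHz")

At 30 m you're already well past a hundred nanoseconds each way, which
is why "shared memory across the machine room" never really works.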
If one wanted to design revolutionary distributed/parallel computing
algorithms, one could probably work with floppy disks and sneakernet.
If it works there, it will certainly work on any faster mechanism.
See, true computer science doesn't need a 1000-processor cluster.
Another cluster-related computer science issue is starting to deal
with unreliable links between the nodes of the cluster. The
overwhelming majority of cluster codes assume that message passing is
perfect and has no errors. Sometimes this is provided transparently
by the communications mechanism (e.g. TCP promises in-order,
error-free delivery). However, in the TCP case that comes at a cost:
the latency isn't constant (because it achieves reliability through
temporal redundancy, i.e. retries), and if your algorithm does some
sort of scatter/gather and needs barrier synchronization, a late
packet on one link brings the whole mass to a halt.
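As a toy model of that effect (pure Python, not any real MPI code; the
node count, latencies, and retransmit probability below are made up
for illustration), watch how a single TCP-style retransmission timeout
on one link holds up the whole gather:

# Toy model: each of N links normally delivers in BASE_US microseconds,
# but with probability P_RETRY a retransmission adds RTO_US. A
# barrier/gather finishes only when the slowest link does, so one late
# packet stalls everybody.
import random

N_LINKS = 256        # hypothetical node count
BASE_US = 50.0       # nominal per-message latency (illustrative)
P_RETRY = 1e-3       # chance a message needs a retransmit (illustrative)
RTO_US = 200_000.0   # a 200 ms retransmission timeout, in microseconds

def gather_time():
    return max(BASE_US + (RTO_US if random.random() < P_RETRY else 0.0)
               for _ in range(N_LINKS))

trials = [gather_time() for _ in range(10_000)]
stalled = sum(t > 1_000 for t in trials) / len(trials)
print(f"median gather: {sorted(trials)[len(trials) // 2]:.0f} us, "
      f"fraction stalled by a retransmit: {stalled:.1%}")

With those made-up numbers, roughly a fifth of the gathers end up
waiting on a 200 ms timeout, even though every individual link looks
fine 99.9% of the time.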
As data rates get higher, even really good bit error rates on the wire
start to hurt. Consider this: a BER of 1E-10 is quite good, but if
you're pumping 10 Gb/s over the wire, that's an error every second.
(A BER of 1E-10 is a typical rate for something like a 100 Mbps link.)
So practical systems use some sort of FEC, but even with that, BERs
of 1E-14 or 1E-15 are pretty much state of the art over shortish
(meters) distances. (It's a power/signal-to-noise-ratio thing: how
much energy can you put into sending one bit of information?)
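A quick sketch of that arithmetic (the rates and BERs are just points
picked for illustration):

# Mean time between bit errors for a given line rate and raw BER.
# The 1E-10 at 10 Gb/s case reproduces the "one error per second" above.
rates_bps = {"100 Mb/s": 100e6, "10 Gb/s": 10e9}

for name, rate in rates_bps.items():
    for ber in (1e-10, 1e-12, 1e-15):
        seconds_per_error = 1.0 / (rate * ber)
        print(f"{name} at BER {ber:.0e}: one error every "
              f"{seconds_per_error:,.0f} s")

Even at 1E-15, a 10 Gb/s link takes a bit error roughly every 28
hours, which is why the link and message layers still need checksums
and some way to recover.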
Jim