[Beowulf] Re: Re: Home beowulf - NIC latencies

Vincent Diepeveen diep at xs4all.nl
Wed Feb 16 08:44:55 PST 2005

At 07:17 16-2-2005 -0500, Robert G. Brown wrote:
>On Wed, 16 Feb 2005, Patrick Geoffray wrote:
>> > This is however too much detail for this forum though, as most of the 
>> > postings here discuss much more practical issues :)
>> I am bored with cooling questions. However, it's quite time consuming to 
>> argue by email. I don't know how RGB can keep the distance :-)
>> Patrick
>I stuck a hairpin into an electrical socket at age 2 (an "enlightening"
>experience I must say) and had a large rock fall on my head from a
>height of almost a meter at age 8.
>Since then, I hardly ever get bored with cooling questions, because I
>cannot remember that they've been asked.  What were we talking about,
>Oh yeah, MPI and all that.
>I've actually been enjoying reading the discussion and not
>participating, since I'm a PVM kinda guy.  But SINCE my name was invoked
>in vain, I'll make a single comment on the code quality issue, which is
>that underlying the discussion of communication pattern, blocking vs
>non-blocking, and directives is the fundamental scaling properties of
>the code and algorithm itself.  So on the issue of whether MPI sucks
>because the application sucks -- well, possibly, but it seems more
>likely that the application sucks because its parallel scaling
>properties (with the algorithm chosen) suck.

Some algorithms have inherently sequential properties, which makes them
hard to scale well. Game tree search happens to have a few such
algorithms. One of them performs far better than the others, and it
shares a property with a number of its enhancements: a lower blocking-get
latency speeds it up exponentially.

For the basic idea of why the speedup is exponential, see Knuth and look
for the 'alpha-beta' algorithm.
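
As a rough illustration of the alpha-beta idea mentioned above, here is a
minimal sketch that compares plain minimax with alpha-beta on a small
random tree. The tree builder, the function names, and the parameters
(branching 4, depth 6, seed 1) are assumptions made for this example and
are not taken from any real game tree searcher. Each cutoff depends on
the results of siblings searched earlier, which is the sequential
property in question.

```python
import random

def make_tree(branching, depth, rng):
    """Build a uniform game tree; leaves are random integer scores."""
    if depth == 0:
        return rng.randint(-100, 100)
    return [make_tree(branching, depth - 1, rng) for _ in range(branching)]

def minimax(node, maximizing, counter):
    """Plain minimax: visits every leaf and counts them in counter[0]."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, alpha, beta, maximizing, counter):
    """Minimax with alpha-beta cutoffs: skips subtrees that cannot
    change the final result, based on bounds from earlier siblings."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, best)
            if alpha >= beta:  # remaining siblings cannot matter
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, counter))
        beta = min(beta, best)
        if alpha >= beta:  # remaining siblings cannot matter
            break
    return best

def compare(branching=4, depth=6, seed=1):
    """Return (minimax value, alpha-beta value, leaves visited by each)."""
    tree = make_tree(branching, depth, random.Random(seed))
    full, pruned = [0], [0]
    v_full = minimax(tree, True, full)
    v_pruned = alphabeta(tree, float("-inf"), float("inf"), True, pruned)
    return v_full, v_pruned, full[0], pruned[0]
```

Calling compare() gives the same root value from both searches, while
alpha-beta visits fewer leaves. How many fewer depends on how early the
good bounds become known, which in a parallel search depends on how
quickly one processor can see what another has found.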

So the assumption that an algorithm sucks because it needs low latency
rather than bandwidth is like sticking a hairpin in an electrical socket.

If users needed JUST a little bit of bandwidth, they could already get
quite far with $40 cards.

So optimizing MPI for low-latency small messages is, IMHO, very relevant.
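
To make the bandwidth-versus-latency point concrete, here is a small
sketch using the usual linear cost model (time = latency + size /
bandwidth). The function names and the numbers in the usage lines
(50 microseconds of latency, 100 MB/s of bandwidth) are made-up
illustrative values, not measurements of any particular card.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: fixed latency plus size divided by bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

def latency_share(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is pure latency."""
    total = transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total

# Hypothetical interconnect: 50 us latency, 100 MB/s bandwidth.
one_byte = latency_share(1, 50e-6, 100e6)
ten_megabytes = latency_share(10_000_000, 50e-6, 100e6)
```

Under these assumed numbers, nearly all of the time for the one-byte
message is latency, while for the 10 MB message latency is negligible.
For small messages, only a lower latency helps.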

Many hardware improvements are coming in the next few years: dual core,
Cell-type streaming processors. And obviously, once more software that
runs in parallel becomes available, many will try the jump to running on
clusters too.

So anything that makes it easier, from the programmer's viewpoint, to
parallelize their software makes sense to me. Examples are implementing
short messages in a kind of single-system-image layer, or even
implementing certain algorithms.

The step from shared memory programming to MPI is a rather huge one.

Suppose all you want is a byte that is sometimes on a remote machine and
usually in your local cache, but you NEED that byte for your software,
just to know whether it's a 1 or a 0. Then the last thing you want to be
toying with is writing special code. You don't care how the result gets
there, just as long as it gets there.
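
A minimal sketch of what such an interface could look like from the
caller's side, assuming a hypothetical class TransparentBytes and a
stand-in function fake_blocking_get in place of a real network call.
None of these names come from MPI or from any existing single system
image package. The caller only asks for the byte; whether it comes from
the local cache or from the remote side is hidden.

```python
class TransparentBytes:
    """Hypothetical byte store: serves from a local cache when it can,
    and falls back to a blocking remote get when it cannot."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote  # callable: address -> byte
        self._cache = {}
        self.remote_fetches = 0            # how often we paid the latency

    def get(self, address):
        """Return the byte at address; the caller never sees its origin."""
        if address not in self._cache:
            self.remote_fetches += 1
            self._cache[address] = self._fetch_remote(address)
        return self._cache[address]

# Stand-in for memory that lives on another node (illustrative only).
remote_memory = {0x10: 1, 0x11: 0}

def fake_blocking_get(address):
    """Pretend network call; a real one would block for the round trip."""
    return remote_memory[address]

store = TransparentBytes(fake_blocking_get)
first = store.get(0x10)   # not cached yet: goes to the "remote" side
second = store.get(0x10)  # cached now: served locally
```

In this sketch the calling code is the same in both cases, and the cost
of each remote fetch is exactly the blocking-get latency.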

>As to how "intelligent" the back end library should be at choosing
>algorithm -- I would say the BASIC library should be atomic, elementary,
>NOT algorithm level stuff.  A thin skin on top of raw networking calls
>that provides the various things one always has to do oneself but not
>much more.  Where one gets into trouble is where one uses a command that
>has a complex structure that doesn't fit your code without realizing it,
>and the reason you don't realize it is because all that detail is
>hidden, and isn't even uniform in RELATIVE performance across varying
>network hardware.
>In other words, to make MPI do more, either make it do less (in the form
>of commands that can be used to build "more" in a manner that is tuned
>to application and hardware) or be prepared to REALLY make it SMART
>behind the scenes.
>This isn't just MPI, BTW.  PVM suffers from the same thing.  I honestly
>think that both are limited tools in part BECAUSE they put too thick a
>skin between the programmer and the network.  If you want real
>performance and complete control over communication algorithm, you
>probably have to use raw/low level networking commands, and write the
>appropriate "collective" operations for your particular application and
>Of course nobody does this -- not portable and a PITA to
>design/write/maintain.  Or perhaps a few people DO do this, but they're
>programming gods.  And this isn't crazy, really.
>    rgb
>Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
>Duke University Dept. of Physics, Box 90305
>Durham, N.C. 27708-0305
>Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
