[Beowulf] cluster softwares supporting parallel CFD computing

Fri Sep 8 20:25:16 PDT 2006

Greg Lindahl <greg.lindahl at qlogic.com> writes:

> On Thu, Sep 07, 2006 at 01:15:01PM -0600, Eric W. Biederman wrote:
>
>> I agree.  Taking an interrupt per message is clearly a loss.
>
> Ah. So we're mostly in violent agreement!

That is always nice :)

>> Polling is a reasonable approach for short durations, say
>> <= 1 millisecond, but it is really weird to explain that you can tell an
>> MPI application has failed to receive a message because its CPU
>> utilization goes up.  Polling for seconds on end is a very rude thing
>> to do on a multitasking OS.
>
> This is very true. You'll find that many MPI implementations now get
> this right; for example, I've seen that OpenMPI has a policy where you
> can tell it to poll for a short time and then call yield(). Our MPI has
> this as the default. It's a compromise which doesn't hurt performance
> that often.

Nice.  I guess I just haven't had a chance to see this in action yet.
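
For anyone who wants to see the shape of that policy, here is a minimal
sketch of "poll briefly, then yield".  It is not taken from any MPI
implementation.  The name wait_for_message, the 1 ms default, and the use
of a threading.Event as the "message arrived" flag are all assumptions
made for illustration, and time.sleep(0) stands in for a real yield call.

```python
import threading
import time


def wait_for_message(flag, spin_seconds=0.001):
    """Busy-poll `flag` for about `spin_seconds`, then yield between checks.

    `flag` is a threading.Event standing in for "the message has arrived".
    Returns (spins, yields): how many checks missed in each phase.
    """
    deadline = time.perf_counter() + spin_seconds
    spins = 0
    yields = 0

    # Phase 1: pure polling, the low-latency path for messages that
    # show up almost immediately.
    while time.perf_counter() < deadline:
        if flag.is_set():
            return spins, yields
        spins += 1

    # Phase 2: keep checking, but give the CPU away between checks so
    # other runnable work is not starved while we wait.
    while not flag.is_set():
        time.sleep(0)  # assumption: used here as a portable "yield"
        yields += 1
    return spins, yields


if __name__ == "__main__":
    arrived = threading.Event()
    # Simulate a message that arrives long after the spin window closes.
    threading.Timer(0.05, arrived.set).start()
    print(wait_for_message(arrived))
```

A message that arrives inside the spin window never reaches the second
loop, so the fast path keeps its latency.  Only the long waits pay the
cost of yielding.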

>> The problem, from what I can tell, is that latency is fundamental, and mostly
>> an artifact of the card implementation.  We are quickly reaching the
>> point where we won't be able to improve latency any more.
>
> This is also very true. That's why we've moved on to attacking message
> rate and short-message bandwidth. Good message rate at high core counts
> is going to be even more important when we get 4 cores / socket.

Sounds right.

>> On the other hand, it is my distinct impression that the reason there is no
>> opportunity cost from polling is that the applications have not been
>> tuned as well as they could be.  In all other domains of programming
>> synchronous receives are seriously looked down upon.  I don't know why
>> that should not apply to MPI codes as well.
>
> It does apply; however, many parallel algorithms used today are
> naturally blocking. Why?  Well, complicating your algorithm to overlap
> communication and computation rarely gives a benefit in practice. So
> anyone who's tried has likely become discouraged, and most people
> haven't even tried.

Could be.  I have not managed to climb high enough up the stack
to get a look at a lot of applications yet.
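
To make the blocking vs. overlapped distinction concrete, here is a rough
sketch of the two structures.  It uses a worker thread rather than MPI, so
fake_receive, interior_work, and the delay parameter are hypothetical
stand-ins.  In MPI terms the submit() call plays the role of MPI_Irecv and
result() the role of MPI_Wait.  It only shows the shape of the code and
makes no claim about whether the overlap pays off in a real application.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_receive(delay):
    """Hypothetical stand-in for a receive that takes `delay` seconds."""
    time.sleep(delay)
    return [1.0, 2.0, 3.0]


def interior_work(n):
    """Computation that does not depend on the incoming message."""
    return sum(i * i for i in range(n))


def blocking_step(delay, n):
    # Synchronous style: wait for the message, then compute.
    halo = fake_receive(delay)
    local = interior_work(n)
    return local + sum(halo)


def overlapped_step(delay, n):
    # Asynchronous style: post the receive, compute while it is in
    # flight, and only then wait for it to complete.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_receive, delay)  # like MPI_Irecv
        local = interior_work(n)                    # independent work
        halo = pending.result()                     # like MPI_Wait
    return local + sum(halo)


if __name__ == "__main__":
    print(blocking_step(0.01, 1000), overlapped_step(0.01, 1000))
```

Both functions return the same value.  The overlapped version needs the
algorithm to have work that is independent of the message, which is
exactly the complication being discussed above.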

Eric