[Beowulf] cluster softwares supporting parallel CFD computing
Eric W. Biederman
ebiederm at xmission.com
Fri Sep 8 20:25:16 PDT 2006
Greg Lindahl <greg.lindahl at qlogic.com> writes:
> On Thu, Sep 07, 2006 at 01:15:01PM -0600, Eric W. Biederman wrote:
>
>> I agree. Taking an interrupt per message is clearly a loss.
>
> Ah. So we're mostly in violent agreement!
That is always nice :)
>> Polling is a reasonable approach for short durations, say
>> <= 1 millisecond, but it is really weird to explain that you can tell
>> an MPI application has failed to receive a message because its CPU
>> utilization goes up. Polling for seconds on end is a very rude thing
>> to do on a multitasking OS.
>
> This is very true. You'll find that many MPI implementations now get
> this right; for example, I've seen that OpenMPI has a policy where you
> can tell it to poll for a short time and then call yield(). Our MPI
> has this as the default. It's a compromise which doesn't hurt
> performance that often.
Nice. I guess I just haven't had a chance to see this in action yet.
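From that description, I'd guess the wait loop looks roughly like the
sketch below (just my reading of it; the spin threshold and the use of
sched_yield() are my assumptions, not any particular implementation's
code):

#include <mpi.h>
#include <sched.h>   /* sched_yield() */

/* Wait on a posted nonblocking receive: spin for a bounded number of
   iterations, then start yielding the CPU instead of burning it. */
static void wait_spin_then_yield(MPI_Request *req, long spin_iters)
{
    int done = 0;
    long i = 0;

    while (!done) {
        MPI_Test(req, &done, MPI_STATUS_IGNORE);
        if (!done && ++i > spin_iters)
            sched_yield();
    }
}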
>> The problem from what I can tell is that latency is fundamental, and mostly
>> an artifact of the card implementation. We are quickly reaching the
>> point we won't be able to improve latency any more.
>
> This is also very true. That's why we've moved on to attacking message
> rate and short-message bandwidth. Good message rate at high core counts
> is going to be even more important when we get 4 cores / socket.
Sounds right.
>> On the other hand, it is my distinct impression that the reason there
>> is no opportunity cost from polling is that the applications have not
>> been tuned as well as they could be. In all other domains of
>> programming, synchronous receives are seriously looked down upon.
>> I don't know why that should not apply to MPI codes as well.
>
> It does apply, however, many parallel algorithms used today are
> naturally blocking. Why? Well, complicating your algorithm to overlap
> communication and computation rarely gives a benefit in practice. So
> anyone who's tried has likely become discouraged, and most people
> haven't even tried.
Could be I have not managed to climb high enough up into the stack
to get a look at a lot of applications yet.
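
For what it's worth, the kind of overlap I had in mind is roughly the
sketch below (illustration only; compute_interior() and
compute_boundary() stand in for hypothetical application work):

#include <mpi.h>

extern void compute_interior(void);                 /* hypothetical */
extern void compute_boundary(double *halo, int n);  /* hypothetical */

/* Post the receive early, do independent work, and only block when
   the incoming data is actually needed. */
void exchange_and_compute(double *halo, int n, int neighbor, MPI_Comm comm)
{
    MPI_Request req;

    MPI_Irecv(halo, n, MPI_DOUBLE, neighbor, 0, comm, &req);

    compute_interior();          /* work that does not need the halo */

    MPI_Wait(&req, MPI_STATUS_IGNORE);
    compute_boundary(halo, n);   /* work that does need it */
}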
Eric