Beowulf & Fluid Mechanics

Sat Jul 15 16:27:11 PDT 2000

> To me, the most interesting conclusions based on Brian's tests concern
> MPI implementations.

That's very interesting. The ch_p4 device of mpich has some problems. I
suspect it would do much better on this test if it called select() to find
out when a socket can accept more bytes, instead of repeatedly calling a
non-blocking write() that just comes back with EAGAIN. Obviously, entering
the kernel zillions of times is a bad idea.
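A minimal sketch of the difference, written in Python rather than mpich's C and assuming a non-blocking stream socket. The function names are hypothetical, and this is not how ch_p4 is actually written. The busy variant counts every failed send() (EAGAIN, surfaced in Python as BlockingIOError) as a wasted trip into the kernel. The select() variant sleeps in the kernel until the socket has room again.

```python
import select
import socket
import threading


def send_all_busy(sock: socket.socket, data: bytes) -> int:
    """Busy-poll variant: retry send() until every byte is accepted.

    Each BlockingIOError (EAGAIN) is a wasted kernel entry.
    Returns the total number of send() attempts.
    """
    view = memoryview(data)
    attempts = 0
    while len(view) > 0:
        attempts += 1
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue  # send buffer full: spin and try again immediately
        view = view[sent:]
    return attempts


def send_all_select(sock: socket.socket, data: bytes) -> int:
    """select() variant: sleep until the kernel says the socket is writable.

    Returns the total number of send() attempts.
    """
    view = memoryview(data)
    attempts = 0
    while len(view) > 0:
        # Block (no timeout) until there is room in the socket send buffer.
        select.select([], [sock], [])
        attempts += 1
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue  # rare: lost a race for buffer space, wait again
        view = view[sent:]
    return attempts


def drain(sock: socket.socket, total: int, out: list) -> None:
    """Reader thread: receive `total` bytes and append them to `out`."""
    received = bytearray()
    while len(received) < total:
        chunk = sock.recv(65536)
        if not chunk:
            break
        received.extend(chunk)
    out.append(bytes(received))


def run(sender, payload: bytes) -> tuple:
    """Send `payload` over a fresh socketpair; return (attempts, data_ok)."""
    a, b = socket.socketpair()
    a.setblocking(False)  # the writer is non-blocking, as in the scenario above
    out: list = []
    reader = threading.Thread(target=drain, args=(b, len(payload), out))
    reader.start()
    attempts = sender(a, payload)
    reader.join()
    a.close()
    b.close()
    return attempts, out[0] == payload


if __name__ == "__main__":
    payload = b"x" * (8 * 1024 * 1024)
    print("busy  :", run(send_all_busy, payload))
    print("select:", run(send_all_select, payload))
```

Both variants deliver the same bytes; the attempt counts are what differ, and the busy count depends on how fast the reader drains the socket, so it will vary from run to run.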

However, the MVICH result is odder. MVICH doesn't have to go through the
kernel to talk to Giganet, right? As a point of comparison, Myrinet on a
dual Alpha has an SMP penalty of as little as 3% on some of my real
applications (the FSL weather codes), and Myrinet and Giganet are pretty
much the same from the programmer's point of view.

Obviously I should get a copy of Brian's test before I make the bald-faced
claim that I'm about to make, but: perhaps the SMP penalty you're seeing
from MVICH comes from the fact that it's beating on main memory or the PCI
bus in a tight loop. If it instead spun in a little in-processor busy loop
so that it polled no more than once every few microseconds, main memory and
PCI traffic would drop significantly, but latency wouldn't change much.
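A sketch of the throttled-polling idea, under loudly stated assumptions: `is_done` is a hypothetical stand-in for reading a completion flag that lives in main memory or behind the PCI bus, and the inner loop spinning on `time.perf_counter_ns()` stands in for the in-processor delay. Python can't show the cache or bus effects themselves; this only shows the control structure, in which the shared location is read at most once per `interval_ns`.

```python
import time


def poll_with_backoff(is_done, interval_ns: int = 2_000, timeout_s: float = 1.0) -> int:
    """Poll `is_done()` at most once every `interval_ns` nanoseconds.

    `is_done` is a hypothetical callable standing in for a read of a shared
    completion flag. Between reads we spin on the local clock only, so the
    shared location is not hammered. Returns the number of polls made, or
    raises TimeoutError once `timeout_s` has elapsed.
    """
    deadline = time.perf_counter_ns() + int(timeout_s * 1e9)
    polls = 0
    while True:
        polls += 1
        if is_done():  # the only access to the shared location
            return polls
        now = time.perf_counter_ns()
        if now > deadline:
            raise TimeoutError("condition never became true")
        # Local spin: reads the clock, not the shared flag.
        next_poll = now + interval_ns
        while time.perf_counter_ns() < next_poll:
            pass
```

The worst-case extra latency is roughly one `interval_ns`, which is why an interval of a few microseconds should cost little next to a network round trip while cutting the number of shared-location reads.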

The Alpha has better implementations of both main memory and PCI, so that's
my guess as to why the effect is smaller there. But, as I said, I would like
to test this and see.

SMP effects are an extremely interesting swamp to dive into. I know that
Compaq SC benchmarks (4-processor Alphas with Quadrics, but only 2.5 CPUs'
worth of main memory bandwidth) can show some *really* interesting
performance losses with multiple CPUs. I have tried to avoid multiprocessor
machines for this reason, but the second Intel CPU is so cheap that it's
hard to dodge using them.
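To put rough numbers on the 2.5-CPUs-of-bandwidth case, here is a back-of-the-envelope model. It assumes a code limited entirely by main memory bandwidth and ignores caches and contention overhead, so it is a simplification, not a measurement. Four CPUs can deliver at most 2.5 CPUs' worth of work, so each runs at 2.5/4 = 62.5% of its solo speed.

```python
def memory_bound_speedup(n_cpus: int, bus_capacity_cpus: float) -> float:
    """Idealized speedup of a purely memory-bound code on one SMP node.

    `bus_capacity_cpus` is memory bandwidth in "CPUs' worth": how many CPUs
    could stream from memory at full speed at the same time. Under this
    model the node never goes faster than that many CPUs.
    """
    return min(float(n_cpus), bus_capacity_cpus)


def per_cpu_efficiency(n_cpus: int, bus_capacity_cpus: float) -> float:
    """Fraction of single-CPU speed each CPU gets when all of them are busy."""
    return memory_bound_speedup(n_cpus, bus_capacity_cpus) / n_cpus


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, memory_bound_speedup(n, 2.5), per_cpu_efficiency(n, 2.5))
```

Under the same model, a dual-CPU node whose memory system could feed at least two CPUs would show no penalty. Real codes sit somewhere between this worst case and that ideal, depending on how much they hit in cache.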

-- greg