[Beowulf] multi-threading vs. MPI
Joe Landman
landman at scalableinformatics.com
Mon Dec 10 16:40:15 PST 2007
Eray Ozkural wrote:
> On Dec 9, 2007 9:37 PM, Toon Knapen <toon.knapen at gmail.com> wrote:
>
>> And considering that future processors are going even further in
>> the NUMA direction (e.g. the Intel 80-core), isn't it more future-safe
>> to go with MPI if one were to start a large coding project now?
>>
>> thanks for all the reactions,
>
> I think that's a good point. For NUMA obviously MPI is more useful.
I have been staying out of the debate thus far, as I believe that it is
more likely to generate heat than light.
A few obvious points:
a) single benchmarks do not a definitive statement make
b) the only code that matters is your code (really, this should be
everyone's mantra with benchmarking in general).
12 years ago, after starting work at SGI, I had to work hard to convince
people that a 75 MHz R8000 chip could actually be faster (i.e. lower
wall clock time on a real app with real data) than a 233 MHz (or whatever
it was) Alpha chip. It was "obvious" to most people that Alpha was
faster. That is, it was obvious from the CPU clock, various "standard"
benchmark cases, and so forth ... until they ran their own codes, and
saw some rather different results.
The point of this is that I see the same thing playing out here, with
people's opinions and notes generating the heat. I would prefer to try
to shed a little light if possible, and keep the heat level as low as
possible.
FWIW: I have been using OpenMP for something like 11 years (pretty much
since inception), and MPI since about 1997. I have used both in
projects with customers, end users, collaborators. I have taught
graduate level courses in HPC programming using both.
Generally speaking, I find scientists/engineers "get" OpenMP
more easily than MPI. They have to work less hard to get some benefit
from OpenMP than MPI.
I expect the above statement to generate a great deal of heat, which is
a shame, as the next statement should generate a great deal of light.
This said, since OpenMP does stuff for you, you have to think and work
harder to prevent the performance killing conditions which can and often
do show up in real code. OpenMP lets you share data, and as you
increase the number of CPUs sharing the data, on average the shared data
is often the bottleneck. Then again, with some careful re-crafting of
the code ... not a complete rewrite, it is entirely possible to mitigate
many of the issues. That is, OpenMP saves you from thinking hard to get
some benefit, but you need to think hard to get good benefit for larger
systems. More about this in a minute.
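To make the shared-data bottleneck concrete, here is a minimal sketch in C (not code from this thread; the function names dot_atomic and dot_reduction are made up for illustration). Both compute the same dot product. The first has every thread update a single shared accumulator, which serializes on that one memory location as the thread count grows. The second is the kind of small re-crafting described above: a reduction clause gives each thread a private partial sum and combines them once at the end. The code assumes only standard OpenMP pragmas, so it also compiles and runs serially when OpenMP is not enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: every thread updates one shared variable.
 * Correct, but the atomic update on "sum" is contended by all threads. */
double dot_atomic(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;                 /* shared across threads */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        #pragma omp atomic
        sum += x[i] * y[i];
    }
    return sum;
}

/* Hypothetical example: same result, but each thread accumulates into
 * a private copy of "sum"; the copies are combined once at loop exit. */
double dot_reduction(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang the pragmas take effect when building with -fopenmp. How much the second version wins depends on the machine and the problem size, so time it on your own code.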
MPI is harder (though some may disagree). You have to rewrite and
rethink your code. While this is harder, this is also a good thing. It
forces you to explicitly consider data locality issues (NUMA is an
example of a data locality hierarchy) which OpenMP does not explicitly
force you to consider. It forces you to avoid global data, and all the
pain that goes with it (false sharing, atomic updates, ...). It forces
you to explicitly move data.
Also, unlike OpenMP, the communication model can be easily matched to
the underlying problem, which tends to mean a tighter coupling of the
computing resource to the algorithm. OpenMP is a bag-o-threads, and you
don't have an "explicit" communication pattern between threads.
I don't consider one "better" than the other for all problems. For
certain classes of problem, OpenMP is the logical and obvious choice,
while MPI is the logical and obvious choice for other classes. Aside
from this, without channeling an ex-US president, we need to define what
"better" means. Faster execution on model problems? Faster
benchmarking? Faster development, ease of code
testing/debugging/management?
I do agree with Greg in that I have not to date seen a code where the
hybrid model is better than the pure model.
Back to Eray's point.
For NUMA, you have a small set of data points which show that MPI
provides superior performance on a code. The question is whether or not
the OpenMP code used first-touch or similar allocation ... without more
information, it is fairly hard to draw conclusions, never mind general
conclusions. Large SGI machines have gobs of NUMA shared memory, and
you can get very good scalability with (non-trivial) OpenMP codes.
What we see going forward are desktops with 4-16 cores (biased as this
is what we are doing/selling) and a shared memory system. NUMA for AMD,
flat (non-NUMA) for Intel. Intel is going to NUMA as far as I have seen
at SC07 and elsewhere (and Intel folks, please do step in and let me
know if I am wrong). A well written OpenMP code, that knows how to use
memory correctly, should be able to exploit these multiple memory buses
without too many issues. The STREAM code is an example of a "trivial"
(sorry John) code which operates in OpenMP very nicely.
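As a sketch of what "knows how to use memory correctly" can mean on a NUMA box, the C fragment below shows first-touch initialization followed by a STREAM-style triad. It is an illustration under stated assumptions, not the actual STREAM source: the helper names alloc_first_touch and triad are made up, and it assumes a first-touch page placement policy (the usual default on Linux), where a page is placed on the memory node of the thread that first writes to it. The idea is to initialize the arrays with the same static loop schedule that the compute loop uses, so each thread mostly works on pages local to it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles and fill them in parallel.
 * malloc only reserves address space; the parallel writes below are the
 * "first touch" that decides which NUMA node each page lands on. */
double *alloc_first_touch(size_t n, double value)
{
    double *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = value;
    return a;
}

/* STREAM-style triad: a[i] = b[i] + s * c[i].
 * Uses the same static schedule as the initialization, so each thread
 * reads and writes the index range it touched first. */
void triad(double *a, const double *b, const double *c, double s, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; i++)
        a[i] = b[i] + s * c[i];
}
```

If the arrays were instead initialized by a single thread, all pages could end up on one node and every other thread would pull its data across the interconnect. That is one way an OpenMP code can look worse than MPI on NUMA hardware for reasons unrelated to the programming model itself.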
There are others. A fair number of commercial codes with large solvers
don't do decomposition very well, and tend not to have great MPI
versions, or not so great MPI scalability. They do shared memory quite
nicely, and will scale well on large processor count machines with lots
of memory buses (MSC/NASTRAN, various other similar codes, ...).
The point I am belaboring here is that it is *not* obvious at all that
one or the other method is "better" in a general sense (due to the fact
that "better" is not well defined to begin with in this context).
Our view has always been use what you are comfortable with, and what you
need. If you need to run across a cluster, use MPI. If you need to run
across a single large memory machine, use OpenMP.
FWIW: I would suggest learning both. With the advent of many-core
workstations, and accelerator systems with many many cores, programming
these things is more likely to be mediated by a compiler (OpenMP-like)
than by putting MPI stacks on the Cell SPUs (not enough local scratchpad
RAM for it).
Just my $0.02, and I hope I generated light, and very little heat.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615