[Beowulf] Large amounts of data to store and process

Fri Mar 15 18:19:14 PDT 2019

> On 2019, Mar 15, at 4:22 PM, Douglas Eadline <deadline at eadline.org> wrote:
> 
> 
>> We are definitely going that way, but for every day desktops, MPI is not
>> the way to go. Since most desktops are stand-alone islands,
>> multi-threading makes more sense, since it has less overhead compared to
>> MPI, and most desktop apps don't need the inter-node communications
>> provided by MPI.
> 

I spent quite a lot of time in recent months looking at communications in single-node parallel applications.  These are often available in both OpenMP and MPI versions.

I cannot recall a single example of an OpenMP version that was actually faster than the MPI version.

This is the sort of thing that is bound to cause arguments!  Of course there is plenty to argue about.

* OpenMP versions have a tendency to use less RAM because it is all shared.  MPI versions have to duplicate read-only datasets and there are two or three copies of messages floating around depending on the MPI runtime.

* OpenMP versions are probably an easier way to get started with parallelism.  Just slap on some #pragmas and you get some benefit.

* MPI versions can run multinode or same node with equal ease.

* Some punters argue that MPI memory use scales badly with huge numbers of ranks, so a hybrid approach is best, with OpenMP on node and MPI between nodes.  I am not convinced. You get the complexities of both.

* The performance differences are not huge, in most cases.

* The OpenMP runtime is hardly free.  There is a <lot> of locking and copying and broadcasting and reducing and notifying going on down there.

* There is no particular reason to think that MPI copies are slower than coherent shared memory.  Fetching a value from another core is typically slower than L3 and only slightly faster than DRAM.  Even when MPI is two-copies via a shared memory page, one of them is likely to be local-cache and really not cost that much.
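
To make the "slap on some #pragmas" point concrete, here is a minimal sketch in C.  The function name dot is made up for illustration and does not come from any particular application.  The single pragma is the only parallel-specific line, and the reduction clause is one small example of the reducing the OpenMP runtime does on your behalf.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n.
 * Hypothetical example function, not taken from any real code base.
 * Built without OpenMP support the pragma is ignored and the loop
 * simply runs serially, giving the same answer. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;

    /* Each thread accumulates a private partial sum; the runtime
     * combines the partial sums when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC or Clang this needs -fopenmp to actually run in parallel.  The MPI equivalent would need explicit initialisation, a decomposition of the arrays across ranks and a call such as MPI_Allreduce, which is the extra effort the bullet above is alluding to.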

In the above I am comparing OpenMP and MPI as routes to parallelism.  It is easy to say “multithreading” but for most programmers it is a real tarpit.  You get a lot of programs that run on the development machine and lock up elsewhere.  There are a few smart people who understand this stuff, but it is hard.  At least MPI and OpenMP and things like Threading Building Blocks and Cilk can abstract the situation to a degree.  I wouldn’t advise anyone to roll their own multithreaded world. And use MCS locks for goodness sake!
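
For anyone who has not met them: an MCS lock (Mellor-Crummey and Scott) is a queue lock in which each waiter spins on a flag in its own node rather than on one shared word, so waiting does not hammer a single cache line.  What follows is a minimal sketch in C11 atomics, not a tuned implementation.  The names mcs_lock, mcs_node, mcs_init, mcs_acquire and mcs_release are hypothetical, and a real version would add a pause or backoff in the spin loops.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiting thread; the caller supplies the storage. */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;   /* successor in the queue, if any */
    atomic_bool locked;                /* true while this waiter must spin */
} mcs_node;

/* The lock itself is just a pointer to the tail of the queue. */
typedef struct {
    _Atomic(mcs_node *) tail;
} mcs_lock;

void mcs_init(mcs_lock *lock)
{
    assert(lock != NULL);
    atomic_init(&lock->tail, NULL);
}

void mcs_acquire(mcs_lock *lock, mcs_node *me)
{
    assert(lock != NULL && me != NULL);
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves to the queue in one atomic step. */
    mcs_node *prev = atomic_exchange_explicit(&lock->tail, me,
                                              memory_order_acq_rel);
    if (prev != NULL) {
        /* Someone holds the lock: link in behind them, then spin
         * on our own flag only. */
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire)) {
            /* spin */
        }
    }
}

void mcs_release(mcs_lock *lock, mcs_node *me)
{
    assert(lock != NULL && me != NULL);
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to empty. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire)) {
            return;
        }
        /* A successor has swapped the tail but not linked in yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL) {
            /* spin */
        }
    }
    /* Hand the lock directly to the next waiter. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

Each thread passes its own mcs_node (typically on its stack) to mcs_acquire and the same node to mcs_release.  The amount of care needed in even this small sketch is a fair illustration of why rolling your own is not recommended.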

I was left after all this thinking that OpenMP was a fine thing for folks getting started with parallelism, but MPI was probably a better bet.  There are also now 25 years of experience suggesting that you don’t have to be a wizard to get an MPI code to work.

-Larry (dons nomex suit)