> * Some punters argue that MPI memory use scales badly with huge numbers of
> ranks, so a hybrid approach is best, with OpenMP on node and MPI between
> nodes.  I am not convinced. You get the complexities of both.

I think the answer there is "it depends". For instance, on BlueGene/Q, where
you had 16 cores and 16 GB RAM per node, you could run 16 ranks of an MPI
application per node but only have 1 GB RAM per rank, or a single rank per
node with 16 GB RAM (or some power of 2 in between). So for some large
molecular dynamics simulations (with NAMD, for example) going hybrid could be
the difference between failing due to not enough memory (usually on rank 0)
and being able to run to completion.
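
To make that trade-off concrete, here is a rough sketch of the arithmetic. The
function name is made up for illustration, and it assumes one thread per core
and ignores whatever memory the OS takes, so treat the numbers as upper bounds
rather than what you would actually see on a node.

```python
# Hypothetical helper (not from any real tool): split one node's cores and
# RAM evenly between MPI ranks, for power-of-2 rank counts only.
def rank_layouts(cores_per_node=16, ram_gb_per_node=16):
    """Return a list of (ranks_per_node, threads_per_rank, gb_per_rank)."""
    layouts = []
    ranks = 1
    while ranks <= cores_per_node:
        # Assumption: one thread per core, RAM divided evenly, no OS overhead.
        layouts.append((ranks, cores_per_node // ranks, ram_gb_per_node / ranks))
        ranks *= 2
    return layouts


for ranks, threads, gb in rank_layouts():
    print(f"{ranks:2d} ranks/node x {threads:2d} threads/rank -> {gb:g} GB per rank")
```

With the defaults this runs from 1 rank with 16 threads and 16 GB down to 16
ranks with 1 thread and 1 GB each, which is the range described above.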

Now that's not necessarily the case any more (especially as BlueGene has gone 
the way of the dodo) but it was pretty important where I used to be!

