[Beowulf] MPI 2.2 standard and InfiniBand cards

Vincent Diepeveen diep at xs4all.nl
Sun Apr 15 20:26:51 PDT 2012


hi,

I'm reading the MPI 2.2 standard and my eye fell on something amazing.

http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf

Chapter 11, "One-Sided Communications", page 339:

"it is erroneous to have concurrent conflicting accesses to the same  
memory location in a window"

Does this mean that each individual update, whether a read or a write,
is atomic over InfiniBand?
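
For concreteness, here is a minimal sketch (mine, not from the standard
text; the table size, entry size and probed offset are made up) of the
kind of construct that sentence talks about: every rank exposes its
slice of a hash table as an RMA window, and ranks put/get 20-byte
entries into each other's slices inside a fence epoch. The quoted rule
says that if a put and a get (or two puts) from different ranks hit the
same entry inside the same epoch, the program is erroneous.

/* hypothetical hash-table window, MPI-2.2 one-sided communication */
#include <mpi.h>

#define ENTRIES   (1024*1024)   /* made-up number of entries per node  */
#define ENTRY_LEN 20            /* ~20-byte hash entry, as in the text */

int main(int argc, char **argv)
{
    int rank, nprocs;
    unsigned char *table;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* each rank exposes its slice of hash table as an RMA window */
    MPI_Alloc_mem((MPI_Aint)ENTRIES * ENTRY_LEN, MPI_INFO_NULL, &table);
    MPI_Win_create(table, (MPI_Aint)ENTRIES * ENTRY_LEN, 1,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                   /* open an access epoch   */

    unsigned char entry[ENTRY_LEN] = {0};
    int target = (rank + 1) % nprocs;
    MPI_Aint offset = (MPI_Aint)12345 * ENTRY_LEN;  /* some probed entry */

    if (rank % 2 == 0)                       /* even ranks store ...   */
        MPI_Put(entry, ENTRY_LEN, MPI_BYTE, target, offset,
                ENTRY_LEN, MPI_BYTE, win);
    else                                     /* ... odd ranks probe    */
        MPI_Get(entry, ENTRY_LEN, MPI_BYTE, target, offset,
                ENTRY_LEN, MPI_BYTE, win);

    MPI_Win_fence(0, win);                   /* accesses complete here */

    MPI_Win_free(&win);
    MPI_Free_mem(table);
    MPI_Finalize();
    return 0;
}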

In computer chess it can happen that we simply write and read the same
locations. Of course this can result in garbled data. Most people don't
care; some, like me, store a CRC in the entry and care even less.
The odds of it happening are relatively small, but it happens: I
measured roughly once per 200 billion operations (on an Origin 3800
with 200 CPUs and 120 GB of RAM) the coincidence of two writes hitting
the same location, resulting either in garbage written within that
specific cache line, or in a 20-byte entry garbled across two
consecutive cache lines (usually it is this last case that happens; on
PC hardware in fact only the last case can occur, with entries garbled
within one cache line).

Now the actual reads are about 160 bytes each, of which only 20 bytes
get used, so the statistical odds are a lot larger than that 1 in 200
billion that overlapping parts of RAM get requested by 2 or more cores
at the same time, randomly somewhere in the cluster, and/or that writes
of 20 bytes fall within such a range.

What's actually happening in hardware here?

Further on it says: "if a location is updated by a put or accumulate
operation, then this location cannot be
accessed by a load or another RMA operation until the updating  
operation has completed."

Well, it's going to happen; not often, but sometimes.
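
For overlapping updates the standard does seem to leave a door open:
concurrent accumulate calls to the same location are allowed (if I read
it right, as long as they use the same predefined datatype, with the
result as if they were applied in some order, element by element), and
with the MPI_REPLACE operation an accumulate behaves like a put that is
atomic per element. A sketch of what storing a hash entry that way
could look like (the 3 x 64-bit entry layout and the displacement unit
are my assumptions, not something from the standard):

/* storing an entry with MPI_Accumulate + MPI_REPLACE instead of MPI_Put;
 * concurrent accumulates to the same location are permitted, and each
 * 64-bit element is replaced atomically, so two writers cannot
 * interleave inside one element.  Whole-entry atomicity is still NOT
 * promised, so a CRC inside the entry keeps its job. */
#include <mpi.h>
#include <stdint.h>

/* assumes the window was created with disp_unit == sizeof(uint64_t) */
void store_entry(MPI_Win win, int target_rank, MPI_Aint entry_index,
                 uint64_t entry[3])           /* hash entry as 3 words */
{
    MPI_Accumulate(entry, 3, MPI_UINT64_T,
                   target_rank,
                   entry_index * 3,           /* displacement in elements */
                   3, MPI_UINT64_T,
                   MPI_REPLACE, win);
    /* the update only completes when the epoch closes (fence/unlock) */
}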

Of course I don't care if there is some slowdown in that one-in-a-billion
case where 2 or more cores read/write the same memory within the window,
but I do care if normal operations get slowed down by this clause as
given in MPI 2.2 :)

If remote cores read/write RAM (usually these are different,
non-overlapping RMA requests) by putting/getting a random 20-160 bytes
scattered through, say, a gigabyte of RAM on the receiving node, can the
receiving node then issue those, say, half a dozen random lookups/writes
into that gigabyte buffer in a concurrent manner?
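
What I would hope is that something like the pattern below is allowed to
overlap: half a dozen small gets, scattered over the remote node's
window, all issued inside one shared-lock epoch. MPI 2.2 only requires
them to be complete at MPI_Win_unlock, so the library and the HCA are
free to run them concurrently; whether the receiving node's memory
system really services them in parallel is exactly my question. (A
sketch under my own assumptions about entry size and number of probes.)

/* half a dozen scattered 20-byte reads from one remote window,
 * batched in a single passive-target epoch */
#include <mpi.h>

#define ENTRY_LEN 20
#define NPROBES   6

void fetch_probes(MPI_Win win, int target_rank,
                  const MPI_Aint offsets[NPROBES],
                  unsigned char bufs[NPROBES][ENTRY_LEN])
{
    MPI_Win_lock(MPI_LOCK_SHARED, target_rank, 0, win);
    for (int i = 0; i < NPROBES; i++)
        MPI_Get(bufs[i], ENTRY_LEN, MPI_BYTE,
                target_rank, offsets[i],
                ENTRY_LEN, MPI_BYTE, win);
    MPI_Win_unlock(target_rank, win);   /* all gets complete only here */
}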
