[Beowulf] openmpi 2.2 standards and infiniband cards
Vincent Diepeveen
diep at xs4all.nl
Sun Apr 15 20:26:51 PDT 2012
hi,
I'm reading the MPI 2.2 standard and something amazing caught my eye.
http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
chapter 11 "one-sided communications"
page 339:
"it is erroneous to have concurrent conflicting accesses to the same memory location in a window"

Does this mean that each individual update, whether a read or a write, is atomic with InfiniBand?

In computer chess it can happen that we simply write to and read from the same locations at the same time. This can of course result in garbled data. Most people don't care; some, like me, store a CRC with each entry and care even less.

The odds are relatively small, but it does happen. I measured (on an Origin 3800 with 200 CPUs and 120 GB of RAM) that about once every 200 billion operations there is a coincidence where 2 writes hit the same location, resulting in garbage written to that specific cacheline, or to 2 consecutive cachelines that share a 20-byte entry (obviously this last case is the usual one; on PC hardware only the last case can actually occur, with entries garbled within 1 cacheline).

Now, the actual reads are around 160 bytes, of which only 20 bytes get used. So the odds that overlapping parts of RAM get requested by 2 or more cores at the same time, randomly somewhere in the cluster, and/or that a 20-byte write falls within such a range, are a lot larger than this 1 in 200 billion.
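
A rough way to see why, under a deliberately crude model that is an assumption here: the table is M bytes of aligned W-byte entries, every access lands on a uniformly random entry, and a read fetches an aligned run of R bytes. Two writes collide only if they pick the same entry, while a write conflicts with a read if it lands on any of the R/W entries the read covers:

$$
P_{\text{write,write}} \approx \frac{W}{M}, \qquad
P_{\text{read,write}} \approx \frac{R/W}{M/W} = \frac{R}{M}, \qquad
\frac{P_{\text{read,write}}}{P_{\text{write,write}}} = \frac{R}{W} = \frac{160}{20} = 8 .
$$

So per pair of simultaneous accesses, a read/write overlap is roughly 8 times as likely as a write/write collision in this model.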
What's actually happening in hardware here?

The standard further says: "if a location is updated by a put or accumulate operation, then this location cannot be accessed by a load or another RMA operation until the updating operation has completed."

Well, it's going to happen; not often, but sometimes.
Of course I don't care about some slowdown in that once-in-a-billion case where 2 or more cores read/write the same memory within the window, but I do care if normal operations get slowed down by this requirement of MPI 2.2 :)

If remote cores read/write RAM via put/get (which are usually different, non-overlapping RMA requests), each touching a random 20-160 bytes scattered through, say, a gigabyte of RAM on the receiving node, can the receiving node then serve those, say, half a dozen random lookups/writes into that 1 GB buffer concurrently?