[Beowulf] Q: IB message rate & large core counts (per node)?
Patrick Geoffray
patrick at myri.com
Mon Mar 15 13:27:23 PDT 2010
Hi Richard,
I meant to reply earlier but got busy.
On 2/27/2010 11:17 PM, richard.walsh at comcast.net wrote:
> If anyone finds errors in it please let me know so that I can fix
> them.
You don't consider the protocol efficiency, and this is a major issue on
PCIe.
First of all, I would change the labels "Raw" and "Effective" to
"Signal" and "Raw". Then I would add a third column, "Effective", which
considers the protocol overhead. The protocol overhead is the amount of
raw bandwidth that is not used for useful payload. On PCIe, on the Read
side, the data comes back in small packets with a 20-Byte header (24
with the optional ECRC) for a 64-, 128- or 256-Byte payload. Most PCIe
chipsets only support a 64-Byte Read Completion MTU, and even the ones
that support larger sizes still use a majority of 64-Byte completions
because that maps well to the transaction size on the memory bus (HT,
QPI). With 64-Byte Read Completions, the PCIe efficiency is
64/84 = 76%, so 32 Gb/s becomes about 24 Gb/s, which corresponds to the
hero number quoted by MVAPICH for example (3 GB/s unidirectional).
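Just to spell out the arithmetic, here is a minimal sketch. The 20-Byte
per-TLP overhead breakdown in the comments (framing, sequence number,
completion header, LCRC) and the x8 Gen2 interpretation of the 32 Gb/s
raw figure are my assumptions, stated for illustration:

```python
# Sketch of the PCIe read-completion efficiency arithmetic above.
# Assumed per-TLP overhead breakdown (my reading, not from the thread):
#   2 B framing + 2 B sequence number + 12 B completion header + 4 B LCRC = 20 B
# The optional ECRC would add 4 B (the "could be 24" case).

def pcie_read_efficiency(payload_bytes=64, overhead_bytes=20):
    """Fraction of raw link bandwidth carrying useful payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)

raw_gbps = 32.0                       # assumed x8 Gen2 after 8b/10b: 32 Gb/s raw
eff = pcie_read_efficiency(64, 20)    # 64/84 ~= 0.76
print(f"efficiency = {eff:.1%}")                  # ~76.2%
print(f"effective  = {raw_gbps * eff:.1f} Gb/s")  # ~24.4 Gb/s, i.e. ~3 GB/s
```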
Bidirectional efficiency is a bit worse because PCIe Acks take some raw
bandwidth too. They are coalesced but the pipeline is not very deep, so
you end up with roughly 20+20 Gb/s bidirectional.
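To make the Ack cost concrete, here is a rough, parameterized sketch of
one direction of a symmetric bidirectional stream. The 8-Byte Ack DLLP
size and the one-Ack-per-N-completions coalescing factor are assumptions
for illustration, not measurements from this thread:

```python
def bidir_effective_gbps(raw_gbps=32.0, payload=64, tlp_overhead=20,
                         ack_bytes=8, tlps_per_ack=4):
    """Per-direction effective rate when each direction also carries
    Acks for the traffic flowing the other way (all sizes assumed)."""
    # Wire bytes spent per payload chunk sent in this direction:
    data_bytes = payload + tlp_overhead
    # Plus a share of the Ack DLLPs owed for the reverse-direction traffic:
    ack_share = ack_bytes / tlps_per_ack
    efficiency = payload / (data_bytes + ack_share)
    return raw_gbps * efficiency

# With these assumed numbers each direction lands near 23-24 Gb/s in
# theory; a shallow Ack pipeline and flow-control updates push the
# observed figure closer to the ~20+20 Gb/s quoted above.
print(f"{bidir_effective_gbps():.1f} Gb/s per direction")
```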
There is a similar protocol overhead at the IB or Ethernet level, but
the MTU is large enough that the overhead is much smaller than on PCIe.
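As a rough comparison of why the large MTU helps, here is the same
payload/(payload+overhead) ratio for bigger packets. The per-packet
overhead figures (roughly 26 Bytes for an IB packet counting LRH, BTH,
ICRC and VCRC; about 38 Bytes for an Ethernet frame counting preamble,
header, FCS and inter-frame gap) are my assumptions, not numbers from
this thread:

```python
def wire_efficiency(payload, overhead):
    """Useful payload as a fraction of total bytes on the wire."""
    return payload / (payload + overhead)

cases = {
    "PCIe, 64 B completion (20 B)":   (64,   20),
    "IB, 2048 B MTU (assumed 26 B)":  (2048, 26),
    "Ethernet, 1500 B MTU (~38 B)":   (1500, 38),
}
for name, (payload, overhead) in cases.items():
    print(f"{name:32s} {wire_efficiency(payload, overhead):6.1%}")
# PCIe ~76%, IB ~98.7%, Ethernet ~97.5%: with large MTUs the per-packet
# overhead is a much smaller fraction of the raw bandwidth.
```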
Now, all of this does not matter because Marketers will keep using
useless Signal rates. They will even have the balls to (try to) rewrite
history about packet rate benchmarks...
Patrick