[Beowulf] IB in the real world
Bill Broadley
bill at cse.ucdavis.edu
Thu May 12 14:32:58 PDT 2005
I've been looking at the high performance interconnect options.
As one might expect, every vendor sells its strengths and accuses
the competition of certain weaknesses. I can't think of a better place
to discuss these things. The Beowulf list seems mostly vendor neutral,
or at least peer reviewed, and hopefully some end users actually running
the technology can provide some real-world, end-user perspectives.
So, here are the questions that come to mind (but please feel free to add more):
1. How good is the OpenIB mapper? It periodically generates static
routing tables/maps of the available IB nodes? It's the critical piece
for handling the addition or removal of a node and keeping a cluster
functioning? Reliable?
2. How good are the OpenIB+MPI stack(s)? Are any reliable enough for large,
month-long jobs? Which? I've heard rumors of large IB clusters that
never met their acceptance criteria. FUD or real? Related to IB
reliability or performance?
3. How good are the mappers that run inside various managed switches?
Reliable? Same code base? Better or worse than the OpenIB mapper?
4. IB requires pinned memory per node that increases with the total
node count, true? In all cases? Exactly what is the formula for the memory
overhead? Is it per node? Per IB card? Per CPU? Is the pinned memory
optional? What are the performance implications of not having it?
(A back-of-the-envelope example of what I'm asking for follows the list.)
5. Routing is static? Is there flow control? Any handling of hot spots?
How are trunked lines load balanced (e.g. 6 IB ports used as uplinks
for a 24-port switch)? Load balancing across uplinks? Arbitrary
topology (rings? tree only? mesh?)? Static mapping between downlinks
and uplinks, i.e. no load balancing (see the sketch after the list)?
Cut-through or store-and-forward? Both? When? Backpressure?
6. What real-world latencies and bandwidths are you observing on production
clusters with MPI? How much does that change when all nodes are running
the latency or bandwidth benchmark at the same time? (A minimal ping-pong
sketch of the kind of measurement I mean follows the list.)
7. Using the top500 numbers, what would be a good measure of
interconnect efficiency? Specifically, RMax/RPeak for clusters of
similar size? (A worked example with made-up numbers follows the list.)
8. Are there more current HPC Challenge numbers than
http://icl.cs.utk.edu/hpcc/hpcc_results.cgi? Are these benchmark
results included in all top500 submissions? It seems like a good place
to measure latency/bandwidth and any relation to cluster size.
9. Most (all?) IB switches have Mellanox 24-port 4x chips in them? What is
the actual switching capacity of the chip? 20 Gbit * 24? Assuming a
particular clock speed? Do switches run at that clock speed? 4x SDR
per link? DDR? (My arithmetic on this follows the list; corrections welcome.)
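
To make question 4 concrete, here is the sort of back-of-the-envelope
arithmetic I have in mind; the numbers are made up, not measurements
from any particular stack. If an MPI implementation opens one reliable
connection per peer and pre-pins, say, 32 receive buffers of 8 KB for
each connection, that's about 256 KB of pinned memory per peer, or
roughly 256 MB per process on a 1024-node cluster, before counting send
buffers or user buffers registered for RDMA. I'd like to know how close
the real formulas are to that shape, and what (if anything) makes the
growth less than linear in node count.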
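
For question 5, what I mean by "static mapping" is something like the
toy C fragment below. It is purely an illustration of the concept, not
a description of any vendor's actual forwarding logic: each destination
gets pinned to one uplink when the tables are built, so balance across
the trunk depends on the traffic pattern rather than on current load.

/* Hypothetical illustration only: statically spread destination LIDs
 * across a 6-port uplink trunk when the forwarding table is built.
 * Once built, every packet for a given LID always takes the same
 * uplink, no matter how busy that uplink currently is. */
#define NUM_UPLINKS 6

static int uplink_for_lid(int dest_lid)
{
    return dest_lid % NUM_UPLINKS;  /* fixed, destination-based choice */
}

What I'd like to hear is whether real IB switches and subnet managers
do anything smarter than this, and whether hot spots can be avoided.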
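
For question 6, the kind of measurement I mean is a plain ping-pong; a
minimal sketch (assuming only an MPI implementation and a C compiler,
and no substitute for the Pallas or OSU benchmarks) is below. Run
between one pair of nodes, and then on many pairs at once, it should
show how much the quoted latency and bandwidth degrade under load.

/* Minimal MPI ping-pong sketch: 8-byte messages approximate latency,
 * 1 MB messages approximate bandwidth.  Run with at least 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REPS 1000

int main(int argc, char **argv)
{
    int rank, size, i, s;
    int sizes[2] = { 8, 1 << 20 };
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    buf = malloc(1 << 20);
    memset(buf, 0, 1 << 20);

    for (s = 0; s < 2; s++) {
        int bytes = sizes[s];
        double t0, elapsed;

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        elapsed = MPI_Wtime() - t0;

        if (rank == 0) {
            double one_way = elapsed / (2.0 * REPS);  /* seconds */
            double mbps = bytes / one_way / 1e6;      /* MB/s    */
            printf("%7d bytes: %8.2f usec one-way, %8.1f MB/s\n",
                   bytes, one_way * 1e6, mbps);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}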
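
On question 7, the comparison I'm picturing, again with made-up
numbers: a 256-node cluster of dual 3.2 GHz Xeons (2 double-precision
flops per clock per CPU) has an RPeak of 512 * 6.4 GFLOPS = ~3.3
TFLOPS; if its reported RMax is 2.3 TFLOPS, the HPL efficiency is about
70%. Comparing that ratio across similarly sized clusters that differ
mainly in their interconnect is what I'm after, with the caveat that
HPL is fairly forgiving of interconnect latency, so it may understate
the differences.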
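
And on question 9, my own arithmetic, which I'd be happy to have
corrected: a 4x SDR link is 4 lanes * 2.5 Gbit/s = 10 Gbit/s of
signaling per direction, or 8 Gbit/s of data after 8b/10b encoding, so
a "20 Gbit" per-port figure presumably counts both directions. A
24-port chip would then carry at most 24 * 8 Gbit/s = 192 Gbit/s of
data in each direction (384 Gbit/s bidirectional), assuming the
internal crossbar is actually non-blocking at that rate. DDR would
double those numbers, if and when switches run at DDR speeds.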
I'd be happy to summarize responses, or just track the discussion on
the list. I'm of course interested in similar discussions of Quadrics,
Myrinet, and any other competitors in the Beowulf interconnect space,
although maybe each of those should wait a week. Does anyone know of a
better place to ask such things and get a vendor-neutral response (or
at least responses that are subject to peer review)?
Material sent to me directly, NOT covered by NDA, can be included in
my summary anonymously by request.
--
Bill Broadley
Computational Science and Engineering
UC Davis