[Beowulf] onload vs offload

Mark Hahn hahn at mcmaster.ca
Thu Sep 22 12:35:05 PDT 2016

I was reviewing some rather fetid marketing collateral
about this topic, and finding mostly stuff from 2010ish.
A lot has changed since then: onboard PCIe, CPU speed, 
inter-socket bus, NUMA sensitivity of the kernel, lots
more cores, mem BW, presumably smarter applications, etc.

Does anyone have comments on recent generations of onload
vs offload interconnect performance?  Please respond only
with results that are recent and fully quantified (HW config,
how measured, etc).

I'd also be interested to hear from MPI/app people about how useful 
offload really is (how often can real apps leverage RDMA ops, 
or the simple sorts of collectives that are offloadable?)

As keeper of probably the oldest living Quadrics system, I appreciate
the appeal of offload.  OTOH, there's no question that onloading puts
a lot of performance potential into the CPU designer's hands...

thanks, mark hahn.

