[Beowulf] RoCE vs. InfiniBand

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Fri Nov 27 21:45:15 UTC 2020


Hi Chris,

that is basically what we are planning to do: use RoCE v2 for the LNet 
routers and InfiniBand for the HPC part of the cluster. As I have 
mentioned before, the problem is that the existing Lustre storage will be in 
a different location from the new facility, at least that is the latest we 
have been told. To make things a bit more interesting: currently we *are* 
using InfiniBand (both the Mellanox and the Intel flavour) to connect the 
compute nodes to Lustre. 
So we have two problems: how to connect the new world to the old one, and 
what to do with the HPC workload we have. This is where the mixed RoCE/
InfiniBand design came up. 
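
In case it helps to make that more concrete, here is a minimal sketch of what 
the LNet side could look like, assuming a router node with one IB port (ib0) 
on the cluster fabric and one Ethernet port (eth0) carrying RoCE v2 towards 
the storage side. The interface names, network numbers and the router NID are 
made up and would need adapting to the real fabrics:

  # /etc/modprobe.d/lustre.conf on the LNet router node
  # o2ib0 = cluster-side InfiniBand, o2ib1 = storage-side RoCE v2
  # (ko2iblnd runs over RoCE as well, since RoCE is RDMA)
  options lnet networks="o2ib0(ib0),o2ib1(eth0)" forwarding="enabled"

  # /etc/modprobe.d/lustre.conf on the compute nodes:
  # reach the storage LNet via the router's cluster-side NID
  options lnet networks="o2ib0(ib0)" routes="o2ib1 10.0.0.1@o2ib0"

The same can be done dynamically with lnetctl instead of module options; the 
LNet Router Config Guide Chris linked to below covers both.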

I hope this, together with what I wrote in the other replies, makes sense.

All the best from London, still cold and dark. :-) 

Jörg

Am Freitag, 27. November 2020, 07:13:46 GMT schrieb Chris Samuel:
> On Thursday, 26 November 2020 3:14:05 AM PST Jörg Saßmannshausen wrote:
> > Now, traditionally I would say that we are going for InfiniBand. However,
> > for reasons I don't want to go into right now, our existing file storage
> > (Lustre) will be in a different location. Thus, we decided to go for RoCE
> > for the file storage and InfiniBand for the HPC applications.
> 
> I think John hinted at this, but is there a reason for not going for IB for
> the cluster and then using LNet routers to connect out to the Lustre storage
> via Ethernet (with RoCE)?
> 
> https://wiki.lustre.org/LNet_Router_Config_Guide
> 
> We use Lnet routers on our Cray system to bridge between the Aries
> interconnect inside the XC to the IB fabric our Lustre storage sits on.
> 
> All the best,
> Chris




