[Beowulf] RoCE vs. InfiniBand
Jörg Saßmannshausen
sassy-work at sassy.formativ.net
Fri Nov 27 21:45:15 UTC 2020
Hi Chris,
that is basically what we are planning to do: using RoCE v2 for the LNet
routers and InfiniBand for the HPC part of the cluster. As I have mentioned
before, the problem is that the existing Lustre will be in a different
location from where the new facility will be; at least that is the latest we
were told. To make things a bit more interesting: currently we *are* using
InfiniBand (both the Mellanox and the Intel flavour) to connect from the
compute nodes to Lustre.
So we have two problems: how to connect the new world with the old one, and
what to do with the HPC workloads we have. This is where the mixed RoCE/
InfiniBand design came up.
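
Just to make that a bit more concrete, the router setup we have in mind would
look roughly like the sketch below. All interface names, IPoIB addresses and
NIDs are made up, and I am assuming the RoCE side also runs over the o2ib LND
(RoCE v2 is still RDMA verbs), just on a different LNet network number:

    # /etc/modprobe.d/lustre.conf on an LNet router (sketch only, hypothetical names)
    # ib0  = HCA on the cluster-side InfiniBand fabric, IPoIB 10.10.0.1   -> o2ib0
    # eth0 = RoCE v2 capable NIC towards the remote Lustre site, 10.20.0.1 -> o2ib1
    options lnet networks="o2ib0(ib0),o2ib1(eth0)" forwarding=enabled

    # Compute nodes (InfiniBand only): reach the storage-side network o2ib1
    # via the router's NID on the IB fabric
    options lnet networks="o2ib0(ib0)" routes="o2ib1 10.10.0.1@o2ib0"

    # Lustre servers (RoCE side): the mirror image, pointing back at the
    # router's NID on the RoCE network
    options lnet networks="o2ib1(eth0)" routes="o2ib0 10.20.0.1@o2ib1"

The same thing can also be expressed with lnetctl and an lnet.conf YAML file;
the LNet router guide Chris linked below covers the details.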
I hope this, together with what I wrote in the other replies, makes sense.
All the best from London, still cold and dark. :-)
Jörg
On Friday, 27 November 2020, 07:13:46 GMT, Chris Samuel wrote:
> On Thursday, 26 November 2020 3:14:05 AM PST Jörg Saßmannshausen wrote:
> > Now, traditionally I would say that we are going for InfiniBand. However,
> > for reasons I don't want to go into right now, our existing file storage
> > (Lustre) will be in a different location. Thus, we decided to go for RoCE
> > for the file storage and InfiniBand for the HPC applications.
>
> I think John hinted at this, but is there a reason for not going for IB for
> the cluster and then using LNet routers to connect out to the Lustre storage
> via Ethernet (with RoCE)?
>
> https://wiki.lustre.org/LNet_Router_Config_Guide
>
> We use LNet routers on our Cray system to bridge between the Aries
> interconnect inside the XC and the IB fabric our Lustre storage sits on.
>
> All the best,
> Chris