[Beowulf] MPI over RoCE?

Matt Wallis mattw at madmonks.org
Thu Feb 27 09:10:04 UTC 2025


In my experience, RoCE is just as fast as IB, if not faster, within a single Ethernet switch; it's when you go beyond the switch that you lose out.
The trick has been finding NICs that are natively supported by OFED. I still tend to find the Mellanox NICs the most reliable and best supported.
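
If you want a quick sanity check that a NIC is actually visible to the RDMA/verbs stack (essentially what ibv_devinfo reports), a small libibverbs program along these lines does it. A rough, untested sketch; device names will vary, but the calls are the standard verbs ones:

/* list_rdma_devs.c - enumerate RDMA-capable devices via libibverbs.
 * Build: cc list_rdma_devs.c -o list_rdma_devs -libverbs */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);

    if (!devs || n == 0) {
        /* Nothing listed usually means the NIC's driver isn't in your
         * OFED stack, or the rdma kernel modules aren't loaded. */
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }
    for (int i = 0; i < n; i++)
        printf("%s\n", ibv_get_device_name(devs[i]));

    ibv_free_device_list(devs);
    return 0;
}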

The question then becomes: if you're still buying Mellanox NICs anyway, why not go the whole hog and get IB, particularly as you may grow beyond a single switch.
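
As for checking the latency numbers the vendors quote, the usual approach is a two-rank ping-pong. A minimal sketch in C follows (untested here; the mpirun line in the comment assumes Open MPI with the UCX pml, and the UCX_NET_DEVICES value is just an example, your device name will differ):

/* pingpong.c - rough one-way MPI latency between two ranks.
 * Build: mpicc pingpong.c -o pingpong
 * Run (example: Open MPI over UCX, one rank per node):
 *   mpirun -np 2 --map-by node --mca pml ucx \
 *     -x UCX_NET_DEVICES=mlx5_0:1 ./pingpong */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    char buf = 0;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        /* Half the round-trip time is the usual one-way figure. */
        printf("one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}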

Matt.

> On 27 Feb 2025, at 19:19, Brice Goglin <brice.goglin at gmail.com> wrote:
> 
> Hello
> 
> While meeting vendors to buy our next cluster, we got differing recommendations about the network for MPI. The cluster will likely be about 100 nodes. Some vendors claim RoCE is enough to get <2us latency and good bandwidth for such a small number of nodes. Others say RoCE is far behind IB in both latency and bandwidth, and that we likely need IB if we care about network performance.
> 
> If anybody tried MPI over RoCE over such a "small" cluster, what NICs and switches did you use?
> 
> Also, is the configuration easy from the admin's (installation) and users' (MPI options) points of view?
> 
> Thanks
> 
> Brice
> 

