[Beowulf] Multiple IB networks in one cluster
Jeff Becker
jeffrey.c.becker at nasa.gov
Mon Feb 3 09:39:06 PST 2014
Hi Prentice.
On 01/30/2014 08:33 AM, Prentice Bisbal wrote:
> Beowulfers,
>
> I was talking to a colleague the other day about cluster architecture
> and big data, and this colleague was thinking that it would be good to
> have two separate FDR IB clusters within a single cluster: one for
> message-passing, and the other purely for data movement. I'm a bit
> skeptical of this myself. I was always under the impression that IB
> has more than enough bandwidth for message-passing and I/O. I have
> some questions about this idea:
>
> 1. Does this make sense?
>
> 2. Does anyone have first-hand experience with doing this, or can
> point me to someone who does (articles online, papers on the topic
> will suffice)?
We use two fabrics on our Pleiades cluster at NASA. They are typically
used as you propose: message passing on one fabric, I/O (NFS, Lustre) on
the other. However, jobs can request that both rails be used for message
passing; in that case, message-passing traffic could contend with I/O.
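To make the contention point concrete, here is a rough toy model, not a description of how Pleiades schedules traffic. It assumes every flow on a rail gets an equal share of that rail's capacity, which real fabrics do not guarantee. The function name rail_share and the labels "mpi" and "io" are made up for illustration.

```python
from collections import defaultdict


def rail_share(flows):
    """Return the fraction of its rail's capacity each flow receives.

    flows is a list of (traffic_class, rail) pairs. Equal sharing among
    the flows on a rail is an assumption made purely for illustration.
    """
    flows_per_rail = defaultdict(int)
    for _, rail in flows:
        flows_per_rail[rail] += 1
    return {(cls, rail): 1.0 / flows_per_rail[rail] for cls, rail in flows}


# Typical layout: message passing on rail 0, I/O on rail 1.
dedicated = rail_share([("mpi", 0), ("io", 1)])

# Job requests both rails for message passing: rail 1 is now shared.
dual_rail = rail_share([("mpi", 0), ("mpi", 1), ("io", 1)])

print(dedicated)
print(dual_rail)
```

In the dedicated layout each traffic class has a rail to itself. In the dual-rail case the message-passing and I/O flows on rail 1 each get half of it under this model.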
>
> 3. Would this present any issues for managing the fabric? I know IB is
> designed to detect loops automatically, but what about making sure
> certain traffic stays on certain IB interfaces?
Each fabric is disjoint from the other, and has its own subnet (manager).
>
> 4. Since IB uses cross-bar switches (please correct me if I'm wrong),
> we wouldn't need to duplicate switchgear, just double IB connections
> on each host, correct?
>
If you have separate subnets, you probably need separate switches for
each fabric.
Hope this helps,
-jeff