[Beowulf] many cores and ib
Gilad Shainer
Shainer at mellanox.com
Mon May 5 15:32:10 PDT 2008
> >> Since we have some users that need
> >> shared memory but also we want to build a normal cluster for mpi
> >> apps, we think that this could be a solution. Let's say about
> >> 8 machines (96 processors) plus InfiniBand. Does it sound correct?
> >> I'm aware of the bottleneck that means having one ib interface for
> >> the mpi cores, is there any possibility of bonding?
>
> > Bonding (or multi-rail) does not make sense with "standard IB" in
> > PCIe x8 since the PCIe connection limits the transfer rate of a
> > single IB-Link already.
>
> PCIe x8 Gen2 provides additional bandwidth as Gilad said. On Opteron
> systems that is not available yet (and won't be for some time), so
> you may want to search for AMD-CPU or Intel-CPU based boards that
> have PCIe x16 slots.
>
One more useful piece of info: there are a couple of installations in Japan
that use 4 "regular IB DDR" adapters in 4 PCIe x8 slots to provide 6 GB/s
(1500 MB/s per slot), and they bond them to get a single pipe. If you plan
to use Intel, you can use PCIe Gen2 with IB QDR and get 3200 MB/s per PCIe
Gen2 slot.
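Just to spell out the arithmetic behind those numbers, here is a trivial
back-of-the-envelope sketch in C (the per-slot limits are the figures quoted
above, nothing measured here):

/* Back-of-the-envelope check of the aggregate numbers quoted above.
 * The per-slot limits are the figures from this thread, not measurements. */
#include <stdio.h>

int main(void)
{
    const double gen1_x8_mbs = 1500.0; /* MB/s limit per PCIe Gen1 x8 slot (DDR HCA) */
    const double gen2_x8_mbs = 3200.0; /* MB/s per PCIe Gen2 x8 slot with IB QDR     */
    const int    rails       = 4;      /* adapters bonded in the Japan installations */

    printf("4 x DDR in Gen1 x8 slots: %.0f MB/s aggregate\n", rails * gen1_x8_mbs);
    printf("1 x QDR in a Gen2 x8 slot: %.0f MB/s\n", gen2_x8_mbs);
    return 0;
}
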
> > My hint would be to go for Infinipath from QLogic or the new ConnectX
> > from Mellanox since message rate is probably your limiting factor and
> > those technologies have a huge advantage over standard Infiniband
> > SDR/DDR.
>
> I agree that message rate may be your limiting factor.
> Results with QLogic (aka InfiniPath) DDR adapters:
>
> DDR Adapter         Peak MPI Bandwidth   Peak Message Rate (no message coalescing)
> QLE7280 (PCIe x16)  1950 MB/s            20-26* Million/sec (8 ppn)
> QLE7240 (PCIe x8)   1500 MB/s            19 Million/sec (8 ppn)
>
> Test details: All run on two nodes, each with 2x Intel Xeon 5410
> (Harpertown, quad-core, 2.33 GHz) CPUs, 8 cores per node, SLES 10,
> except:
> * 26 Million messages/sec requires faster CPUs, 3 to 3.2 GHz.
>
> 8 ppn means 8 MPI processes per node. The non-coalesced
> message rate performance of these adapters scales pretty
> linearly from 1 to 8 cores.
> That is not the case with all modern DDR adapters.
>
As Tom wrote, the message rate depends on the number of CPUs. With the
benchmark Tom indicated below and the same CPU, you can get up to 42M
msg/sec with ConnectX.
> Benchmark = OSU Multiple Bandwidth / Message Rate benchmark
> (osu_mbw_mr.c). The above performance results can be had with
> either MVAPICH 1.0 or QLogic MPI 2.2 (other MPIs are in the
> same ballpark with these adapters).
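For anyone who wants to get a feel for how such a number is produced, below
is a stripped-down sketch of a multi-pair message-rate test in the spirit of
osu_mbw_mr.c. It is only an illustration, not the OSU code; the window depth,
message size and iteration count are arbitrary choices.

/* Simplified multi-pair message-rate sketch in the spirit of osu_mbw_mr.c.
 * NOT the OSU benchmark itself: half the ranks stream small messages to the
 * other half, and the rate is total messages divided by elapsed time. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_SIZE 8      /* small message, like the message-rate columns above */
#define WINDOW   64     /* messages in flight per pair (arbitrary)            */
#define ITERS    1000

int main(int argc, char **argv)
{
    char sbuf[WINDOW][MSG_SIZE], rbuf[WINDOW][MSG_SIZE];
    MPI_Request req[WINDOW];
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(sbuf, 0, sizeof(sbuf));

    if (size % 2) {                       /* need sender/receiver pairs */
        if (rank == 0)
            fprintf(stderr, "run with an even number of ranks\n");
        MPI_Finalize();
        return 1;
    }

    int pairs = size / 2;
    int peer  = (rank < pairs) ? rank + pairs : rank - pairs;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        for (int w = 0; w < WINDOW; w++) {
            if (rank < pairs)             /* sender half of the ranks */
                MPI_Isend(sbuf[w], MSG_SIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
            else                          /* receiver half */
                MPI_Irecv(rbuf[w], MSG_SIZE, MPI_CHAR, peer, 0,
                          MPI_COMM_WORLD, &req[w]);
        }
        MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0) {
        double msgs = (double)pairs * ITERS * WINDOW;
        printf("~%.1f million messages/sec aggregate (%d pairs)\n",
               msgs / (t1 - t0) / 1.0e6, pairs);
    }

    MPI_Finalize();
    return 0;
}

Launched with 8 processes per node across two nodes, this mimics the "8 ppn"
runs quoted above, though the absolute numbers will of course depend on the
adapter and the MPI library.
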
>
> Note that MVAPICH 0.9.9 had message coalescing on by
> default, and MVAPICH 1.0 has it off by default. There must
> be a reason.
As far as I know, the reason for that was to let the user make the
choice. As OSU mentioned, there are some applications where this helps
and some where it does not.
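For what it is worth, the idea behind coalescing is easy to show in a few
lines: instead of N separate small sends, the library packs them into one
buffer and posts a single send, so the wire-level message count drops while
the benchmark's logical message count stays the same. The sketch below only
illustrates the idea in user code; MVAPICH does this inside the library.

/* Illustration only: what "message coalescing" amounts to.  MVAPICH does it
 * inside the library; here N small sends are contrasted with one packed send
 * carrying the same payload. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define N_SMALL  32
#define MSG_SIZE 8

int main(int argc, char **argv)
{
    int rank;
    char msgs[N_SMALL][MSG_SIZE];
    char packed[N_SMALL * MSG_SIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(msgs, 0, sizeof(msgs));

    if (rank == 0) {
        /* Without coalescing: N_SMALL separate messages on the wire. */
        for (int i = 0; i < N_SMALL; i++)
            MPI_Send(msgs[i], MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);

        /* With coalescing: the same payload packed into one message. */
        for (int i = 0; i < N_SMALL; i++)
            memcpy(packed + i * MSG_SIZE, msgs[i], MSG_SIZE);
        MPI_Send(packed, N_SMALL * MSG_SIZE, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        for (int i = 0; i < N_SMALL; i++)
            MPI_Recv(msgs[i], MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        MPI_Recv(packed, N_SMALL * MSG_SIZE, MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("got %d small messages, then the same payload as 1 message\n",
               N_SMALL);
    }

    MPI_Finalize();
    return 0;
}

Whether that packing helps or hurts a real application depends on how
latency-sensitive the individual messages are, which is presumably why it
is left as a user choice.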
Gilad.
>
> Revisiting:
> >
> > Bonding (or multi-rail) does not make sense with "standard IB" in
> > PCIe x8 since the PCIe connection limits the transfer rate of a
> > single IB-Link already.
>
> Some 4-socket motherboards have independent PCIe buses to x8 or x16
> slots. In this case, multi-rail does make sense. You can run the
> QLogic adapters as dual-rail without bonding. On MPI applications,
> half of the cores will use one adapter and half will use the other.
> Whether the more expensive dual-rail arrangement is necessary and/or
> cost-effective would be very application-specific.
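To make the "half of the cores per adapter" point concrete, here is a purely
conceptual sketch that prints which rail each rank would land on if local
ranks were split by parity. The real rail selection is made by the MPI
library and its runtime settings, not by application code like this.

/* Conceptual sketch only: how ranks on a node might be split across two
 * HCAs (rails) by local-rank parity.  The real assignment is made by the
 * MPI library / runtime, not by application code. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    /* Gather every rank's hostname to work out a per-node local rank. */
    char *all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Allgather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
                  all,  MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_COMM_WORLD);

    int local_rank = 0;
    for (int i = 0; i < rank; i++)
        if (strcmp(all + (size_t)i * MPI_MAX_PROCESSOR_NAME, name) == 0)
            local_rank++;

    /* Hypothetical mapping: even local ranks -> rail 0, odd -> rail 1. */
    int rail = local_rank % 2;
    printf("rank %d on %s: local rank %d would use rail %d\n",
           rank, name, local_rank, rail);

    free(all);
    MPI_Finalize();
    return 0;
}
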
>
> Regards,
> -Tom Elken
>
>
> > Infinipath and ConnectX are available as DDR Infiniband and provide a
> > bandwidth of more than 1800 MB/s.
>
> Good suggestion.
>