[Beowulf] Correct networking solution for 16-core nodes
Gilad Shainer
Shainer at mellanox.com
Thu Aug 3 14:02:16 PDT 2006
>> From the numbers published by PathScale, it seems that the simple MPI
>> latency of InfiniPath is about the same whether you go via PCIe or HTX.
>> The application performance might be different, though.
>
> No, our published number is 1.29 usec for HTX and 1.6-2.0 usec for PCI
> Express. It's the message rate that's about the same.
>
> BTW there are more HTX motherboards appearing: the 3 IBM rack-mount
> Opteron servers announced this Tuesday all have HTX slots:
>
> http://www-03.ibm.com/systems/x/announcements.html
>
> In most HTX motherboards, a riser is used to bring out either HTX or
> PCI Express, so you don't have to sacrifice anything. That's why IBM
> can put HTX in _all_ of their boxes even if most won't need it, because
> it doesn't take anything away except a little board space. The existing
> SuperMicro boards work like this, too.
>
> Vincent wrote:
>
>> Only Quadrics is clear about its switch latency (probably competitors
>> have a worse one). It's 50 ns for 1 card.
>
> We have clearly stated that the Mellanox switch is around 200 nsec per
> hop. Myricom's number is also well known.
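[As a back-of-envelope illustration of how per-hop switch latency combines
with adapter latency, here is a tiny C sketch. The 1.3 usec adapter figure,
0.2 usec per hop, and the 3-hop path are all assumptions drawn loosely from
the numbers quoted in this thread, not vendor specifications.]

    /* latency_estimate.c -- rough end-to-end latency estimate
     * (illustration only; all figures are assumptions). */
    #include <stdio.h>

    int main(void)
    {
        const double adapter_usec = 1.3;  /* assumed adapter-to-adapter latency */
        const double hop_usec     = 0.2;  /* assumed per-hop switch latency     */
        const int    hops         = 3;    /* assumed hops through the fabric    */

        double total = adapter_usec + hops * hop_usec;
        printf("estimated end-to-end MPI latency: %.2f usec\n", total);
        return 0;
    }

[With those assumptions the fabric adds roughly half a microsecond on top of
the adapter latency, which is why per-hop figures in the hundreds of
nanoseconds, not microseconds, are the relevant scale.]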
>
> Mark Hahn wrote:
>
>> I intuit (totally without rigor!) that fatter nodes do increase
>> bandwidth needs, but don't necessarily change the latency picture.
>
> Fatter nodes mean more cpus are simultaneously trying to send out
> messages, so yes, there is an effect, but it's not quite latency: it's
> that message rate thing that I keep on talking about.
>
>
> http://www.pathscale.com/performance/InfiniPath/mpi_multibw/mpi_multibw.html
>
> Poor scaling as nodes get faster is the dirty little secret of our
> community; our standard microbenchmarks don't explore this, but
> today's typical nodes have 4 or more cores.
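[For anyone who has not seen a message-rate test, the sketch below is a
minimal multi-pair version in C/MPI. It is an illustration only, not
PathScale's mpi_multibw code: each rank on the first node keeps a window of
small non-blocking sends in flight to a partner rank on the second node, so
the aggregate rate grows with the number of cores driving the adapter.]

    /* msgrate.c -- minimal multi-pair message-rate sketch (illustration
     * only, not the PathScale mpi_multibw benchmark).  Launch an even
     * number of ranks split across two nodes; rank i pairs with rank
     * i + size/2, with senders on the lower half. */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE   8     /* small messages: rate-bound, not bandwidth-bound */
    #define WINDOW     64    /* non-blocking operations in flight per iteration */
    #define ITERATIONS 1000

    int main(int argc, char **argv)
    {
        int rank, size;
        char buf[WINDOW][MSG_SIZE] = {{0}};
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int half    = size / 2;
        int sender  = rank < half;
        int partner = sender ? rank + half : rank - half;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int it = 0; it < ITERATIONS; it++) {
            for (int w = 0; w < WINDOW; w++) {
                if (sender)
                    MPI_Isend(buf[w], MSG_SIZE, MPI_CHAR, partner, 0,
                              MPI_COMM_WORLD, &req[w]);
                else
                    MPI_Irecv(buf[w], MSG_SIZE, MPI_CHAR, partner, 0,
                              MPI_COMM_WORLD, &req[w]);
            }
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        double elapsed = MPI_Wtime() - t0;

        if (rank == 0) {
            /* every rank in the sending half pushed ITERATIONS * WINDOW messages */
            double total_msgs = (double)half * ITERATIONS * WINDOW;
            printf("aggregate message rate: %.0f msgs/sec across %d pairs\n",
                   total_msgs / elapsed, half);
        }

        MPI_Finalize();
        return 0;
    }

[Run with the ranks split evenly across two nodes; doubling the ranks per
node should roughly increase the aggregate rate until the adapter's
message-rate limit is reached, which is the effect being debated here.]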
>
There was a nice debate on message rate: how important this factor is
when you want to make a decision, what the real application needs are,
and whether this is just marketing propaganda. For sure, the message
rate numbers listed on Greg's web site for other interconnects are
wrong.

I would take a look at the new cluster at the Tokyo Institute of
Technology; the servers there are "fat nodes" too.

Gilad.