[Beowulf] Correct networking solution for 16-core nodes
Gilad Shainer
Shainer at mellanox.com
Thu Aug 3 14:02:16 PDT 2006
>> From the numbers published by PathScale, it seems that the simple MPI
>> latency of InfiniPath is about the same whether you go via PCIe or HTX.
>> The application performance might be different, though.
>
> No, our published number is 1.29 usec for HTX and 1.6-2.0 usec for PCI
> Express. It's the message rate that's about the same.
>
> BTW there are more HTX motherboards appearing: the 3 IBM rack-mount
> Opteron servers announced this Tuesday all have HTX slots:
>
> http://www-03.ibm.com/systems/x/announcements.html
>
> In most HTX motherboards, a riser is used to bring out either HTX or
> PCI Express, so you don't have to sacrifice anything. That's why IBM
> can put HTX in _all_ of their boxes even if most won't need it, because
> it doesn't take anything away except a little board space. The existing
> SuperMicro boards work like this, too.
>
> Vincent wrote:
>
>> Only Quadrics is clear about its switch latency (probably competitors
>> have a worse one). It's 50 ns for 1 card.
>
> We have clearly stated that the Mellanox switch is around 200 nsec per
> hop. Myricom's number is also well known.
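[As a back-of-envelope illustration of how per-hop switch latency combines
with adapter latency, here is a tiny C sketch. The 1.3 usec adapter figure,
0.2 usec per hop, and the 3-hop path are all assumptions drawn loosely from
the numbers quoted in this thread, not vendor specifications.]

    /* latency_estimate.c -- rough end-to-end latency estimate
     * (illustration only; all figures are assumptions). */
    #include <stdio.h>

    int main(void)
    {
        const double adapter_usec = 1.3;  /* assumed adapter-to-adapter latency */
        const double hop_usec     = 0.2;  /* assumed per-hop switch latency     */
        const int    hops         = 3;    /* assumed hops through the fabric    */

        double total = adapter_usec + hops * hop_usec;
        printf("estimated end-to-end MPI latency: %.2f usec\n", total);
        return 0;
    }

[With those assumptions the fabric adds roughly half a microsecond on top of
the adapter latency, which is why per-hop figures in the hundreds of
nanoseconds, not microseconds, are the relevant scale.]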
>
> Mark Hahn wrote:
>
>> I intuit (totally without rigor!) that fatter nodes do increase
>> bandwidth needs, but don't necessarily change the latency picture.
>
> Fatter nodes mean more cpus are simultaneously trying to send out
> messages, so yes, there is an effect, but it's not quite latency: it's
> that message rate thing that I keep on talking about.
>
>
> http://www.pathscale.com/performance/InfiniPath/mpi_multibw/mpi_multibw.html
>
> Poor scaling as nodes get faster is the dirty little secret of our
> community; our standard microbenchmarks don't explore this, but
> today's typical nodes have 4 or more cores.
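[For anyone who has not seen a message-rate test, the sketch below is a
minimal multi-pair version in C/MPI. It is an illustration only, not
PathScale's mpi_multibw code: each rank on the first node keeps a window of
small non-blocking sends in flight to a partner rank on the second node, so
the aggregate rate grows with the number of cores driving the adapter.]

    /* msgrate.c -- minimal multi-pair message-rate sketch (illustration
     * only, not the PathScale mpi_multibw benchmark).  Launch an even
     * number of ranks split across two nodes; rank i pairs with rank
     * i + size/2, with senders on the lower half. */
    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE   8     /* small messages: rate-bound, not bandwidth-bound */
    #define WINDOW     64    /* non-blocking operations in flight per iteration */
    #define ITERATIONS 1000

    int main(int argc, char **argv)
    {
        int rank, size;
        char buf[WINDOW][MSG_SIZE] = {{0}};
        MPI_Request req[WINDOW];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int half    = size / 2;
        int sender  = rank < half;
        int partner = sender ? rank + half : rank - half;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();

        for (int it = 0; it < ITERATIONS; it++) {
            for (int w = 0; w < WINDOW; w++) {
                if (sender)
                    MPI_Isend(buf[w], MSG_SIZE, MPI_CHAR, partner, 0,
                              MPI_COMM_WORLD, &req[w]);
                else
                    MPI_Irecv(buf[w], MSG_SIZE, MPI_CHAR, partner, 0,
                              MPI_COMM_WORLD, &req[w]);
            }
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
        }

        MPI_Barrier(MPI_COMM_WORLD);
        double elapsed = MPI_Wtime() - t0;

        if (rank == 0) {
            /* every rank in the sending half pushed ITERATIONS * WINDOW messages */
            double total_msgs = (double)half * ITERATIONS * WINDOW;
            printf("aggregate message rate: %.0f msgs/sec across %d pairs\n",
                   total_msgs / elapsed, half);
        }

        MPI_Finalize();
        return 0;
    }

[Run with the ranks split evenly across two nodes; doubling the ranks per
node should roughly increase the aggregate rate until the adapter's
message-rate limit is reached, which is the effect being debated here.]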
>
There was a nice debate on message rate: how important this factor is
when you want to make a decision, what the real application needs are,
and whether this is just marketing propaganda. For sure, the message
rate numbers listed on Greg's web site for other interconnects are
wrong.

I would take a look at the new cluster at the Tokyo Institute of
Technology; the servers there are "fat nodes" too.

Gilad.