[Beowulf] building Infiniband 4x cluster questions

Gilad Shainer Shainer at Mellanox.com
Mon Nov 7 17:46:38 PST 2011

> I just test things and go for the fastest. But if we do the theoretical math,
> SHMEM is difficult to beat, of course.
> Google for measurements with SHMEM; there aren't many out there.

SHMEM within the node or between nodes?

> The fact that so few have standardized on or rewritten their floating-point
> software for GPUs already says enough about all the legacy codes in the HPC world :)
> Some years ago, when I had a working 2-node cluster here with QM500-A, I saw a
> blocked read latency of under 3 us on my screen, using 32-bit, 33 MHz PCI
> long-sleeve slots. Sure, I had no switch in between; it was a direct
> connection between the 2 elan4's.
> I'm not sure what PCI-X adds to that when clocked at 133 MHz, but it won't be
> a big difference from PCIe.

There is a big difference between PCI-X and PCIe. PCIe has about half the latency - from roughly 0.7 us down to 0.3 us, more or less.

> PCIe probably only has bigger bandwidth, doesn't it?

Also bandwidth ...:-)

> Beating such hardware second-hand is difficult: $30 on eBay and I can install
> 4 rails or so.
> Didn't find the cables yet, though...
> So I don't see how to outdo that with old InfiniBand cards, which are
> $130 and upwards for the ConnectX (say $150 soon), which would allow only a
> single rail, or maybe at best 2 rails. So far I haven't heard from anyone who
> has more than single-rail IB.
> Is it possible to install 2 rails with IB?

Yes, you can do dual rails.
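For example, common MPI stacks can drive both rails at launch time. A rough sketch (the device names mlx4_0/mlx4_1 are hypothetical; check `ibstat` for the names on your nodes):

```shell
# Dual-rail sketch, assuming MVAPICH2: stripe traffic across two HCAs.
MV2_NUM_HCAS=2 mpirun_rsh -np 2 node1 node2 ./app

# Or with Open MPI's openib BTL, include both HCAs explicitly.
mpirun --mca btl_openib_if_include mlx4_0,mlx4_1 -np 2 ./app
```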

> So if I use your number pessimistically, meaning there is some PCI-X
> overhead, then ConnectX-type IB can do a theoretical 1 million blocked
> reads per second with 2 rails. That's $300 or so, cables not counted.

Are you referring to RDMA reads? 
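If so, the arithmetic behind the 1-million figure can be sketched as follows (the 2 us per blocked read is an assumed round number for illustration, not a measured value):

```python
# Back-of-envelope: if one rail completes a blocked (synchronous) RDMA
# read in ~2 us, and two rails issue reads independently, the aggregate
# rate is roughly 1 million reads per second.
latency_us = 2.0  # assumed per-read latency on one rail (not measured)
rails = 2

reads_per_sec = rails * (1_000_000 / latency_us)
print(f"{reads_per_sec:,.0f} blocked reads/s")  # prints "1,000,000 blocked reads/s"
```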


More information about the Beowulf mailing list