[Beowulf] InfiniBand channel bundling?

Prentice Bisbal prentice.bisbal at rutgers.edu
Thu Oct 30 07:16:42 PDT 2014

On 10/29/2014 06:43 PM, Jörg Saßmannshausen wrote:
> Hi all,
> thanks again for the wealth of information.
> Now, given that I am not interested in transporting files over the IB network
> but I am doing parallel calculations, I would have thought that the latency
> here is more important than the speed?
> Thus, if FDR has a higher latency than QDR, does that mean my performance is
> decreasing when I am running a calculation between nodes?

It depends on the size of the messages sent back and forth during the 
calculations, and on the frequency of communication: does it happen 
every time step, every x time steps, etc.? Latency affects 
the communication time for all messages; it's just more noticeable for 
small messages, since it represents a larger percentage of the total 
communication time.
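
To make that concrete, here is a rough sketch of the usual "latency plus size over bandwidth" cost model for a single message. The latency and bandwidth values are made-up placeholders, not measurements of QDR or FDR; plug in numbers from your own benchmarks.

```python
def transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: fixed latency plus size divided by bandwidth."""
    return latency_s + msg_bytes / bandwidth_bytes_per_s


def latency_share(msg_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time that is pure latency."""
    return latency_s / transfer_time(msg_bytes, latency_s, bandwidth_bytes_per_s)


# Placeholder values (assumptions for illustration only).
LATENCY_S = 1e-6   # 1 microsecond
BANDWIDTH = 5e9    # 5 GB/s

for size in (8, 1024, 1024**2):
    share = latency_share(size, LATENCY_S, BANDWIDTH)
    print(f"{size:>8} bytes: latency is {share:.1%} of the transfer time")
```

With these placeholder numbers, latency dominates the 8-byte message and is a tiny fraction of the 1 MiB message, which is the effect described above.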

For example, if you're doing some kind of particle physics code, where 
each node gets a volume of space, then at the end of each time step each 
node needs to share the updated information about the particles along 
its borders with its neighbor nodes on the corresponding borders. 
This is known as a 'halo exchange'. How much data a halo exchange 
requires depends on the problem and how finely it's decomposed across 
the compute nodes, but I'm sure it can be enough data that the higher 
bandwidth of FDR is beneficial.
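
As a back-of-the-envelope sketch of how the decomposition changes the halo size, assume a cubic grid split evenly into cubic subdomains, one per rank, with a one-cell-wide halo and 8 bytes per cell. All of those are assumptions for illustration; a real particle code will have different bookkeeping.

```python
def subdomain_side(global_side, ranks_per_dim):
    """Cells along one edge of each rank's cubic subdomain."""
    return global_side // ranks_per_dim


def halo_bytes_per_face(cells_per_side, halo_width=1, bytes_per_cell=8):
    """Data sent to one face neighbor in a single halo exchange."""
    return cells_per_side ** 2 * halo_width * bytes_per_cell


GLOBAL_SIDE = 512  # hypothetical global grid of 512^3 cells

for ranks_per_dim in (2, 4, 8, 16):
    side = subdomain_side(GLOBAL_SIDE, ranks_per_dim)
    size = halo_bytes_per_face(side)
    print(f"{ranks_per_dim ** 3:>5} ranks: {size:>8} bytes per face")
```

The finer the decomposition, the smaller each halo message gets, so the same code can move from bandwidth-bound toward latency-bound as you scale it out.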

If your application has a lot of barriers but little data exchange 
between nodes, latency would be more important, since barrier messages 
are very small.
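
One way to compare the two cases is to estimate the message size at which a higher-latency, higher-bandwidth link starts to win over a lower-latency, lower-bandwidth one. This is a sketch under the same simple "latency plus size over bandwidth" model; the link parameters below are hypothetical placeholders, not QDR or FDR specifications.

```python
def crossover_bytes(lat_a, bw_a, lat_b, bw_b):
    """Message size where link B (higher latency, higher bandwidth)
    becomes as fast as link A (lower latency, lower bandwidth).

    Solves lat_a + s / bw_a == lat_b + s / bw_b for s.
    Latencies are in seconds, bandwidths in bytes per second.
    """
    if not (lat_b > lat_a and bw_b > bw_a):
        raise ValueError("expected link B to have higher latency and bandwidth")
    return (lat_b - lat_a) / (1.0 / bw_a - 1.0 / bw_b)


# Hypothetical links (assumptions for illustration only).
size = crossover_bytes(lat_a=1e-6, bw_a=4e9, lat_b=2e-6, bw_b=8e9)
print(f"link B is faster for messages larger than about {size:.0f} bytes")
```

Messages below the crossover size (barriers, small reductions) favor the lower-latency link; messages above it favor the higher-bandwidth link.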

I'm not a big fan of the cliche response 'it depends', but it's cliche 
because it does apply to many questions on this list. Whether FDR is hurting 
the performance of your apps really depends on the specifics of your 

> For those of you who are into Chemistry code: I am using VASP, cp2k, quantum
> espresso and cpmd mainly. All of that is plane-wave code.
I'm not familiar enough with the nitty-gritty of any of these codes to 
comment on their behavior.
> All the best from a wet London
> Jörg
> On Wednesday 29 October 2014 Prentice Bisbal wrote:
>> On 10/28/2014 04:43 PM, Mark Hahn wrote:
>>> On Tue, 28 Oct 2014, John Hearns wrote:
>>>> Here is a very good post from Glenn Lockwood regarding FDR versus
>>>> dual-rail QDR:
>>>> http://glennklockwood.blogspot.co.uk/2013/05/fdr-infiniband-vs-dual-rail-qdr.html
>>> indeed, very nice.  though also quite surprising - is it known that
>>> FDR is so terrible for latency?  seems astonishing to me.
>> Yes, it was known to me. I had already known that FDR was worse than QDR
>> for latency, but I don't remember my source. I don't know if I'd
>> characterize it as "so terrible", though.

Prentice Bisbal
Manager of Information Technology
Rutgers Discovery Informatics Institute (RDI2)
Rutgers University
