[Beowulf] Poor bandwidth from one compute node

Gus Correa gus at ldeo.columbia.edu
Thu Aug 17 11:40:22 PDT 2017

On 08/17/2017 12:35 PM, Joe Landman wrote:
> On 08/17/2017 12:00 PM, Faraz Hussain wrote:
>> I noticed an mpi job was taking 5X longer to run whenever it got the 
>> compute node lusytp104 . So I ran qperf and found the bandwidth 
>> between it and any other nodes was ~100MB/sec. This is much lower than 
>> ~1GB/sec between all the other nodes. Any tips on how to debug 
>> further? I haven't tried rebooting since it is currently running a 
>> single-node job.
>> [hussaif1 at lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
>> tcp_lat:
>>     latency  =  17.4 us
>> tcp_bw:
>>     bw  =  118 MB/sec
>> [hussaif1 at lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
>> tcp_lat:
>>     latency  =  20.4 us
>> tcp_bw:
>>     bw  =  1.07 GB/sec
>> This is separate issue from my previous post about a slow compute 
>> node. I am still investigating that per the helpful replies. Will post 
>> an update about that once I find the root cause!
> Sounds very much like it is running over gigabit ethernet vs 
> Infiniband.  Check to make sure it is using the right network ...
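Joe's diagnosis can be checked with simple arithmetic. Gigabit Ethernet carries at most 1 Gbit/s / 8 = 125 MB/s, so 118 MB/sec is about 94% of GigE line rate, while 1.07 GB/sec is far beyond anything GigE could deliver. Below is a minimal Python sketch that applies this check to qperf output. It assumes qperf reports decimal units (1 MB = 10^6 bytes), and the helper names are hypothetical:

```python
import re

# Conversion factors to decimal MB/s (assumption: qperf uses 1 MB = 10**6 bytes).
TO_MB_PER_S = {"KB/sec": 1e-3, "MB/sec": 1.0, "GB/sec": 1e3}

# Gigabit Ethernet line rate: 1 Gbit/s divided by 8 bits per byte = 125 MB/s.
GIGE_LINE_RATE_MB_S = 1e9 / 8 / 1e6


def parse_tcp_bw(qperf_output: str) -> float:
    """Extract the 'bw = ...' result from qperf tcp_bw output, in MB/s."""
    match = re.search(r"\bbw\s*=\s*([\d.]+)\s*(KB/sec|MB/sec|GB/sec)", qperf_output)
    if match is None:
        raise ValueError("no 'bw = ...' line found in qperf output")
    return float(match.group(1)) * TO_MB_PER_S[match.group(2)]


def looks_like_gige(bw_mb_s: float) -> bool:
    """True if the measured bandwidth fits within gigabit Ethernet line rate."""
    return bw_mb_s <= GIGE_LINE_RATE_MB_S


if __name__ == "__main__":
    # The two qperf results quoted in the original question.
    slow = "tcp_lat:\n    latency  =  17.4 us\ntcp_bw:\n    bw  =  118 MB/sec\n"
    fast = "tcp_lat:\n    latency  =  20.4 us\ntcp_bw:\n    bw  =  1.07 GB/sec\n"
    for name, text in [("lusytp104", slow), ("lusytp113", fast)]:
        bw = parse_tcp_bw(text)
        print(f"{name}: {bw:.0f} MB/s, {bw / GIGE_LINE_RATE_MB_S:.0%} of GigE line rate, "
              f"GigE-like={looks_like_gige(bw)}")
```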

Hi Faraz

As others have said in reply to your previous posting about Infiniband:

- Check whether the node is configured the same way as the other nodes;
in the case of Infiniband, whether the MTU is the same,
whether it uses connected or datagram mode, etc.
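A rough Python sketch for spotting a node whose IPoIB settings differ from the rest. It assumes the interface is named ib0 and reads the Linux sysfs files /sys/class/net/<iface>/mtu and /sys/class/net/<iface>/mode. Gathering the results from every node (via ssh, pdsh, or a job) is left out, and the per-node values in the example are made up for illustration:

```python
from collections import Counter
from pathlib import Path


def read_ipoib_settings(iface: str = "ib0", sysfs: Path = Path("/sys/class/net")) -> dict:
    """Read the MTU and IPoIB mode of one interface on the local node.

    'ib0' is an assumed interface name. The 'mode' file (datagram or
    connected) only exists for IPoIB interfaces, so report 'n/a' otherwise.
    """
    base = sysfs / iface
    mode_file = base / "mode"
    return {
        "mtu": (base / "mtu").read_text().strip(),
        "mode": mode_file.read_text().strip() if mode_file.exists() else "n/a",
    }


def find_outliers(per_node: dict) -> list:
    """Return the nodes whose settings differ from the most common configuration."""
    if not per_node:
        return []
    signatures = {node: tuple(sorted(s.items())) for node, s in per_node.items()}
    majority, _ = Counter(signatures.values()).most_common(1)[0]
    return sorted(node for node, sig in signatures.items() if sig != majority)


if __name__ == "__main__":
    # Hypothetical results collected by running read_ipoib_settings() on each node.
    gathered = {
        "lusytp113": {"mtu": "65520", "mode": "connected"},
        "lusytp114": {"mtu": "65520", "mode": "connected"},
        "lusytp104": {"mtu": "2044", "mode": "datagram"},
    }
    print("nodes that differ:", find_outliers(gathered))
```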


Also, with Open MPI you can tell it at runtime not to use TCP:
--mca btl ^tcp
or with the syntax in this FAQ:

If that node has an Infiniband interface with a problem,
this should at least give a clue.
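As an illustration, here is a Python sketch that builds such a command line for a two-node test including the suspect node. The "^" prefix negates the list, so Open MPI may use any BTL except tcp. If the Infiniband path to lusytp104 is broken, the job should then fail or complain instead of silently falling back to TCP. The program name ./my_mpi_app is a placeholder, and -np/--host are the usual Open MPI mpirun options:

```python
import shlex


def mpirun_without_tcp(nprocs: int, hosts: list, program: str) -> list:
    """Build an Open MPI mpirun command line that excludes the tcp BTL."""
    return [
        "mpirun",
        "-np", str(nprocs),
        "--host", ",".join(hosts),
        "--mca", "btl", "^tcp",  # '^' means: every BTL component except tcp
        program,
    ]


if __name__ == "__main__":
    # './my_mpi_app' is a placeholder for any MPI program.
    cmd = mpirun_without_tcp(2, ["lusytp114", "lusytp104"], "./my_mpi_app")
    # shlex.join quotes '^tcp' so it is passed through literally.
    print(shlex.join(cmd))
```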


In addition, check the limits on the node.
These may be set by your resource manager,
in /etc/security/limits.conf,
or perhaps in the job script itself.
The memlock limit is key to Open MPI over Infiniband.
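One way to see what a job actually gets is to print the limits from inside a job, on the suspect node and on a good node. Limits set by the resource manager can differ from those in an interactive login. Below is a minimal Python sketch using the standard resource module (Unix only; values are in bytes). It treats "unlimited", the setting commonly recommended for Open MPI over Infiniband, as the target:

```python
import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value (in bytes) in human-readable form."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memlock_limits() -> tuple:
    """Return the (soft, hard) locked-memory limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def memlock_is_unlimited() -> bool:
    """True if both the soft and hard memlock limits are unlimited."""
    soft, hard = memlock_limits()
    return soft == resource.RLIM_INFINITY and hard == resource.RLIM_INFINITY


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft={describe_limit(soft)} hard={describe_limit(hard)}")
    if not memlock_is_unlimited():
        print("warning: memlock is limited; compare with the other nodes")
```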
See FAQ 15, 16, 17 here:


Moreover, check whether mlx4_core.conf (assuming it is Mellanox hardware)
is configured the same way across the nodes:


See FAQ 18 here:
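Because the point is to compare the file across nodes, a small Python sketch that parses "options mlx4_core ..." lines and reports the parameters that differ may help. The path /etc/modprobe.d/mlx4_core.conf is an assumption (it varies by distribution and driver stack). The parameter values in the example are illustrative only:

```python
from pathlib import Path


def parse_module_options(text: str, module: str = "mlx4_core") -> dict:
    """Collect key=value pairs from 'options <module> ...' lines of a modprobe file."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == module:
            for item in fields[2:]:
                key, _, value = item.partition("=")
                params[key] = value
    return params


def diff_options(reference: dict, candidate: dict) -> dict:
    """Map each differing parameter to (reference value, candidate value)."""
    keys = sorted(set(reference) | set(candidate))
    return {k: (reference.get(k), candidate.get(k))
            for k in keys if reference.get(k) != candidate.get(k)}


def load_options(path: str = "/etc/modprobe.d/mlx4_core.conf") -> dict:
    """Parse the local file; the default path is an assumption."""
    p = Path(path)
    return parse_module_options(p.read_text()) if p.exists() else {}


if __name__ == "__main__":
    # Illustrative file contents from a good node and from the suspect node.
    good = "options mlx4_core log_num_mtt=24 log_mtts_per_seg=3\n"
    suspect = "# defaults\noptions mlx4_core log_mtts_per_seg=1\n"
    print(diff_options(parse_module_options(good), parse_module_options(suspect)))
```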


To increase the BTL diagnostic verbosity (the output goes to STDERR, IIRC):

--mca btl_base_verbose 30

That may point out which interfaces are actually being used, etc.
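The Python sketch below adds this flag to an existing mpirun command and saves STDERR to a log file for later inspection. The command, hosts, program, and log file name are hypothetical. The example run only happens if mpirun is on the PATH:

```python
import shutil
import subprocess


def with_btl_verbose(mpirun_cmd: list, level: int = 30) -> list:
    """Insert '--mca btl_base_verbose LEVEL' right after the mpirun executable."""
    return [mpirun_cmd[0], "--mca", "btl_base_verbose", str(level), *mpirun_cmd[1:]]


def run_capturing_stderr(cmd: list, log_path: str) -> int:
    """Run cmd, letting stdout through but saving stderr to log_path."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stderr=log).returncode


if __name__ == "__main__":
    # Hypothetical base command; replace the hosts and program with your own.
    base = ["mpirun", "-np", "2", "--host", "lusytp114,lusytp104", "./my_mpi_app"]
    if shutil.which("mpirun") is None:
        print("mpirun not found on PATH")
    else:
        rc = run_capturing_stderr(with_btl_verbose(base), "btl_verbose.log")
        print(f"exit code {rc}; inspect btl_verbose.log for the interfaces chosen")
```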

See this FAQ:



Finally, as John has suggested before, you may want to
subscribe to the Open MPI mailing list,
and ask the question there as well:


There you will get feedback from the Open MPI developers and
user community, which often includes insights from
Intel and Mellanox IB hardware experts.


I hope this helps.

Gus Correa

>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
