[Beowulf] Poor bandwidth from one compute node

Thu Aug 17 11:02:19 PDT 2017

I agree that the bandwidth points at 1 GigE in this case: 118 MB/sec is
close to the 125 MB/sec line rate of a 1 Gbit/s link.
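
As a rough sanity check on the qperf numbers quoted below, here is a small
sketch of the arithmetic. It assumes qperf's MB and GB are decimal units and
ignores protocol overhead; the function name is just for illustration.

```python
# Raw line rate of 1 GigE: 1 Gbit/s = 1000 Mbit/s / 8 = 125 MB/s.
# Assumption: qperf's "MB/sec" and "GB/sec" are decimal (10^6 / 10^9 bytes).
GIGE_LINE_RATE_MB_PER_S = 1000 / 8


def fraction_of_gige(measured_mb_per_s: float) -> float:
    """Return the measured bandwidth as a multiple of the 1 GigE line rate."""
    return measured_mb_per_s / GIGE_LINE_RATE_MB_PER_S


# Values taken from the qperf output in the original post.
slow = fraction_of_gige(118)    # lusytp104: 118 MB/sec
fast = fraction_of_gige(1070)   # lusytp113: 1.07 GB/sec

print(f"lusytp104: {slow:.2f}x of 1 GigE line rate")
print(f"lusytp113: {fast:.2f}x of 1 GigE line rate")
```

The slow node sits just under 1x, which is what a saturated gigabit link
looks like, while the fast node is well beyond anything 1 GigE can carry.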

For IB/OPA cards running slower than expected, I would recommend checking
that they are using the correct number of PCIe lanes, i.e. that the
negotiated link width matches what the card supports.
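
One way to check this is to compare the LnkCap (what the card supports) and
LnkSta (what was actually negotiated) lines that `lspci -vv` prints for the
device. The sketch below parses that text; the sample excerpt is made up for
illustration and is not taken from any node in this thread, and the helper
names are hypothetical.

```python
import re

# Matches the "LnkCap:" and "LnkSta:" lines of `lspci -vv` output and
# captures the link speed (GT/s) and width (number of lanes).
LINK_RE = re.compile(r"Lnk(Cap|Sta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")

# Made-up excerpt for illustration only: a x8-capable card trained at x4.
SAMPLE = """\
        LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported
        LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive-
"""


def parse_link(lspci_text):
    """Return {'Cap': (speed, width), 'Sta': (speed, width)} for the first device found."""
    link = {}
    for kind, speed, width in LINK_RE.findall(lspci_text):
        link.setdefault(kind, (float(speed), int(width)))
    return link


def is_degraded(lspci_text):
    """True if the negotiated speed or width is below what the card supports."""
    link = parse_link(lspci_text)
    cap_speed, cap_width = link["Cap"]
    sta_speed, sta_width = link["Sta"]
    return sta_speed < cap_speed or sta_width < cap_width


print(parse_link(SAMPLE))
print("degraded:", is_degraded(SAMPLE))
```

On a real node the text would come from running `lspci -vv` (usually as
root, so the capability lines are visible) against the adapter's bus address.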

On Thu, Aug 17, 2017 at 12:35 PM, Joe Landman <joe.landman at gmail.com> wrote:

>
>
> On 08/17/2017 12:00 PM, Faraz Hussain wrote:
>
>> I noticed an MPI job was taking 5X longer to run whenever it got the
>> compute node lusytp104. So I ran qperf and found the bandwidth between it
>> and any other node was ~100MB/sec. This is much lower than ~1GB/sec
>> between all the other nodes. Any tips on how to debug further? I haven't
>> tried rebooting since it is currently running a single-node job.
>>
>> [hussaif1 at lusytp114 ~]$ qperf lusytp104 tcp_lat tcp_bw
>> tcp_lat:
>>     latency  =  17.4 us
>> tcp_bw:
>>     bw  =  118 MB/sec
>> [hussaif1 at lusytp114 ~]$ qperf lusytp113 tcp_lat tcp_bw
>> tcp_lat:
>>     latency  =  20.4 us
>> tcp_bw:
>>     bw  =  1.07 GB/sec
>>
>> This is a separate issue from my previous post about a slow compute node. I
>> am still investigating that per the helpful replies. Will post an update
>> about that once I find the root cause!
>>
>
> Sounds very much like it is running over Gigabit Ethernet rather than
> InfiniBand. Check to make sure it is using the right network ...
>
>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
> --
> Joe Landman
> e: joe.landman at gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman
>
>