[Beowulf] How to know if infiniband network works?

Faraz Hussain info at feacluster.com
Thu Aug 3 11:58:33 PDT 2017


Here are the latency numbers when running the Ohio State test:

mpirun -np 2 -machinefile hostfile ./osu_latency

# OSU MPI Latency Test v5.3.2
# Size          Latency (us)
0                       1.57
1                       1.22
2                       1.19
4                       1.20
8                       1.17
16                      1.20
32                      1.23
64                      1.29
128                     1.42
256                     1.76
512                     2.07
1024                    2.62
2048                    3.63
4096                    4.65
8192                    6.46
16384                  10.34
32768                  13.37
65536                  19.03
131072                 33.04
262144                 61.70
524288                119.93
1048576               231.21
2097152               455.84
4194304               907.89
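
As a rough cross-check, the large-message rows can be turned into an implied bandwidth by dividing message size by latency. This is only arithmetic on the rows above: the helper name implied_bw is made up for illustration, GB is taken as 10^9 bytes, and osu_latency is a ping-pong test, so treat it as a sanity check rather than a replacement for osu_bw.

```shell
# Implied bandwidth for one osu_latency row.
# bytes / microseconds = MB/s; divide by 1000 for GB/s (decimal).
implied_bw() {
    # $1 = message size in bytes, $2 = latency in microseconds
    awk -v size="$1" -v lat="$2" 'BEGIN { printf "%.2f\n", size / lat / 1000 }'
}

implied_bw 4194304 907.89   # 4 MB row from the table above
implied_bw 1048576 231.21   # 1 MB row from the table above
```

The 4 MB row works out to about 4.6 GB/s, which is in line with the 4.56 GB/sec that qperf rc_bw reported.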


Quoting Jon Tegner <tegner at renget.se>:

> Isn't latency over RDMA a bit high? When I've tested QDR and FDR I  
> tend to see around 1 us (using mpitests-osu_latency) between two  
> nodes.
>
> /jon
>
> On 08/03/2017 06:50 PM, Faraz Hussain wrote:
>> Here is the result from the tcp and rdma tests. I take it to mean  
>> that IB network is performing at the expected speed.
>>
>> [hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 tcp_lat tcp_bw
>> tcp_lat:
>>    latency  =  24.2 us
>> tcp_bw:
>>    bw  =  1.19 GB/sec
>> [hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 rc_lat rc_bw
>> rc_lat:
>>    latency  =  7.76 us
>> rc_bw:
>>    bw  =  4.56 GB/sec
>> [hussaif1 at lustwzb5 ~]$
>>
>>
>> Quoting Jeff Johnson <jeff.johnson at aeoncomputing.com>:
>>
>>> Faraz,
>>>
>>> I didn't notice any tests where you actually tested the ip layer. You
>>> should run some iperf tests between nodes to make sure ipoib functions.
>>> Your infiniband/rdma can be working fine and ipoib can be dysfunctional.
>>> You need to ensure the ipoib configuration, like any ip environment, is
>>> configured the same on all nodes (network/subnet, netmask, mtu, etc) and
>>> that all of the nodes are configured for the same mode (connected vs
>>> datagram). If you can't run iperf then there is something broken in the
>>> ipoib configuration.
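
A minimal sketch of the consistency check described above, assuming the IPoIB interface is ib0 (as elsewhere in this thread) and that the kernel exposes the mode and MTU under /sys/class/net. The function name ipoib_summary and its second argument are hypothetical and exist only for illustration and testing.

```shell
# Print the IPoIB mode and MTU of one interface.
# Run it on every node and compare the lines; they should be identical.
# $1 = interface name (default ib0)
# $2 = sysfs root (default /sys/class/net; override only for testing)
ipoib_summary() {
    iface="${1:-ib0}"
    root="${2:-/sys/class/net}"
    printf '%s mode=%s mtu=%s\n' "$iface" \
        "$(cat "$root/$iface/mode")" "$(cat "$root/$iface/mtu")"
}

# Usage on a node with IPoIB configured:
#   ipoib_summary ib0
# Then test the IP layer itself with iperf:
#   iperf -s                            # on one node
#   iperf -c <ipoib-address-of-server>  # on the other node
```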
>>>
>>> --Jeff
>>>
>>> On Thu, Aug 3, 2017 at 8:41 AM, Faraz Hussain <info at feacluster.com> wrote:
>>>
>>>> Thanks for everyone's help. Using the Ohio State tests, qperf and
>>>> perfquery I am convinced the IB network is working. The only thing that
>>>> still bothers me is I can not get mpirun to use the tcp network. I tried
>>>> all combinations of --mca btl to no avail. It is not important, more just
>>>> curiosity.
>>>>
>>>>
>>>>
>>>> Quoting Michael Di Domenico <mdidomenico4 at gmail.com>:
>>>>
>>>>> On Thu, Aug 3, 2017 at 10:10 AM, Faraz Hussain <info at feacluster.com> wrote:
>>>>>
>>>>>> Thanks, I installed the MPI tests from Ohio State. I ran osu_bw and got
>>>>>> the results below. What is confusing is that I get the same result
>>>>>> whether I use tcp or openib (by passing --mca btl openib,self or
>>>>>> --mca btl tcp,self to my mpirun command). I also tried setting the
>>>>>> environment variable: export OMPI_MCA_btl=tcp,self,sm
>>>>>> The results are the same regardless of tcp or openib.
>>>>>>
>>>>>> And when I run ifconfig -a I still see zero traffic reported for the
>>>>>> ib0 and ib1 interfaces.
>>>>>>
>>>>>
>>>>> If Open MPI uses RDMA for the traffic, ib0/ib1 will not show it;
>>>>> you have to use perfquery to see the port counters.
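
One way to do that is to sample the port's transmit counter before and after a run and compare. This is a sketch only: it assumes perfquery prints counters in its usual "Name:....value" layout and that PortXmitData is counted in 4-byte units; the helper name xmit_bytes is made up for illustration.

```shell
# Read perfquery output on stdin and print PortXmitData converted to bytes.
# Assumption: the counter is reported in units of 4 bytes.
xmit_bytes() {
    awk -F'[:.]+' '/^PortXmitData/ { printf "%.0f\n", $2 * 4 }'
}

# Usage (needs infiniband-diags and an active port; run on a node that
# takes part in the job):
#   before=$(perfquery -x | xmit_bytes)
#   mpirun -np 2 -machinefile hostfile ./osu_bw
#   after=$(perfquery -x | xmit_bytes)
#   echo "bytes sent over IB: $((after - before))"
```

If the counter barely moves during an MPI run, the traffic is probably not going over the IB port.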
>>>>> _______________________________________________
>>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>>>> To change your subscription (digest mode or unsubscribe) visit
>>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>>>
>>>>
>>>
>>>
>>>
>>> -- 
>>> ------------------------------
>>> Jeff Johnson
>>> Co-Founder
>>> Aeon Computing
>>>
>>> jeff.johnson at aeoncomputing.com
>>> www.aeoncomputing.com
>>> t: 858-412-3810 x1001   f: 858-412-3845
>>> m: 619-204-9061
>>>
>>> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
>>>
>>> High-Performance Computing / Lustre Filesystems / Scale-out Storage
>>
>