[Beowulf] How to know if infiniband network works?

Faraz Hussain info at feacluster.com
Wed Aug 2 10:50:06 PDT 2017


Thanks Joe. Here is the output from the commands you suggested. We
have Open MPI built with the Intel compiler. Is there some benchmark
code I can compile so that we are all comparing the same code?

[hussaif1@lustwzb4 test]$ ibv_devinfo
hca_id: mlx4_0
         transport:                      InfiniBand (0)
         fw_ver:                         2.11.550
         node_guid:                      f452:1403:0016:3b70
         sys_image_guid:                 f452:1403:0016:3b73
         vendor_id:                      0x02c9
         vendor_part_id:                 4099
         hw_ver:                         0x0
         board_id:                       DEL0A40000028
         phys_port_cnt:                  2
                 port:   1
                         state:                  PORT_ACTIVE (4)
                         max_mtu:                4096 (5)
                         active_mtu:             4096 (5)
                         sm_lid:                 1
                         port_lid:               3
                         port_lmc:               0x00
                         link_layer:             InfiniBand

                 port:   2
                         state:                  PORT_DOWN (1)
                         max_mtu:                4096 (5)
                         active_mtu:             4096 (5)
                         sm_lid:                 0
                         port_lid:               0
                         port_lmc:               0x00
                         link_layer:             InfiniBand

[hussaif1@lustwzb4 test]$ ibstat
CA 'mlx4_0'
         CA type: MT4099
         Number of ports: 2
         Firmware version: 2.11.550
         Hardware version: 0
         Node GUID: 0xf452140300163b70
         System image GUID: 0xf452140300163b73
         Port 1:
                 State: Active
                 Physical state: LinkUp
                 Rate: 40 (FDR10)
                 Base lid: 3
                 LMC: 0
                 SM lid: 1
                 Capability mask: 0x02514868
                 Port GUID: 0xf452140300163b71
                 Link layer: InfiniBand
         Port 2:
                 State: Down
                 Physical state: Disabled
                 Rate: 10
                 Base lid: 0
                 LMC: 0
                 SM lid: 0
                 Capability mask: 0x02514868
                 Port GUID: 0xf452140300163b72
                 Link layer: InfiniBand

[hussaif1@lustwzb4 test]$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
         default gid:     fe80:0000:0000:0000:f452:1403:0016:3b71
         base lid:        0x3
         sm lid:          0x1
         state:           4: ACTIVE
         phys state:      5: LinkUp
         rate:            40 Gb/sec (4X FDR10)
         link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
         default gid:     fe80:0000:0000:0000:f452:1403:0016:3b72
         base lid:        0x0
         sm lid:          0x0
         state:           1: DOWN
         phys state:      3: Disabled
         rate:            10 Gb/sec (4X)
         link_layer:      InfiniBand
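
In case it helps with checking all 20 nodes rather than reading this output by eye, here is a minimal Python sketch that parses ibstat-style text and lists the ports that are both Active and LinkUp. The function names parse_ibstat and usable_ports are made up for illustration, the sample text is an abridged copy of the ibstat output above, and the parser assumes the "CA '...'" / "Port N:" / "Key: value" layout shown there; other ibstat versions may format things differently.

```python
import re

# Abridged copy of the ibstat output pasted above (assumed layout).
SAMPLE_IBSTAT = """\
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.550
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40 (FDR10)
                Base lid: 3
                SM lid: 1
                Port GUID: 0xf452140300163b71
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                SM lid: 0
                Port GUID: 0xf452140300163b72
                Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Return {(ca_name, port_number): {field: value}} from ibstat-style text."""
    ports = {}
    ca = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # New adapter section; per-port fields start after a "Port N:" line.
            ca, current = m.group(1), None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            current = ports.setdefault((ca, int(m.group(1))), {})
            continue
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def usable_ports(ports):
    """Ports whose logical state is Active and physical state is LinkUp."""
    return sorted(
        key
        for key, fields in ports.items()
        if fields.get("State") == "Active"
        and fields.get("Physical state") == "LinkUp"
    )


if __name__ == "__main__":
    parsed = parse_ibstat(SAMPLE_IBSTAT)
    for ca_name, port in usable_ports(parsed):
        print(ca_name, "port", port, "rate", parsed[(ca_name, port)]["Rate"])
```

On a real node the text would come from running ibstat and capturing its output instead of the embedded sample. This only confirms link state, not that MPI traffic actually goes over the InfiniBand fabric.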


Quoting Joe Landman <joe.landman at gmail.com>:

> start with
>
>     ibv_devinfo
>
>     ibstat
>
>     ibstatus
>
>
> and see what (if anything) they report.
>
> Second, how did you compile/run your MPI code?
>
>
> On 08/02/2017 12:44 PM, Faraz Hussain wrote:
>> I have inherited a 20-node cluster that supposedly has an
>> InfiniBand network. I am testing some MPI applications and am
>> seeing no performance improvement with multiple nodes. So I am
>> wondering if the InfiniBand network even works?
>>
>> The output of ifconfig -a shows an ib0 and ib1 network. I ran
>> ethtool ib0 and it shows:
>>
>>        Speed: 40000Mb/s
>>        Link detected: no
>>
>> and for ib1 it shows:
>>
>>        Speed: 10000Mb/s
>>        Link detected: no
>>
>> I am assuming this means it is down? Any idea how to debug further  
>> and restart it?
>>
>> Thanks!
>>
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
>
> -- 
> Joe Landman
> e: joe.landman at gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman




