[Beowulf] How to know if infiniband network works?
Faraz Hussain
info at feacluster.com
Wed Aug 2 10:50:06 PDT 2017
Thanks Joe. Here is the output from the commands you suggested. We
have Open MPI built with the Intel compilers. Is there some benchmark
code I can compile so that we are all comparing the same code?
Something along the lines of the ping-pong sketch after the output
below is what I had in mind.
[hussaif1 at lustwzb4 test]$ ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.11.550
        node_guid:                      f452:1403:0016:3b70
        sys_image_guid:                 f452:1403:0016:3b73
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       DEL0A40000028
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               3
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand
[hussaif1 at lustwzb4 test]$ ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.550
        Hardware version: 0
        Node GUID: 0xf452140300163b70
        System image GUID: 0xf452140300163b73
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40 (FDR10)
                Base lid: 3
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: 0xf452140300163b71
                Link layer: InfiniBand
        Port 2:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02514868
                Port GUID: 0xf452140300163b72
                Link layer: InfiniBand
[hussaif1 at lustwzb4 test]$ ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:f452:1403:0016:3b71
        base lid:        0x3
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X FDR10)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:f452:1403:0016:3b72
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      3: Disabled
        rate:            10 Gb/sec (4X)
        link_layer:      InfiniBand
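
Here is the kind of minimal ping-pong I had in mind, unless the OSU
micro-benchmarks (osu_bw, osu_latency) or the Intel MPI Benchmarks
would be a better common reference. The hostnames node1/node2 in the
comment are just placeholders for two real nodes.

/* pingpong.c - minimal MPI ping-pong bandwidth sketch (not an official
 * benchmark).  Build and run with one rank on each of two nodes:
 *
 *     mpicc -O2 pingpong.c -o pingpong
 *     mpirun -np 2 -host node1,node2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0)
            fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    const int iters = 100;
    for (long bytes = 1; bytes <= (1L << 22); bytes *= 4) {
        char *buf = malloc(bytes);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                /* rank 0 sends, then waits for the echo */
                MPI_Send(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                /* rank 1 echoes everything straight back */
                MPI_Recv(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = MPI_Wtime() - t0;

        if (rank == 0) {
            /* two messages of 'bytes' move per iteration */
            printf("%10ld bytes  %10.2f MB/s  %10.2f us/round-trip\n",
                   bytes, 2.0 * bytes * iters / t / 1e6,
                   t / iters * 1e6);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

My understanding is that on a working FDR10 link the large-message
numbers should come out well above 1 GB/s, whereas figures around
100 MB/s would suggest the traffic is falling back to Gigabit Ethernet.
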
Quoting Joe Landman <joe.landman at gmail.com>:
> start with
>
> ibv_devinfo
>
> ibstat
>
> ibstatus
>
>
> and see what (if anything) they report.
>
> Second, how did you compile/run your MPI code?
>
>
> On 08/02/2017 12:44 PM, Faraz Hussain wrote:
>> I have inherited a 20-node cluster that supposedly has an
>> InfiniBand network. I am testing some MPI applications and am
>> seeing no performance improvement with multiple nodes. So I am
>> wondering if the InfiniBand network even works.
>>
>> The output of ifconfig -a shows an ib0 and ib1 network. I ran
>> ethtool ib0 and it shows:
>>
>> Speed: 40000Mb/s
>> Link detected: no
>>
>> and for ib1 it shows:
>>
>> Speed: 10000Mb/s
>> Link detected: no
>>
>> I am assuming this means it is down? Any idea how to debug further
>> and restart it?
>>
>> Thanks!
>>
>
> --
> Joe Landman
> e: joe.landman at gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf