[Beowulf] How to know if infiniband network works?

Faraz Hussain info at feacluster.com
Thu Aug 3 07:10:45 PDT 2017


Thanks, I installed the MPI tests from Ohio State. I ran osu_bw and
got the results below. What is confusing is that I get the same result
whether I use tcp or openib (by passing --mca btl openib,self or
--mca btl tcp,self to my mpirun command). I also tried setting the
environment variable instead: export OMPI_MCA_btl=tcp,self,sm. The
results are the same regardless of tcp or openib.

And when I run ifconfig -a I still see zero traffic reported for the
ib0 and ib1 interfaces.
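
A note on the ifconfig counters: traffic sent through the openib BTL
goes over verbs and bypasses the IPoIB interfaces, so the ib0/ib1
counters can stay at zero even while the fabric carries data. The
adapter keeps its own per-port state and counters in sysfs. A minimal
sketch for printing them, assuming the usual /sys/class/infiniband
layout; the function name ib_port_report and its optional root
argument are made up for illustration:

```shell
# Print state and data counters for every InfiniBand port found in sysfs.
# $1 optionally overrides the sysfs root (hypothetical convenience so the
# function can be pointed at a copy of the tree); default is the real one.
ib_port_report() {
    root="${1:-/sys/class/infiniband}"
    for port in "$root"/*/ports/*; do
        # Skip the unexpanded glob when no adapter is present.
        [ -d "$port" ] || continue
        dev=$(basename "$(dirname "$(dirname "$port")")")
        num=$(basename "$port")
        state=$(cat "$port/state" 2>/dev/null)
        xmit=$(cat "$port/counters/port_xmit_data" 2>/dev/null)
        rcv=$(cat "$port/counters/port_rcv_data" 2>/dev/null)
        echo "$dev port $num state=${state:-unknown} xmit_data=${xmit:-n/a} rcv_data=${rcv:-n/a}"
    done
}
```

Calling ib_port_report before and after an osu_bw run on each node
would show whether the counters move. If it prints nothing, no
InfiniBand device is visible to the kernel on that node.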

# OSU MPI Bandwidth Test v5.3.2
# Size      Bandwidth (MB/s)
1                       1.23
2                       6.55
4                      12.83
8                      25.42
16                     49.35
32                    101.99
64                    190.78
128                   362.64
256                   712.64
512                   576.00
1024                 2410.36
2048                 3548.19
4096                 3427.19
8192                 4259.77
16384                4399.37
32768                4566.43
65536                4617.49
131072               4682.98
262144               4690.70
524288               4701.48
1048576              4697.40
2097152              4706.88
4194304              4710.76
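
To set these numbers next to the link speed that ethtool reports, the
peak can be converted from MB/s to Gb/s. A rough sketch, assuming the
osu_bw output was saved to a file and that MB here means 10^6 bytes;
the function name peak_gbps and the file name osu_bw.out are
hypothetical:

```shell
# Print the highest bandwidth in an osu_bw output file, converted to Gb/s.
# Header lines start with '#'; data lines have two columns (size, MB/s).
peak_gbps() {
    awk '!/^#/ && NF == 2 && $2 + 0 > max { max = $2 + 0 }
         END { printf "%.2f\n", max * 8 / 1000 }' "$1"
}
```

With the table above saved as osu_bw.out, peak_gbps osu_bw.out prints
37.69, which can be compared against the 40000Mb/s ethtool showed for
ib0.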


Quoting tegner at renget.se:

> I often use
> mpirun --np 2 --machinefile mpd.hosts mpitests-osu_latency
> mpirun --np 2 --machinefile mpd.hosts mpitests-osu_bw
> to test bandwidth and latency between two specific nodes (listed in
> mpd.hosts). On a CentOS/Red Hat system these can be installed from
> the package mpitests-openmpi.
>
> /jon
>
>
> On 2 August 2017 at 18:44:17 +02:00, Faraz Hussain  
> <info at feacluster.com> wrote:
>
>> I have inherited a 20-node cluster that supposedly has an
>> InfiniBand network. I am testing some MPI applications and am
>> seeing no performance improvement with multiple nodes. So I am
>> wondering if the InfiniBand network even works?
>>
>> The output of ifconfig -a shows an ib0 and ib1 network. I ran
>> ethtool ib0 and it shows:
>>
>> Speed: 40000Mb/s
>> Link detected: no
>>
>> and for ib1 it shows:
>>
>> Speed: 10000Mb/s
>> Link detected: no
>>
>> I am assuming this means it is down? Any idea how to debug further  
>> and restart it?
>>
>> Thanks!
>>




