[Beowulf] How to know if infiniband network works?
info at feacluster.com
Thu Aug 3 09:50:09 PDT 2017
Here are the results from the TCP and RDMA tests. I take them to mean that
the IB network is performing at the expected speed.
[hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 tcp_lat tcp_bw
latency = 24.2 us
bw = 1.19 GB/sec
[hussaif1 at lustwzb5 ~]$ qperf lustwzb4 -t 30 rc_lat rc_bw
latency = 7.76 us
bw = 4.56 GB/sec
[hussaif1 at lustwzb5 ~]$
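To compare these figures with a nominal link rate, which is usually quoted in Gbit/s, a small conversion helps. This is only a sketch: it assumes qperf's GB means 10^9 bytes, and the helper name to_gbits is made up for illustration.

```shell
# Convert a bandwidth given in GB/sec to Gbit/s.
# Assumption: 1 GB = 10^9 bytes, so the conversion is a plain factor of 8.
to_gbits() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 8 }'
}

to_gbits 4.56   # rc_bw from the run above,  prints 36.48
to_gbits 1.19   # tcp_bw from the run above, prints 9.52
```

Whether 36.48 Gbit/s counts as "expected" depends on the rate the ports actually negotiated, which ibstat reports in its Rate field.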
Quoting Jeff Johnson <jeff.johnson at aeoncomputing.com>:
> I didn't notice any tests where you actually tested the ip layer. You
> should run some iperf tests between nodes to make sure ipoib functions.
> Your infiniband/rdma can be working fine and ipoib can be dysfunctional.
> You need to ensure the ipoib configuration, like any ip environment, is
> configured the same on all nodes (network/subnet, netmask, mtu, etc) and
> that all of the nodes are configured for the same mode (connected vs
> datagram). If you can't run iperf then there is something broken in the
> ipoib configuration.
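The per-node consistency check described above could be sketched roughly as follows. It assumes the IPoIB interface is named ib0 (as elsewhere in this thread) and that the driver exposes its mode under /sys/class/net. The function name ipoib_summary and the SYSFS_NET override are made up for illustration; the override exists only so the function can be tried without IB hardware.

```shell
# Print the IPoIB mode (connected or datagram) and MTU of one interface,
# so the one-line output can be compared across all nodes.
# SYSFS_NET is a hypothetical override of the sysfs path, for testing only.
ipoib_summary() {
    iface="$1"
    base="${SYSFS_NET:-/sys/class/net}/$iface"
    if [ ! -r "$base/mode" ]; then
        echo "$iface: no IPoIB mode file found" >&2
        return 1
    fi
    printf '%s mode=%s mtu=%s\n' "$iface" "$(cat "$base/mode")" "$(cat "$base/mtu")"
}

# Typical use on each node:
#   ipoib_summary ib0
#   ip -4 addr show ib0      # address and netmask
# Then an IP-level test over the IPoIB addresses:
#   iperf -s                                     # on one node
#   iperf -c <IPoIB address of the server node>  # on the other node
```

If the summary line differs between nodes, that mismatch is the first thing to fix before reading anything into the iperf numbers.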
> On Thu, Aug 3, 2017 at 8:41 AM, Faraz Hussain <info at feacluster.com> wrote:
>> Thanks for everyone's help. Using the Ohio State tests, qperf and
>> perfquery I am convinced the IB network is working. The only thing that
>> still bothers me is I can not get mpirun to use the tcp network. I tried
>> all combinations of --mca btl to no avail. It is not important, more just
>> Quoting Michael Di Domenico <mdidomenico4 at gmail.com>:
>> On Thu, Aug 3, 2017 at 10:10 AM, Faraz Hussain <info at feacluster.com>
>>>> Thanks, I installed the MPI tests from Ohio State. I ran osu_bw and got
>>>> results below. What is confusing is I get the same result if I use tcp or
>>>> openib ( by doing --mca btl openib|tcp,self with my mpirun command ). I
>>>> tried changing the environment variable: export OMPI_MCA_btl=tcp,self,sm
>>>> Results are the same regardless of tcp or openib..
>>>> And when I do ifconfig -a I still see zero traffic reported for the ib0
>>>> ib1 network.
>>> if openmpi uses RDMA for the traffic ib0/ib1 will not show traffic,
>>> you have to use perfquery
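One way to make the perfquery counters easier to read is to convert the data counters to bytes before and after a run. The sketch below assumes perfquery prints one counter per line in the form "Name:.....value"; the helper name ib_data_bytes is made up for illustration. PortXmitData and PortRcvData are counted in units of 4 octets, hence the multiplication.

```shell
# Read perfquery output on stdin and print the transmit/receive data
# counters converted to bytes (the raw counters are in 4-octet units).
# Assumption: lines look like "PortXmitData:....................123456".
ib_data_bytes() {
    awk -F'[.]+' '/^Port(Xmit|Rcv)Data:/ {
        sub(/:$/, "", $1)
        printf "%s %.0f bytes\n", $1, $2 * 4
    }'
}

# Typical use on a node taking part in the MPI job:
#   perfquery -x | ib_data_bytes    # before the run (-x: extended 64-bit counters)
#   ...run the MPI benchmark...
#   perfquery -x | ib_data_bytes    # after the run
# The difference between the two readings is traffic that went over the
# IB port, including RDMA traffic that ifconfig does not count.
```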
>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>>> To change your subscription (digest mode or unsubscribe) visit
> Jeff Johnson
> Aeon Computing
> jeff.johnson at aeoncomputing.com
> t: 858-412-3810 x1001 f: 858-412-3845
> m: 619-204-9061
> 4170 Morena Boulevard, Suite D - San Diego, CA 92117
> High-Performance Computing / Lustre Filesystems / Scale-out Storage