[Beowulf] Help with inconsistent network performance
Mark Hahn
hahn at mcmaster.ca
Tue Dec 18 21:55:51 PST 2007
> I guess I figured that the data is relatively small compared to the
> bandwidth,
I agree, in principle. and relatively small compared to the amount of ram
in the switch as well.
> whereas the latency for ethernet is relatively high. I also
not _that_ high, though. with a little tuning (coalesce parameters),
I think 30-40 us half-rtt is pretty common, even over a normal
tcp stack. yes, that's 2+ 1.5k packets, but it's not _that_ much
compared to 1M images.
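
To put rough numbers on that comparison, here is a small sketch. The gigabit link speed and the reading of "1M" as 10**6 bytes are assumptions made for illustration (neither is stated above), and the 35 us figure is just the midpoint of the 30-40 us range.

```python
# Back-of-the-envelope comparison; link speed and sizes are assumptions.
LINK_BITS_PER_SEC = 1e9      # assumed gigabit ethernet
PACKET_BYTES = 1500          # one full-size ethernet payload
IMAGE_BYTES = 1_000_000      # "1M image", taken here as 10**6 bytes
HALF_RTT_US = 35.0           # midpoint of the 30-40 us range above


def wire_time_us(nbytes, bits_per_sec=LINK_BITS_PER_SEC):
    """Time to serialize nbytes onto the link, in microseconds."""
    return nbytes * 8 / bits_per_sec * 1e6


packet_us = wire_time_us(PACKET_BYTES)
image_us = wire_time_us(IMAGE_BYTES)
print(f"one 1.5k packet: {packet_us:.0f} us on the wire")
print(f"latency in packet times: {HALF_RTT_US / packet_us:.1f}")
print(f"one image: {image_us:.0f} us on the wire")
print(f"latency as a fraction of image time: {HALF_RTT_US / image_us:.2%}")
```

Under these assumptions the latency is worth a few packet times, while a single image occupies the wire for milliseconds.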
>>> To make sure there was not an issue with the MPI broadcast, I did one test
>>> run with 5 nodes only sending back 4 bytes of data each. The result was a
>>> RTT of less than 0.3 ms.
>>
>> isn't that kind of high? a single ping-pong latency should be ~50 us -
>> maybe I'm underestimating the latency of the broadcast itself.
>
>
> This is quite a bit more than a single ping-pong. The viewer sends to the
> master node (rank 0), and then the master node broadcasts to all other
> nodes, and then all nodes send back to the viewer node. I don't know if
> this still seems high?
the first message should take <50 us. the broadcast to 5 nodes should
take 2-3 more 50 us times. so at about 200 us, all the slaves will start
the DOS attack on the viewer node's nic...
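
The estimate above can be sketched as a tiny model. It assumes a flat 50 us per small-message hop and a tree-shaped broadcast in which the number of ranks holding the data doubles each round; the function names are hypothetical and only illustrate the arithmetic.

```python
HOP_US = 50  # assumed cost of one small-message send, per the estimate above


def tree_bcast_rounds(receivers):
    """Rounds needed for a doubling-tree broadcast from one root.

    After r rounds, 2**r ranks (root included) hold the data, so we need
    the smallest r with 2**r >= receivers + 1, which is receivers.bit_length().
    """
    return receivers.bit_length()


def time_until_replies_start_us(receivers, hop_us=HOP_US):
    """Viewer -> master hop, followed by the broadcast rounds."""
    return hop_us + tree_bcast_rounds(receivers) * hop_us


# 4 or 5 receivers both need 3 rounds, so the reading of "5 nodes" does not matter.
for receivers in (4, 5):
    print(receivers, "receivers:", time_until_replies_start_us(receivers), "us")
```

Whether "5 nodes" counts the master or not, the model gives 3 broadcast rounds, which lines up with the figure of about 200 us before all the replies start converging on the viewer.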
> But the bcast is always just sending 4 bytes (a single integer), and as
no, afaik no mpi implementations actually utilize the eth-level bcast,
but rather implement bcast as a tree of (uni) sends.
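
As an illustration of what "a tree of (uni) sends" can look like, here is a minimal sketch of a binomial-tree schedule. This is one common textbook scheme, not the code of any particular mpi implementation (real libraries pick among several algorithms), and `binomial_bcast_schedule` is a hypothetical helper name.

```python
def binomial_bcast_schedule(nranks, root=0):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Ranks are numbered relative to the root. In each round, every rank that
    already holds the data sends it to one rank that does not.
    """
    rounds = []
    have = 1  # how many ranks (relative to root) hold the data so far
    while have < nranks:
        sends = []
        for rel in range(have):
            dst = rel + have
            if dst < nranks:
                sends.append(((rel + root) % nranks, (dst + root) % nranks))
        rounds.append(sends)
        have *= 2
    return rounds


# Master (rank 0) plus 5 other nodes: every send is a point-to-point message.
for i, sends in enumerate(binomial_bcast_schedule(6), start=1):
    print(f"round {i}: {sends}")
```

With 6 ranks this yields 3 rounds and 5 unicast sends in total, so the cost grows with the log of the node count rather than staying at a single packet time as a true eth-level broadcast would.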