[Beowulf] very low performance for very small packets under MPICH(TCP_NODELAY?)

Douglas Eadline deadline at clustermonkey.net
Thu Dec 29 17:56:31 PST 2005

Have you checked whether your Ethernet driver has any parameters that
can be tuned? The Intel drivers have lots of settings that can influence
performance (and the default settings are normally not good for small-packet
latency).

You may also want to try another MPI implementation and see whether you
can reproduce the data with it as well.
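To reproduce the measurement with another MPI, it helps to summarize the timings the same way the quoted report below does. The sketch computes the mean per-message time and counts slow outliers. The `summarize` helper and the choice of threshold are illustrative assumptions. It also shows why rare stalls dominate the average: if about 1 in 30 messages takes 0.03 s, those stalls alone add roughly 0.03/30 = 1e-3 s to the mean. That is ten times a 1e-4 s baseline, and proportionally more for a faster baseline.

```c
#include <stddef.h>

/* Summary of a sample of per-message times, in seconds. */
struct latency_summary {
    double mean;     /* arithmetic mean of all samples */
    size_t outliers; /* samples strictly above the threshold */
};

/* Hypothetical helper: mean time and number of samples slower than `threshold`. */
struct latency_summary summarize(const double *t, size_t n, double threshold)
{
    struct latency_summary s = {0.0, 0};
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        if (t[i] > threshold)
            s.outliers++;
    }
    if (n > 0)
        s.mean = sum / (double)n;
    return s;
}
```

With a threshold such as 1e-3 s, a run without the stall problem should report few or no outliers and a mean close to the typical per-message time.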


> [cross-posted to comp.parallel.mpi]
> We have a Beowulf-class cluster built on Linux Fedora Core 3 (kernel
> 2.6.15) with MPICH 1.2.7 and Gigabit Ethernet, using a 3COM switch and
> 3C2000-T NICs. We detected very low communication efficiency
> for very small packets (shorter than 16 bytes). The symptoms are the
> same as for the problem reported at
> http://www.icase.edu/coral/LinuxTCP.html. Timing the transmission of
> 1000 packets of 8 bytes each, we see a distribution of times
> very similar to the one shown in that reference: most packets take
> less than 1e-4 s (close to the Ethernet+TCP latency), *BUT* about 1 in
> every 30 packets takes on the order of 0.03 s. This degrades the
> average performance for very small packets by a factor of 100.
> This seems to be a well-known issue with old kernels. For 2.0.x
> and 2.2.x, a patch is provided at the link above. I didn't find any
> references to these problems for newer kernels and MPICH releases. I
> think this was supposedly fixed in MPICH by disabling the Nagle
> algorithm with calls to `setsockopt(...TCP_NODELAY...)'. These calls
> are enabled on BSD systems and apparently on some SYSV systems like
> Linux. In our case, I verified that the MPICH configure script correctly
> detects the system as LINUX, which in turn sets the
> `CAN_DO_SETSOCKOPT' flag in the P4 code, which enables the
> `setsockopt(...TCP_NODELAY...)' calls in
> ./mpid/ch_p4/p4/lib/p4_sock_util.c.
> Any pointers for understanding why the Nagle algorithm is still active
> for the MPI sockets, or on how to deactivate it, will be helpful, as
> would pointers on how to deactivate the Nagle algorithm at the kernel
> TCP level.
> Mario
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
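On the TCP_NODELAY question in the quoted message: Nagle's algorithm is controlled per socket. A configure flag being set therefore does not by itself show that the option reached every socket actually used for messages. The minimal sketch below sets the option with the same `setsockopt(...TCP_NODELAY...)` call the message describes and reads it back with `getsockopt`. The helper names `disable_nagle` and `nodelay_is_set` are hypothetical and not part of MPICH. Calling the check on each connected socket, for example temporarily from the P4 socket code, would show whether Nagle is really off.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: turn off Nagle's algorithm on TCP socket `fd`.
 * Returns 0 on success, -1 on failure. */
int disable_nagle(int fd)
{
    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one) < 0) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}

/* Hypothetical helper: query TCP_NODELAY on `fd`.
 * Returns 1 if Nagle is disabled, 0 if it is active, -1 if the query fails. */
int nodelay_is_set(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

Because the option is per socket, both ends that send small messages need it. A freshly created TCP socket normally reports 0 here, meaning Nagle is active.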

