[Beowulf] Good IB network performance when using 7 cores, poor performance on all 8?

Brian Dobbins bdobbins at gmail.com
Thu May 1 10:35:20 PDT 2014

Hi everyone,

 In case anyone else has seen similar problems: the latest is that we've
shown the problem exists not just on IB but also on Gbit networks, and the
current theory is that it comes from latencies introduced by the kernel
scheduler -- a theory I'm hoping to test over the next day or two once I'm
given root access to a set of nodes to play with.  Basically, this appears
to be a case of high jitter on the nodes impacting network performance, as
seen in this table of timing quantiles:

TestNodes          50%         90%         97%         99%
C5.5-IB       180.7950    189.3320    197.9239    226.4684
RH62a-IB      219.0050    800.7130   2049.0532   3329.3111
RH62b-IB      258.0000    306.6360   2614.8770   4996.9770
C5.5-GB      5995.0000   6388.9800   6630.9400   6981.4500
RH6.2-GB     4689.6000  21746.7100  29692.5000  32588.4800

  [Apologies for the poor formatting!  Gmail doesn't do fixed-width fonts!]

  The take-away from this is that the CentOS5.5 nodes (C5.5) show
relatively low jitter in their timings.  These results are from running
2000 runs of a 64K Allreduce on 8 nodes / 64 cores via the Intel MPI
Benchmarks (v3.2.4), and using R's 'quantile' command to get the statistics
on the timings.  If anyone wants the full script I can send it to them.  I
removed the 10% column just to simplify the horrid formatting, but
basically the 10% and 50% don't differ much, meaning that half the runs in
*any* configuration are close to the minimum -- that is, they behave
perfectly fine.  For some of the tests, that's even true for 90% of the
runs.  But the CentOS5.5 runs are good up to the 99% quantile, whereas the
others balloon at or above the 90% quantile.  (The difference between the
two RH6.2 tests is likely due largely to the different hardware on those
clusters, and in part probably just because I need to randomize test
times, etc., to be more fair statistically.)
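
  For anyone who wants to reproduce the statistics without R, here is a
minimal Python sketch of the same idea.  It assumes you already have one
timing per run in a list; the function names (quantile, tail_ratio,
table_tail_ratios) are made up for illustration and this is not the script
mentioned above.  The interpolation rule is the one R's quantile() uses by
default (type 7).  The only data in it are the 50% and 99% columns copied
from the table.

```python
import math


def quantile(samples, p):
    """Linear-interpolation quantile, the same rule as R's default (type 7)."""
    xs = sorted(samples)
    if not xs:
        raise ValueError("need at least one sample")
    h = (len(xs) - 1) * p              # fractional index into the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)      # clamp so p = 1.0 stays in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def tail_ratio(samples, tail=0.99):
    """Jitter indicator: how much slower the tail is than the median."""
    return quantile(samples, tail) / quantile(samples, 0.50)


# 50% and 99% values copied from the table in this message.
TABLE_Q50_Q99 = {
    "C5.5-IB":  (180.7950, 226.4684),
    "RH62a-IB": (219.0050, 3329.3111),
    "RH62b-IB": (258.0000, 4996.9770),
    "C5.5-GB":  (5995.0000, 6981.4500),
    "RH6.2-GB": (4689.6000, 32588.4800),
}


def table_tail_ratios():
    """Return the 99%/50% ratio for each row of the table."""
    return {name: q99 / q50 for name, (q50, q99) in TABLE_Q50_Q99.items()}


if __name__ == "__main__":
    for name, ratio in table_tail_ratios().items():
        print(f"{name:10s} 99%/50% = {ratio:6.2f}")
```

  On the table's numbers the 99%/50% ratio is about 1.2 for both C5.5 rows
and roughly 7 to 19 for the RH6.2 rows, which is just the jitter described
above expressed as a single number per configuration.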

  I believe the original CentOS5.5 kernel (2.6.18) was upgraded to
2.6.32-400.34.4uek (Oracle Linux?  I'll check with our IT guys), which is
pretty close to the 2.6.32-358 on the RHEL nodes, *but* the scheduler
features enabled on the two systems are very different.  I don't have a list
of what's turned on or off yet, but will once I have root and can mount the
debug info.  If anyone else has seen this and solved it via kernel tweaks,
though, I'd be interested in knowing what settings you used.  If not, and
you're running a stock RHEL6.x 2.6.32 kernel, ... well, I'd be interested
to hear if you see any similar jitter.  For jobs with a high
communication:computation ratio, it makes a pretty big difference -- we've
seen applications run at 25-30% of previous speeds on the RH6 clusters due
to this issue.  For jobs with much less frequent communication, it doesn't
matter as much, obviously.
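
  Once the feature lists are available, comparing them between nodes is
easy to script.  The sketch below is only that, a sketch, and it rests on
an assumption I have not verified on these clusters: that sched_features
reads back as a whitespace-separated list of feature names in which a
disabled feature carries a NO_ prefix (the format I'd expect from the
debugfs file).  The helper names and the two sample strings are
hypothetical, not output from our nodes.

```python
def parse_sched_features(text):
    """Map each feature name to True (enabled) or False (listed as NO_<name>).

    Assumes a whitespace-separated list where a NO_ prefix means disabled.
    """
    features = {}
    for token in text.split():
        if token.startswith("NO_"):
            features[token[3:]] = False
        else:
            features[token] = True
    return features


def diff_features(a, b):
    """Return {name: (state_a, state_b)} for every feature whose state differs.

    A state of None means the feature is not listed on that node at all.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Made-up sample strings, for illustration only.
    node_a = "GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY"
    node_b = "NO_GENTLE_FAIR_SLEEPERS START_DEBIT NEXT_BUDDY"
    changes = diff_features(parse_sched_features(node_a),
                            parse_sched_features(node_b))
    for name, (on_a, on_b) in changes.items():
        print(f"{name:24s} node_a={on_a} node_b={on_b}")
```

  In practice the two strings would come from reading the file on one node
of each cluster; only the features that differ are worth toggling one at a
time while rerunning the Allreduce test.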

  I'll post an update once I'm able to play with
/proc/sys/kernel/sched_features a bit.

  - Brian

(PS.  Most testing was done with the Intel 2013.3.163 compilers and OpenMPI
1.6.4, though I did also do one or two tests with MVAPICH and OpenMPI 1.8)

On Sat, Apr 26, 2014 at 10:36 PM, Chris Samuel <samuel at unimelb.edu.au> wrote:

> On Thu, 24 Apr 2014 12:24:23 PM Joe Landman wrote:
> > Yeah, that was the other thing I'd forgotten about.  Might want to tweak
> > Cstates to C1.  It could be going over-power and throttling.  Turn off
> > ASPM and other (fairly painful) things.
> This Mellanox document has some useful information on that (and other
> settings that can affect performance).
> http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf
> cheers,
> Chris
> --
>  Christopher Samuel        Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/      http://twitter.com/vlsci
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf