[Beowulf] First experiences with Broadwell and the Dell M630
Bill Wichser
bill at princeton.edu
Thu Jun 9 18:26:37 PDT 2016
You'd think that by now I'd know better. Trying to live on the cutting
edge. But the promise of 5% over Haswell was quite alluring.
We purchased Broadwell 120W 2680v4 chips with 128G of RAM enclosed in
the Dell M630 blades. When we finally received power, the first thing we
did was load a RHEL7 OS, check the BIOS to be sure we had all the
performance variables set, and run HPL compiled with the Intel v16 compilers
against their MKL. Performance across nodes ranged from a high of 791 GFLOPS
down to 679. A whopping 14% difference.
Test      element  low     high    %      mean    median  std dev
WR00L2L4  315      679.44  791.34  14.14  741.67  739.64  19.89
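For anyone wanting to reproduce the % column: it appears to be the gap between the best and worst result, taken relative to the best. That definition is my assumption, not something stated above, but it does reproduce the reported 14.14. A minimal sketch (the function name is made up for illustration):

```python
def spread_percent(low, high):
    """Gap between best and worst result, as a percentage of the best.

    Assumed definition of the '%' column; it matches the value in the table.
    """
    return (high - low) / high * 100.0


# Low and high HPL results (GFLOPS) from the table above.
low, high = 679.44, 791.34
print(f"spread: {spread_percent(low, high):.2f}%")
```

Dividing by the high value rather than the low one is what makes the number come out as 14.14; relative to the low value the gap would look larger.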
We should be able to do better than this and reduce it to something like 5%,
right?
We checked power settings for the chassis and played with those. We
switched power management to BIOS control and then to the OS using ACPI. No
difference. We swapped the fastest and slowest nodes, thinking that this
might be a location issue. No difference. And then we found a BIOS
update from 2.0.1 to 2.1.6 which was fresh, so we loaded that one up.
Performance went down. Considerably.
Test      chassis  low     high    %      mean    median  std dev
WR00L2L4  296      583.47  636.27  8.30   623.74  625.15  9.82
Wow! 741 to 623 GFLOPS!
We then looked at power and heat, using the turbostat command to log
values. What we found was that on the slowest nodes the SMI counts
and time in C1 states were higher and the power was capped at 120W. On the
fastest nodes, things were different, with power hovering around 117W.
Again, switching node slots changed nothing.
With Dell's help we finally managed to turn off turbo mode and set
--ProcConfigTdp=Level1 so the chips only run at the base AVX speed of 1.9GHz.
This indeed provided much closer HPL results.
Test      chassis  low     high    %      mean    median  std dev
WR00L2L4  310      515.95  519.55  0.69   519.07  519.14  0.34
Plenty of the nodes were hovering around 110W.
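To put the three runs side by side, here is a rough sketch that computes the relative standard deviation (std dev divided by mean) and the mean throughput given up relative to the first run. The numbers are the mean and std dev values from the three tables; the run labels and the choice of metric are mine.

```python
# (label, mean GFLOPS, std dev) taken from the three HPL tables.
# Labels are informal descriptions, not official configuration names.
runs = [
    ("initial", 741.67, 19.89),
    ("after BIOS 2.1.6 update", 623.74, 9.82),
    ("turbo off, ProcConfigTdp=Level1", 519.07, 0.34),
]


def relative_std_percent(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100.0


def mean_loss_percent(baseline_mean, mean):
    """Mean throughput lost relative to a baseline, in percent."""
    return (baseline_mean - mean) / baseline_mean * 100.0


baseline = runs[0][1]
for label, mean, std_dev in runs:
    print(
        f"{label:34s} mean={mean:7.2f} "
        f"rel.std={relative_std_percent(mean, std_dev):5.2f}% "
        f"loss vs initial={mean_loss_percent(baseline, mean):5.1f}%"
    )
```

The point of looking at both numbers together is the trade-off: the locked-down setting makes the nodes nearly identical, but at a large cost in mean GFLOPS.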
But then we got a new motherboard and, with the same setup, ran another
test, this time without updating the BIOS, so it was still sitting on 2.0.1.
Lo and behold, there's that (weak) performance again. 638 GFLOPS and
better power usage.
638.12, Temps: 78, 70, Watts: 107.02, 104.65
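If you collect a pile of result lines like the one above, something along these lines would pull them apart for comparison. This is only a sketch: it assumes every line has the shape "GFLOPS, Temps: ..., Watts: ..." and that the paired values are one reading per socket, which is my guess. The function name is hypothetical.

```python
import re

# Assumed line shape: "<gflops>, Temps: <t>, <t>, Watts: <w>, <w>"
LINE_RE = re.compile(
    r"^\s*(?P<gflops>[\d.]+),\s*"
    r"Temps:\s*(?P<temps>[\d.,\s]+?),\s*"
    r"Watts:\s*(?P<watts>[\d.,\s]+)$"
)


def parse_result_line(line):
    """Split one result line into GFLOPS, temperatures and power readings."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return {
        "gflops": float(match.group("gflops")),
        "temps": [float(v) for v in match.group("temps").split(",")],
        "watts": [float(v) for v in match.group("watts").split(",")],
    }


result = parse_result_line("638.12, Temps: 78, 70, Watts: 107.02, 104.65")
total_watts = sum(result["watts"])
print(result["gflops"], result["temps"], round(total_watts, 2))
```

With the readings in lists, it is easy to flag nodes where any reading sits at or near the 120W cap.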
We still don't understand what we are up against here. Obviously
re-enabling the performance variables in BIOS will begin to get those
FLOPS up again. As will downgrading the BIOS and microcode. And maybe
this 14% difference between best and worst nodes is all we can expect.
But I'd sure like to have a lot more of those better performers!
Did I mention GPFS? We have it running on a v3 node with the same
kernel. On the Broadwell chips though, it just hangs the kernel. Sigh.
The cutting edge. When can I order Skylake?
Bill