[Beowulf] Broadwell HPL performance

Roland Fehrenbacher rf at q-leap.de
Thu Apr 21 08:46:18 PDT 2016


>>>>> "J" == John Hearns <hearnsj at googlemail.com> writes:

    J> I would be grateful for pointers towards HPL performance figures
    J> for Broadwell (v4) processors.

    J> I ask as I am getting some very good values and I want to do a
    J> sanity check!

Here are some numbers I just got for the 2630v4 on a Supermicro box with
the latest MKL SMP Linpack, the latest 3.12 kernel, and HT off. Note that
the German magazine c't
http://www.heise.de/newsticker/meldung/Intel-bringt-naechste-Serverprozessorgeneration-Broadwell-EP-3159857.html
(sorry, the article is in German) claims that MPI Linpack adds approx.
another 5% for the E5-2699v4. I haven't had time to verify that yet.

We only have 64GB in the node, so the maximum problem size is 90000.
Nominal peak is 2.2 GHz * 20 cores * 16 flops/cycle = 704 GFlops, so
we're getting about 98.7% efficiency (695.18 / 704) relative to the
nominal clock (2.2 GHz). The high efficiency comes from the fact that the
cores can sustain turbo frequencies above the nominal clock. The turbo
boost is more pronounced the fewer cores you have (less heat).
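
For anyone who wants to redo the arithmetic for a different CPU, here is a
minimal Python sketch of the peak, efficiency and memory estimate. The numbers
are the ones quoted in this post; the variable names are only illustrative,
and the 8 bytes per matrix element assumes double precision.

```python
# Figures from this post; names are illustrative only.
CLOCK_GHZ = 2.2            # nominal clock
CORES = 20                 # 2 sockets x 10 cores
FLOPS_PER_CYCLE = 16       # per-core factor used in the peak formula above
MEASURED_GFLOPS = 695.1769 # result reported by the benchmark run
N = 90000                  # problem size

# Nominal peak = clock * cores * flops per cycle per core.
peak_gflops = CLOCK_GHZ * CORES * FLOPS_PER_CYCLE

# Efficiency relative to the nominal (non-turbo) clock.
efficiency = MEASURED_GFLOPS / peak_gflops

# An N x N matrix of doubles (assumed 8 bytes each) dominates memory use.
matrix_bytes = 8 * N * N

print(f"nominal peak : {peak_gflops:.1f} GFlops")
print(f"efficiency   : {efficiency:.1%}")
print(f"matrix size  : {matrix_bytes / 1e9:.1f} GB ({matrix_bytes / 2**30:.1f} GiB)")
```

The matrix alone comes to 64.8e9 bytes, which is why 90000 is about as large
as fits in this node.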

Untuned runs with the latest 4.4 kernel show a couple of percent lower
performance. I haven't fine-tuned all the power-saving knobs there yet.

$ srun -N1 -n1 --cpus-per-task=20 --exclusive ./xlinpack_xeon64 -i ./lininput_xeon64
cpu_bind=NULL - beo-01, task  0  0 [5110]: mask 0xfffff
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Thu Apr 21 16:57:14 2016

CPU frequency:    2.199 GHz
Number of CPUs: 2
Number of cores: 20
Number of threads: 20

Parameters are set to:

Number of tests: 1
Number of equations to solve (problem size) : 90000
Leading dimension of array                  : 90000
Number of trials to run                     : 4
Data alignment value (in Kbytes)            : 4

Maximum memory requested that can be used=64801804096, at the size=90000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
90000  90000  4      699.126    695.1769 6.759477e-09 2.983269e-02   pass
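
As a sanity check on the table above, the GFlops column can be recomputed from
the problem size and the run time. This is a sketch that assumes the usual
LINPACK operation count of 2/3*n^3 + 2*n^2; the benchmark output itself does
not state which count it uses, and the function name is made up for the
example.

```python
def linpack_gflops(n, seconds):
    """GFlops for solving an n x n dense system in `seconds`.

    Assumes the conventional LINPACK operation count
    2/3 * n^3 + 2 * n^2 (an assumption, not taken from the output above).
    """
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9

# Size and time from the result line above.
gflops = linpack_gflops(90000, 699.126)
print(f"{gflops:.4f} GFlops")
```

With the size and time from the run above, this agrees with the reported
695.1769 GFlops to within the rounding of the printed time.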

$ cat ./lininput_xeon64
Sample Intel(R) Optimized LINPACK Benchmark data file (lininput_xeon64)
Intel(R) Optimized LINPACK Benchmark data
1                     # number of tests
90000 # problem sizes
90000 # leading dimensions
4 2 2 2 2 2 2 2 2 2 1 1 1 1 1 # times to run a test
4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 # alignment values (in KBytes)

Cheers,

Roland

-------
http://www.q-leap.com / http://qlustar.com
          --- HPC / Storage / Cloud Linux Cluster OS ---

