[Beowulf] Broadwell HPL performance

Thu Apr 21 11:48:56 PDT 2016

Roland

Thanks, those are numbers I can understand and
trust a bit more than some of the web blogs.

--
Doug

>>>>>> "J" == John Hearns <hearnsj at googlemail.com> writes:
>
>     J> I would be grateful for pointers towards HPL performance figures
>     J> for Broadwell (v4) processors.
>
>     J> I ask as I am getting some very good values and I want to do a
>     J> sanity check!
>
> Here are some numbers I just got for the 2630v4 on a Supermicro box with
> the latest MKL SMP Linpack, the latest 3.12 kernel, and HT off. Note that
> the German magazine c't
> http://www.heise.de/newsticker/meldung/Intel-bringt-naechste-Serverprozessorgeneration-Broadwell-EP-3159857.html
> [sorry, it's in German] claims that MPI Linpack adds approx. another 5%
> for the E5-2699v4; I haven't had time to verify that yet.
>
> We just have 64GB in the node, so the max array size is 90000. Nominal peak
> is 2.2*20*16 = 704 GFlops, so we're getting about 98.7% efficiency as
> calculated from the nominal clock (2.2 GHz). The high efficiency comes from
> the fact that the cores can sustain turbo frequencies above the nominal
> clock. The turbo boost is more pronounced the fewer cores you have (less
> heat).
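
For anyone who wants to redo the peak and efficiency arithmetic above for a different part, here is a minimal Python sketch. The inputs are the figures quoted in this message (2.2 GHz, 20 cores, 16 double-precision FLOPs per core per cycle, 695.1769 GFlops measured); the function and variable names are made up for illustration only.

```python
# Figures taken from the message; the names are illustrative, not from any tool.
NOMINAL_GHZ = 2.2          # nominal clock used in the post
CORES = 20                 # 2 sockets x 10 cores, HT off
FLOPS_PER_CYCLE = 16       # DP FLOPs per core per cycle, as assumed in the post
MEASURED_GFLOPS = 695.1769 # from the xlinpack_xeon64 run


def nominal_peak_gflops(ghz, cores, flops_per_cycle):
    """Nominal peak in GFlops: clock (GHz) x cores x FLOPs per cycle."""
    return ghz * cores * flops_per_cycle


peak = nominal_peak_gflops(NOMINAL_GHZ, CORES, FLOPS_PER_CYCLE)
efficiency = MEASURED_GFLOPS / peak

print(f"nominal peak: {peak:.1f} GFlops")
print(f"efficiency vs nominal clock: {efficiency:.1%}")
```

Because the denominator uses the nominal clock rather than the sustained turbo clock, a figure this close to 100% is plausible and says nothing on its own about which frequency the cores actually ran at.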
>
> Untuned runs with the latest 4.4 kernel show a couple of percent less
> performance. Haven't fine-tuned all the power-saving knobs there.
>
> $ srun -N1 -n1 --cpus-per-task=20 --exclusive ./xlinpack_xeon64 -i ./lininput_xeon64
> cpu_bind=NULL - beo-01, task  0  0 [5110]: mask 0xfffff
> Intel(R) Optimized LINPACK Benchmark data
>
> Current date/time: Thu Apr 21 16:57:14 2016
>
> CPU frequency:    2.199 GHz
> Number of CPUs: 2
> Number of cores: 20
> Number of threads: 20
>
> Parameters are set to:
>
> Number of tests: 1
> Number of equations to solve (problem size) : 90000
> Leading dimension of array                  : 90000
> Number of trials to run                     : 4
> Data alignment value (in Kbytes)            : 4
>
> Maximum memory requested that can be used=64801804096, at the size=90000
>
> =================== Timing linear equation system solver ===================
>
> Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
> 90000  90000  4      699.126    695.1769 6.759477e-09 2.983269e-02   pass
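
As a sanity check on the result line above, the reported GFlops can be rederived from the problem size and the wall time. This is a sketch that assumes the conventional LINPACK operation count of 2/3*N^3 + 2*N^2; I have not confirmed that the Intel binary uses exactly this count, but the numbers agree. The helper names are hypothetical.

```python
# Values copied from the benchmark output in the message.
N = 90000                        # problem size
LDA = 90000                      # leading dimension
TIME_S = 699.126                 # reported solve time in seconds
REPORTED_GFLOPS = 695.1769       # reported performance
REPORTED_MEM_BYTES = 64801804096 # "Maximum memory requested" line


def linpack_flops(n):
    """Conventional LINPACK operation count (assumed here): 2/3*n^3 + 2*n^2."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def matrix_bytes(n, lda, bytes_per_element=8):
    """Storage for an n x lda matrix of double-precision values."""
    return n * lda * bytes_per_element


gflops = linpack_flops(N) / TIME_S / 1e9
mem = matrix_bytes(N, LDA)

print(f"recomputed: {gflops:.4f} GFlops (reported {REPORTED_GFLOPS})")
print(f"matrix storage: {mem} bytes (reported request {REPORTED_MEM_BYTES})")
```

The matrix alone comes to 64.8e9 bytes, which is why N = 90000 is about the largest size that fits in a 64GB node.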
>
> $ cat ./lininput_xeon64
> Sample Intel(R) Optimized LINPACK Benchmark data file (lininput_xeon64)
> Intel(R) Optimized LINPACK Benchmark data
> 1                     # number of tests
> 90000 # problem sizes
> 90000 # leading dimensions
> 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1 # times to run a test
> 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1 # alignment values (in KBytes)
>
> Cheers,
>
> Roland
>
> -------
> http://www.q-leap.com / http://qlustar.com
>           --- HPC / Storage / Cloud Linux Cluster OS ---
