[Beowulf] Theoretical vs. Actual Performance

Thu Feb 22 07:42:51 PST 2018

There is a very nice and simple Max flops code that requires much less 
tuning than Linpack. It is described on p. 57 of:

Rahman "Intel® Xeon Phi™ Coprocessor Architecture and Tools"
https://link.springer.com/book/10.1007%2F978-1-4302-5927-5

An example Fortran code is here:
https://github.com/bkmgit/intel-xeon-phi-coprocessor-architecture-tools/tree/master/ch05
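
When comparing a Max flops or HPL result against a vendor figure, it can help to recompute the theoretical peak yourself: sockets times cores per socket times clock rate times FLOPs per cycle per core. The sketch below is a minimal illustration, not a measurement. The hardware figures in it are assumptions that are not stated in this thread: a dual-socket node, 16 cores per Opteron 6274 at a 2.2 GHz base clock, and 4 double-precision FLOPs per cycle per core. The function names and the measured value are made up for illustration. Under those assumptions the result lands close to the 282 GFLOPS figure quoted below.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle_per_core):
    """Theoretical double-precision peak of one node, in GFLOPS."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle_per_core


def efficiency(measured_gflops, peak):
    """Fraction of theoretical peak actually achieved (0.0 to 1.0)."""
    return measured_gflops / peak


# ASSUMED hardware figures (not from the thread): dual-socket Opteron 6274 node.
peak = peak_gflops(sockets=2, cores_per_socket=16, clock_ghz=2.2,
                   flops_per_cycle_per_core=4)

# HYPOTHETICAL measured value, chosen to be roughly a third of peak.
measured = 93.0

print(f"peak = {peak:.1f} GFLOPS, efficiency = {efficiency(measured, peak):.0%}")
```

If the vendor figure turns out to be per node rather than per socket, a single-socket run should be compared against half of it, which changes the efficiency number considerably.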

On 02/22/2018 05:16 PM, John Hearns via Beowulf wrote:
> Prentice, I echo what Joe says.
> When doing benchmarking with HPL or SPEC benchmarks, I would optimise 
> the BIOS settings to the highest degree I could.
> Switch off processor C-states.
> As Joe says, you need to look at what the OS is running in the 
> background. I would disable the Bright cluster manager daemon for instance.
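
One way to check which C-states the kernel still exposes, before and after changing BIOS settings, is to read the Linux cpuidle sysfs tree. This is a minimal sketch, assuming a Linux node that uses the usual /sys/devices/system/cpu/cpuN/cpuidle/stateM/ layout with "name" and "disable" files. The function name list_cstates is made up for illustration, and the script only reads state; it changes nothing.

```python
import glob
import os


def list_cstates(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return (state_name, disabled_flag) pairs for one CPU, in state order.

    Assumes the Linux cpuidle sysfs layout: <cpu_dir>/cpuidle/stateN/{name,disable}.
    Returns an empty list if no cpuidle state directories are found.
    """
    states = []
    pattern = os.path.join(cpu_dir, "cpuidle", "state[0-9]*")
    # Sort numerically on the trailing index so state10 follows state9.
    for path in sorted(glob.glob(pattern),
                       key=lambda p: int(os.path.basename(p)[len("state"):])):
        with open(os.path.join(path, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(path, "disable")) as f:
            disabled = f.read().strip() == "1"
        states.append((name, disabled))
    return states


# Print the idle states the kernel reports for cpu0 (nothing on non-Linux hosts).
for name, disabled in list_cstates():
    print(f"{name}: {'disabled' if disabled else 'enabled'}")
```

Running it on one node before and after a BIOS change gives a quick confirmation that the setting actually took effect.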
> 
> 
> 85% of theoretical peak on an HPL run sounds reasonable to me and I 
> would get figures in that ballpark.
> 
> For your AMDs I would start by choosing one system, no interconnect to 
> cloud the waters. See what you can get out of that.
> 
> On 22 February 2018 at 15:45, Joe Landman <joe.landman at gmail.com> wrote:
> 
> 
> 
>     On 02/22/2018 09:37 AM, Prentice Bisbal wrote:
> 
>         Beowulfers,
> 
>         In your experience, how closely does the actual performance of
>         your processors match their theoretical performance? I'm
>         investigating a performance issue on some of my nodes. These
>         are older systems using AMD Opteron 6274 processors. I found
>         literature from AMD stating the theoretical performance of these
>         processors is 282 GFLOPS, and my LINPACK performance isn't
>         coming close to that (I get approximately 33% of that). The
>         number I often hear mentioned is that actual performance should
>         be ~85% of theoretical performance. Is that a realistic number
>         in your experience?
> 
> 
>     85% makes the assumption that you have the systems configured in an
>     optimal manner, that the compiler doesn't do anything wonky, and
>     that, to some degree, you isolate the OS portion of the workload off
>     of most of the cores to reduce jitter.   Among other things.
> 
>     At Scalable, I'd regularly hit 60-90 % of theoretical max computing
>     performance, with progressively more heroic tuning.   Storage, I'd
>     typically hit 90-95% of theoretical max (good architectures almost
>     always beat bad ones).  Networking, fairly similar, though tuning
>     per use case mattered significantly.
> 
> 
>         I don't want this to be a discussion of what could be wrong at
>         this point, we will get to that in future posts, I assure you!
> 
> 
>     -- 
>     Joe Landman
>     t: @hpcjoe
>     w: https://scalability.org
> 
> 
>     _______________________________________________
>     Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>     To change your subscription (digest mode or unsubscribe) visit
>     http://www.beowulf.org/mailman/listinfo/beowulf
>     <http://www.beowulf.org/mailman/listinfo/beowulf>