[Beowulf] Here we go again

Thu Dec 12 08:29:38 PST 2019

On Thu, 12 Dec 2019 08:06:03 -0700
Brian Dobbins <bdobbins at gmail.com> wrote:

> Hi Doug,
> 
>   I've seen pretty decent performance on the AMD processors, and was
> even told by AMD people to use the Intel compiler -- but when doing
> that, we specify the processor type (eg, AVX capabilities), and it
> works pretty well.  However, I don't have any experience using the
> MKL on them.
> 
>   That said, looking at the numbers, it's pretty interesting that
> there's roughly a factor of 2 from the AVX2 (OpenBLAS) -> AVX512
> (MKL) results on Intel, and with the two systems being relatively
> comparable with OpenBLAS (AVX2).  Then it's *roughly* a factor of 8
> going from the MKL on Intel to the MKL on AMD, and since AVX512 is 8
> x 64-bit floats, it seems it could just be it's not using any
> vectorization whatsoever on AMD... presumably because Intel claims
> they can't recognize the chip?  That said, I'd love to see the author
> try after setting:
> 
> MKL_ENABLE_INSTRUCTIONS=AVX2

This does not work with MKL on Zen (and MKL does indeed default to its
non-SIMD code path there).

But setting MKL_DEBUG_CPU_TYPE=5 does have an effect (it essentially
turns off CPU detection and forces type 5, which is AVX2).
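A minimal sketch of applying that setting from a launcher script, assuming
the variable has to be present in the environment the MKL-linked process
starts with. The helper names (mkl_avx2_env, run_with_mkl_avx2) are
hypothetical, and the child command here only echoes the variable back; it
does not call MKL, so substitute the real benchmark binary. Whether a given
MKL version honours MKL_DEBUG_CPU_TYPE is something to verify locally.

```python
import os
import subprocess
import sys


def mkl_avx2_env(base=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 added.

    The caller's environment (or `base`, if given) is left unmodified.
    """
    env = dict(os.environ if base is None else base)
    env["MKL_DEBUG_CPU_TYPE"] = "5"  # type 5 = AVX2, per the message above
    return env


def run_with_mkl_avx2(cmd):
    """Run `cmd` (a list of arguments) with the override in its environment."""
    return subprocess.run(
        cmd,
        env=mkl_avx2_env(),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in child process: prints the variable it inherited.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('MKL_DEBUG_CPU_TYPE'))",
    ]
    print(run_with_mkl_avx2(child).stdout.strip())
```

Setting the variable only for the child keeps the override out of the
login shell, which makes it easy to compare runs with and without it.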

As far as the Intel compilers go, the binaries they produce seem to run
OK on AMD, with some caveats:

 * a binary built with -xAVX2 will fail due to the Intel CPU requirement.
 * a binary built with -march=core-avx2 works OK.

YMMV,
 Peter K