[Beowulf] AMD and AVX512

Mon Jun 21 13:46:30 UTC 2021

On 6/21/21 9:20 AM, Jonathan Engwall wrote:
> I have followed this thinking "square peg, round hole."
> You have got it again, Joe. Compilers are your problem.

Erp ... did I mess up again?

System architecture has been a problem ... making a processing unit 
10-100x as fast as its support components means you have to code with 
that in mind.  A simple `gfortran -O3 mycode.f` won't necessarily 
generate optimal code for the system (but I swear ... -O3 ... it says 
it on the package!).
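
One way to see what the compiler was actually allowed to target is to ask it.  A minimal sketch in C, assuming GCC or clang on x86, where the predefined macros `__AVX512F__`, `__AVX2__`, `__AVX__` and `__SSE2__` follow the `-m`/`-march` flags in effect.  The helper names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: reports the widest x86 SIMD level the compiler
   was permitted to emit for this translation unit.  The macros are
   predefined by GCC and clang according to the -m/-march flags. */
const char *compiled_isa_level(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline (no x86 SIMD macros defined)";
#endif
}

/* Prints the level; call it from main() in a small test program. */
void print_compiled_isa_level(void)
{
    printf("compiled for: %s\n", compiled_isa_level());
}
```

Built with plain `-O3`, this will usually report a baseline level rather than what the CPU in the box can do.  Adding `-march=native` lets the compiler target the build host, at the cost of a binary that may not run on older nodes.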

Way back at Scalable, our secret sauce was largely increasing IO 
bandwidth and lowering IO latency while coupling computing more tightly 
to this massive IO/network pipe set, combined with intelligence in the 
kernel on how to better use the resources.  It was simply a better 
architecture.  We used the same CPUs.  We simply exploited the design 
better.

The end result was that codes with off-CPU work (storage, networking, 
etc.) could push our systems far harder than they could push 
competitors' systems.  And you didn't have to use a different ISA to get 
these benefits.  No recompilation was needed, though we did show the 
folks who were interested how to get even better performance.

Architecture matters, as does the implementation of that architecture.  
There are costs to every decision within an architecture.  AVX512 
comes with a lot of baggage: downclocking, etc.  You have to do a 
cost-benefit analysis on whether that baggage is worth paying for, given 
the benefits you get in return.  Some folks have decided in favor of 
AVX512, and have been enjoying the benefits (i.e. they are willing to 
pay the costs).  For the general audience, these costs represent a 
(significant) hurdle to overcome.
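
The cost-benefit question can be put into a back-of-the-envelope model.  This is a toy sketch, not a measurement: it assumes a fraction `f` of the baseline runtime vectorizes, that wider vectors speed that fraction up by a factor `w`, and that the clock drops to a ratio `r` of its normal value for the whole run.  All three parameters and both function names are hypothetical, and real downclocking behavior varies by chip and workload.

```c
/* Toy model, not a measurement.  Assumptions (all hypothetical):
   f: fraction of baseline runtime that vectorizes with AVX-512
   w: speedup of that fraction from the wider vectors (must be > 1)
   r: clock ratio under AVX-512 load, applied to the whole run */

/* Overall speedup relative to the build without AVX-512. */
double avx512_speedup(double f, double w, double r)
{
    /* Baseline time is normalized to 1.  Everything runs at the
       reduced clock; only the fraction f also gains the factor w. */
    double new_time = (1.0 - f) / r + f / (w * r);
    return 1.0 / new_time;
}

/* Vectorized fraction at which the speedup is exactly 1.
   Derived from (1 - f)/r + f/(w*r) = 1. */
double break_even_fraction(double w, double r)
{
    return (1.0 - r) / (1.0 - 1.0 / w);
}
```

With `w = 2.0` and `r = 0.85`, the break-even fraction works out to about 0.3: under these assumptions, a code that spends less than roughly 30% of its runtime in vectorizable loops ends up slower with AVX512 than without it.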

Here's where awesome compiler support would help.  FWIW, gcc isn't that 
great a compiler.  It's not performance minded for HPC.  It's a 
reasonable general purpose, standards compliant (for some subset of 
standards) compilation system.  LLVM is IMO a better compiler system, 
and its clang/flang are developing nicely, albeit still not really HPC 
focused.  Then you have variants built on that, like the Cray compiler, 
Nvidia compiler and AMD compiler.  These are HPC focused, and actually 
do quite well with some codes (though the AMD version lags the Cray and 
Nvidia compilers).  You've got the Intel compiler, which would be a good 
general compiler if it weren't more of a marketing vehicle for Intel 
processors and their features (hey, you got an AMD chip?  You will take 
the slowest code path even if you support the features needed for the 
high performance code path).
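
For contrast, here is a sketch of what dispatching on features rather than on vendor can look like.  It assumes GCC or clang on x86 and uses their `__builtin_cpu_init` and `__builtin_cpu_supports` builtins plus the `target` function attribute.  The kernel names and `select_sum` are made up for illustration, and this is not a description of how any particular vendor compiler works internally.

```c
#include <stddef.h>

typedef double (*sum_fn)(const double *, size_t);

/* Portable fallback, always available. */
static double sum_scalar(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may emit AVX2 instructions here. */
__attribute__((target("avx2")))
static double sum_avx2(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Same loop, but the compiler may emit AVX-512F instructions here. */
__attribute__((target("avx512f")))
static double sum_avx512(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
#endif

/* Hypothetical selector: picks a kernel from what the CPU reports it
   supports, never from who made the CPU. */
sum_fn select_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return sum_avx512;
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
#endif
    return sum_scalar;
}
```

A caller would do `sum_fn f = select_sum();` once at startup and then call `f(x, n)`.  Any chip that reports the feature gets the fast path, whatever the vendor string says.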

Maybe, someday, we'll get a great HPC compiler for C/Fortran.

-- 
Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman
