[Beowulf] [EXTERNAL] Re: AMD and AVX512
Lux, Jim (US 7140)
james.p.lux at jpl.nasa.gov
Wed Jun 23 00:11:57 UTC 2021
From: Beowulf <beowulf-bounces at beowulf.org> on behalf of Joe Landman <joe.landman at gmail.com>
Date: Monday, June 21, 2021 at 6:46 AM
To: Jonathan Engwall <engwalljonathanthereal at gmail.com>
Cc: "beowulf at beowulf.org" <beowulf at beowulf.org>
Subject: [EXTERNAL] Re: [Beowulf] AMD and AVX512
On 6/21/21 9:20 AM, Jonathan Engwall wrote:
I have followed this thinking "square peg, round hole."
You have got it again, Joe. Compilers are your problem.
<snip discussion of architecture>
To date, I don’t know that *compilers* pay much attention to things like IO (that’s buried in some library call no doubt).
>>Maybe, someday, we'll get a great HPC compiler for C/Fortran.
Wasn’t the Fortran compiler for the 7600 highly optimized? Did vector unrolling and all that. And those compilers for the FPS boxes?
I think you mean great HPC compilers for chips that are available and fast <grin>
I think, too, that the comments about ARM vs x86 vs whatever are interesting.
We’ve moved a long way from clusters where the ethernet interconnect was rate limiting, and the nodes were single core, single memory, single disk (if any). Now you start getting into processors with hundreds of cores, or you start looking at “nanojoules/instruction” (or is instruction even the right thing to be counting? Maybe it’s nanojoules/data operation – where that could be a read/write from memory, disk, or interprocessor link).
Look at the (probably) specious claim that Tesla has the 5th fastest supercomputer - articles are very light on details, but I think it’s a whole bunch of GPUs – but their “number of cores” isn’t very big compared to even #100 on the “Top 500” list.
However, it might well be that, for Tesla’s specific processing load, 5000 GPU cores *is* faster than most Top 500 clusters.
And, given the recent news about miners consuming all those joules – maybe our metrics should be looking at more than raw speed.
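As a rough sketch of what a metric like nanojoules per operation could look like, assuming you can measure average power draw and count whatever "operation" you decide matters (the function name and the numbers below are made up for illustration, not measurements of any real machine):

```python
def nanojoules_per_op(avg_power_watts, elapsed_seconds, op_count):
    """Energy per operation in nanojoules.

    "Operation" is whatever you choose to count: instructions,
    memory reads/writes, disk accesses, or interconnect transfers.
    """
    if op_count <= 0:
        raise ValueError("op_count must be positive")
    joules = avg_power_watts * elapsed_seconds  # energy = power * time
    return joules * 1e9 / op_count              # J -> nJ, then per operation


# Hypothetical example: a 5 W node running for 10 s doing 1e9 operations.
print(nanojoules_per_op(5.0, 10.0, 1e9))
```

The hard part is not the arithmetic but deciding what goes in op_count, and whether avg_power_watts covers just the processor or the whole node including memory, disk, and network.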
(who has not just 1, but TWO, ARM based clusters on the shelf behind his desk.. Yes, Beaglebones, but it’s an ARM, it’s 4 nodes, and I use various cluster tools to manipulate them – the connection fabric for one is kind of slow (802.11))