[Beowulf] bring back 2012?

Wed Aug 17 07:37:21 PDT 2016

On 08/16/2016 09:18 PM, Stu Midgley wrote:

> oh, indeed the top bin Xeon systems are fast and damn expensive.  Even 
> when we purchased these AMD systems they were a LOT cheaper than any 
> Intel system that could come close in specfp rate performance - let 
> alone after 4 years of heavy use.
>
> One issue is that the AMD systems are far more numa, thus requiring 
> tighter programming.
>

I'm going to have to disagree with the above statement. Before the Intel 
Nehalem, the AMD Opterons were true NUMA processors, but the Intel 
processors were SMP designs (the *other* SMP - Symmetric 
Multiprocessing). Since the Nehalem, Xeons have been NUMA processors, 
too, and I don't think it's accurate to say one design is any more NUMA 
than the other.

For symmetric multiprocessing, every read/write to main memory took the 
same amount of time, but that also meant the memory controller was a 
bottleneck, as it could only service one processor at a time. It's a 
resource contention issue. To improve performance, you'd have to do 
whatever you could to organize your code so that the various cores 
weren't all trying to access memory at the same time.
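
A rough way to picture that contention is the toy model below. It is 
only a sketch: it assumes a single controller that services strictly one 
request at a time, and the function name and all the numbers are made up 
for illustration, not taken from any real system.

```python
# Toy model only: one shared memory controller that services a single
# request at a time. The function name and all numbers are hypothetical.

def shared_controller_time_ns(n_procs, requests_each, service_ns):
    """Time for the one controller to drain every processor's requests
    when they all arrive together and are serviced strictly one by one."""
    return n_procs * requests_each * service_ns


if __name__ == "__main__":
    # Same work per processor, more processors contending for the controller.
    for n in (1, 2, 4, 8):
        total = shared_controller_time_ns(n, requests_each=1000, service_ns=100)
        print(f"{n} processors: {total} ns to drain the queue")
```

In this model the wait grows linearly with the number of processors 
hitting the controller at once, which is why staggering memory accesses 
helps on such a design.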

With NUMA, on the other hand, each processor has its own memory 
controller and can access its own portion of memory very quickly. If it 
needs to access memory elsewhere, it takes longer because it has to ask 
another processor, or that other processor's memory controller, to 
perform the memory operation (I forget the exact low-level 
implementation details). Accessing a remote processor's memory takes 
longer than accessing your own memory, hence the name Non-Uniform Memory 
Access. The advantage of this is that most of the time a processor is 
accessing its local memory, so it can use its own memory controller 
without resource contention. Sure, you still want to organize your 
program to keep as many memory accesses local as you can, but I don't 
think that's much different from trying to keep as much data in the 
local caches as possible to prevent reads of main memory, which you 
should be doing regardless of whether your system is SMP or NUMA.
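
To put the local-versus-remote point in numbers, here is a minimal 
sketch that treats average latency as a weighted mix of the two cases. 
The function name is hypothetical and the 80 ns and 130 ns figures are 
placeholders I picked for illustration, not measurements of any Opteron 
or Xeon.

```python
# Toy model only: average memory latency as a weighted mix of local and
# remote accesses. The latencies below are placeholders, not measurements.

def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected latency when local_fraction of accesses hit the
    processor's own memory and the rest go to a remote node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    LOCAL_NS, REMOTE_NS = 80.0, 130.0  # hypothetical values
    for frac in (1.0, 0.9, 0.5, 0.0):
        avg = average_latency_ns(frac, LOCAL_NS, REMOTE_NS)
        print(f"{frac:.0%} local: {avg:.1f} ns average")
```

The more accesses stay local, the closer the average sits to the local 
figure, which is the same argument as keeping your working set in cache.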

I would think SMP needs tighter programming, since you want to reduce 
contention for the memory controller as much as possible.

Prentice

PS - Yes, I know today's systems are actually a mixture of SMP and NUMA, 
depending on what level of the architecture you're looking at, so put 
the torches and pitchforks away!