[Beowulf] Re: dual core Opteron performance - re suse 9.3

Mark Hahn hahn at physics.mcmaster.ca
Tue Jul 12 14:40:58 PDT 2005

> >there are only 4 slots on the Tyan 2875 (I had mistakenly reported yesterday 
> I'm not seeing anywhere at Tyan an indication this board can take advantage
> of NUMA.

node interleave is meaningless for the 2875, since the board only has 
memory attached to one CPU.  while the bios probably does include the 
ACPI table that informs the kernel's k8-numa code, it's moot: every 
access from the second socket is non-local, so there's no way to arrange 
cpu/process affinity to minimize non-local accesses (except by not using 
the second socket, of course!).
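
for what "not using the second socket" could look like from userspace, 
here is a rough sketch using Python's os.sched_setaffinity (Linux only).  
the helper name pin_to_cpus and the choice of cpu 0 are my own 
illustration; which cpu numbers sit on the socket with the memory is an 
assumption you'd have to check on the actual machine.

```python
import os

def pin_to_cpus(wanted):
    """Restrict the calling process to those CPUs in `wanted` it may use.

    Returns the resulting affinity set.  Linux-only: os.sched_setaffinity
    is not available on every platform.
    """
    allowed = os.sched_getaffinity(0)   # pid 0 means "this process"
    target = allowed & set(wanted)
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # ASSUMPTION: cpu 0 is on the socket that has the memory attached;
    # verify the real numbering before relying on this.
    print(pin_to_cpus({0}))
```

this only keeps the process off the other socket; it does nothing about 
where the kernel places the pages.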

I'd expect NUMA support to make more of a difference on 4-socket systems,
since on them, a process can be >1 hop away from memory.  on a 2-socket
system, it's probably still worth doing, but can't be all that critical.
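
to make the hop argument concrete, here's a small sketch that counts the 
worst-case number of hops between sockets.  the link lists are 
assumptions for illustration: a single link for the dual, and a square 
(ring) of four sockets for the quad; real boards may be wired 
differently.

```python
from collections import deque

def max_hops(links, sockets):
    """Largest shortest-path hop count between any two sockets."""
    adj = {s: set() for s in range(sockets)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    worst = 0
    for start in adj:
        # breadth-first search from each socket
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst

dual = [(0, 1)]                                  # ASSUMED: one link
quad_square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # ASSUMED: square topology

print(max_hops(dual, 2))         # 1
print(max_hops(quad_square, 4))  # 2
```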

naturally, latency-sensitive codes (big working sets with poor locality) 
will show a bigger difference.

> >Bank interleaving "Auto"

I tried to measure the effect of this on a dual, and couldn't see one.  
it's hard to see, based on the low-level hardware specs, why it would 
matter much.  yes, bank interleave should reduce the amount of time 
waiting on bank misses, but the difference is certainly not visible to 
Stream.

> >Node interleaving "Auto"

turning this on essentially defeats NUMA, since memory gets striped 
across the nodes.  it could be the right thing for some codes/systems, 
since it means that no process has any special affinity for a 
particular socket.
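
a back-of-envelope way to see the tradeoff, as a sketch: with node 
interleave on and accesses spread evenly, roughly 1/N of them land on 
local memory for N sockets.  the latency numbers below are made-up 
placeholders, not measurements, and the model assumes all remote 
accesses cost the same (which ignores hop count).

```python
def interleaved_local_fraction(sockets):
    """Fraction of accesses that are local when pages are striped evenly."""
    return 1.0 / sockets

def mean_latency(local_fraction, local_ns, remote_ns):
    """Average access latency for a given local/remote mix."""
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# HYPOTHETICAL latencies, for illustration only.
LOCAL_NS, REMOTE_NS = 80.0, 120.0

for sockets in (2, 4):
    frac = interleaved_local_fraction(sockets)
    print(sockets, "sockets, interleaved:", mean_latency(frac, LOCAL_NS, REMOTE_NS))

# the two extremes without interleave: everything local vs everything remote
print("all local: ", mean_latency(1.0, LOCAL_NS, REMOTE_NS))
print("all remote:", mean_latency(0.0, LOCAL_NS, REMOTE_NS))
```

so interleave gives every process the same middling latency: worse than 
a well-placed NUMA process, better than a badly-placed one.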

