[Beowulf] Again about NUMA (numactl and taskset)

Bill Broadley bill at cse.ucdavis.edu
Tue Jun 24 23:05:10 PDT 2008


Vincent Diepeveen wrote:
> intel c++ obviously is close to visual studio. Within 0.5% to 1.5% range 
> (depending upon flags

I believe Microsoft licensed the intel optimization technology, so the
similarity is hardly surprising.

> and hidden flags that you managed to get from someone). Intel C++ is 
> free for researchers such as
> me.

Last I checked it was free for research *on* compilers, but as a tool to help
facilitate research you have to pay, even in an academic environment.  Maybe I
misread the license -- what exactly led you to believe that it's "free for
researchers"?

> The PG compiler and especially pathscale compiler are doing rather well 
> at benchmarks,

In my experience they do well on real applications too, things like nwchem for
instance.  Granted the difference (at least in the past) was much larger for
g77 vs commercial fortran than it was for gcc/g++ vs commercial c compilers.
I've heard gfortran has gotten much better, and I've even occasionally seen
SPEC numbers for GNU compilers that looked encouraging.  In any case I've
often seen commercial compilers justified on a pure cost/benefit basis, i.e.
$1k on compilers gets me more performance than another $1k on hardware.

> especially that last, yet at our codes they're real ugly. Maybe they do 
> better for floating point
> oriented workloads, which doesn't describe game tree search.

I've written codes that do mostly pointer chasing, and compilers didn't make
any difference whatsoever.  gcc -O1 matched any optimization flag of any
compiler I tried (pathcc, pgcc, icc).

> What strikes me is that visual studio/intel c++ produce a far tinier 
> executable than either of those compilers
> and that the fastest executable (in case of Diep) is a 32 bits one. 32 
> bits visual c++ executable with pgo
> is roughly 1.5% faster than its 64 bits counterpart.

Ah, I don't find 1.5% very exciting, nor the size of the binary.  Certainly if
you have many pointers and are close to falling into the next level of the
memory hierarchy, smaller pointers can be a significant help.  Did you compare
stripped binaries built with similar flags?  Where was the difference?
/usr/bin/size (on most linux distros) can be handy for looking at the
different section sizes of a binary.
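For example (using /bin/ls as a stand-in -- substitute your own 32- and 64-bit builds):

```shell
# Show per-section sizes (text/data/bss) rather than just file size.
# Berkeley format prints a header line followed by one line per binary.
size /bin/ls
```

Comparing the text and data columns between the two builds tells you whether the size difference is code, static data, or just debug info that strip would have removed anyway.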

> The main datastructures are using integers and a lot of pointers get 
> used. Seems in 64 bits this hurts a tad.

The main differences I know of are:
* 64 bit pointers can cause more cache/memory pressure for pointer intensive
   codes.
* I believe on most (all?) core2 chips some optimizations are disabled in 64
   bit mode (nehalem is supposed to fix this).  I don't believe AMD disables
   any optimizations in 32 bit mode.  Thus the performance difference for
   AMD vs Intel can depend on which mode you use, a slight advantage for intel
   at 32 bit, and a slight advantage for amd at 64 bit.  This varies hugely
   depending on the code of course.
* Some codes benefit from the additional registers (8 -> 16) only available in
   64 bit mode, which takes some pressure off of the L1 and helps keep more
   functional units busy for two reasons: instruction issue is less of a
   bottleneck (you can have more register references than pending loads), and
   there are more read/write ports to the register file than to the L1.


