On Fri, Nov 26, 2004 at 11:29:29AM -0000, Kozin, I (Igor) wrote:

> What bothers me primarily is that you have to run a benchmark 
> many more times than usual to get the best performance on an Opteron.

We haven't observed this, but we do follow a few basic rules:

1) 2.6 kernels are better than 2.4 kernels at NUMA
2) Set your BIOS to have "node interleave off", to improve
   scaling at the potential cost of less uniformity.
3) Before running a program that uses a lot of memory, first
   run a program that touches a lot of memory; this pages out
   existing pages and lets the kernel balance everything as it
   comes back in. This is worth a few % even for serial SPECcpu
   runs, and is a must for multi-cpu things like SPECrate.
4) If a single process uses more memory than is attached to
   1 CPU, expect trouble.
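
As an illustration of rule 3, here is a minimal sketch of a
memory-touching program. It is not the tool we actually use; the
function name touch_memory, the one-write-per-page approach, and the
64 MiB default are all assumptions made for the example. Pick a size
close to the amount of RAM you want flushed.

```python
import mmap
import sys


def touch_memory(nbytes, page_size=mmap.PAGESIZE):
    """Allocate nbytes and write one byte per page; return pages touched.

    Hypothetical helper for illustration only.
    """
    buf = bytearray(nbytes)
    pages = 0
    for offset in range(0, nbytes, page_size):
        buf[offset] = 1  # the write forces the kernel to back this page
        pages += 1
    return pages


if __name__ == "__main__":
    # Size in MiB from the command line; 64 MiB is an arbitrary default.
    mib = int(sys.argv[1]) if len(sys.argv) > 1 else 64
    print(touch_memory(mib * 1024 * 1024), "pages touched")
```

Run it once before the benchmark, then start the benchmark as usual.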

There's nothing really new here; you've had to do similar things
on SGI machines and big SMPs for ages.

Following these rules, we've gotten excellent repeatability and
scaling on 2-cpu and 4-cpu boxes on a lot of codes, both MPI and

