> >    What value of NB did they settle on ? (80 and 160 seem common choices)
> >    any other non-default values in HPL.dat ?
> Why are 80 and 160 common choices?  I do know that they used 160
> for their run.  I also retested my setup at 160 and it is much
> slower than 64.  I was told by someone at UTK that the size of
> NB should be a multiple of the L1 cache and that double is good.
> So NB = sqrt(8kb * 1024/8)=32 for P4 Xeon.  I tried 64 and that has 
> been the best for a single node run.  

The block size (NB) should be a multiple of the optimal block size found
by ATLAS. Look for this value in the DGEMM results in SUMMARY.LOG; it is
usually 40. Any multiple of this ATLAS block size is fine, which is why
80 (2*40) and 160 (4*40) show up so often.
If NB is small, you get a lot of communication but good load
balancing. If NB is large, you get less communication but the grain is
coarser. 160 (4*40) is a good trade-off for a Myrinet cluster.
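
As a rough illustration of the rule above, here is a small Python sketch
that lists candidate NB values as multiples of the ATLAS block size and,
for comparison, reproduces the L1-cache estimate from the quoted message.
The function names and the upper bound of 256 are my own choices for
illustration; they do not come from HPL or ATLAS.

```python
import math

def nb_candidates(atlas_nb, max_nb=256):
    """Multiples of the ATLAS block size, up to max_nb (an arbitrary bound)."""
    return list(range(atlas_nb, max_nb + 1, atlas_nb))

def l1_block_estimate(l1_kb, bytes_per_double=8):
    """Side of the largest square block of doubles that fits in an L1 cache
    of l1_kb kilobytes (the sqrt rule from the quoted message)."""
    return math.isqrt(l1_kb * 1024 // bytes_per_double)

print(nb_candidates(40))     # candidates to try in HPL.dat
print(l1_block_estimate(8))  # 8 KB L1 data cache, as quoted for the P4 Xeon
```

Each candidate still has to be benchmarked; the list only narrows down
which NB values are worth putting in HPL.dat.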

You can look at for some input.

> I wonder if having more memory (1 GB vs. 2 GB per node) could
> drastically improve scaling.  Anyone know?

I would think so. More memory per node lets you run a larger problem
size (N), so there is less communication relative to computation.
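
To put a number on that, a common rule of thumb is to size the HPL
matrix so that it fills most of the aggregate memory. The sketch below
assumes 8-byte doubles, an 80% fill fraction, and GB meaning 2^30 bytes.
The fill fraction, the 16-node count, and the function name are
assumptions for illustration, not values from this thread.

```python
import math

def hpl_problem_size(nodes, gb_per_node, nb, mem_fraction=0.8):
    """Estimate N so that an N x N matrix of doubles uses about
    mem_fraction of total memory, rounded down to a multiple of NB."""
    total_bytes = nodes * gb_per_node * 2**30
    n = math.isqrt(int(mem_fraction * total_bytes) // 8)
    return (n // nb) * nb

# Hypothetical 16-node cluster, NB = 160
n_1gb = hpl_problem_size(16, 1, 160)
n_2gb = hpl_problem_size(16, 2, 160)
print(n_1gb, n_2gb, n_2gb / n_1gb)
```

Doubling the memory raises N by roughly sqrt(2). Since the floating-point
work grows as N^3 while the data moved between nodes grows roughly as
N^2, the larger run should spend a smaller share of its time
communicating.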

