Anyone have information on latest LSU beowulf?

Wed Oct 9 09:07:12 PDT 2002

> Hi Craig,
> 
> On Tue, 2002-10-08 at 12:54, Craig Tierney wrote:
> > >    What value of NB did they settle on ? (80 and 160 seem common choices)
> > >    any other non-default values in HPL.dat ?
> > 
> > Why are 80 and 160 common choices?  I do know that they used 160
> > for their run.  I also retested my setup at 160 and it is much
> > slower than 64.  I was told by someone at UTK that NB should be a
> > multiple of the block size that fits in the L1 cache, and that double
> > that size is good.  So NB = sqrt(8 KB * 1024 / 8 bytes) = 32 for the
> > P4 Xeon.  I tried 64 and that has been the best for a single node run.
> 
> The block size (NB) should be a multiple of the optimal block size found
> by ATLAS. Look for this value in the DGEMM results in SUMMARY.LOG. This
> value is usually 40. Any multiple of this ATLAS block size is fine. 
> If NB is small, you will have a lot of communication but good load
> balancing. If NB is large, you have less communication but coarser
> granularity. 160 (4*40) is a good trade-off for a Myrinet cluster.

Patrick, here are some results for 256 CPUs (128 dual nodes).

T/V                N    NB     P     Q               Time             Gflops
W01R2L6       115000    64    16    16            2007.22          5.051e+02
W01R2L6       115000    80    16    16            3026.14          3.351e+02
W01R2L6       115000   160    16    16            3020.05          3.357e+02

Here, NB=64 (about 505 Gflops) is much better than either multiple of 40
(about 335 Gflops each).
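
As a sanity check, the Gflops column for the three runs above can be
reproduced from N and the wall time. This is only a sketch: it assumes the
usual HPL operation count of 2/3*N^3 + 3/2*N^2 flops, and the function name
is made up for illustration.

```python
def hpl_gflops(n, seconds):
    """Gflops for an N x N solve, assuming 2/3*N^3 + 3/2*N^2 flops."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1.0e9

# (NB, wall time in seconds) for the three N=115000, 16x16 runs
runs = [(64, 2007.22), (80, 3026.14), (160, 3020.05)]
results = {nb: hpl_gflops(115000, t) for nb, t in runs}

for nb, gf in results.items():
    print(f"NB={nb:4d}  {gf:7.1f} Gflops")

# Throughput of NB=64 relative to NB=80
speedup_64_vs_80 = results[64] / results[80]
```

On these numbers NB=64 comes out roughly 1.5 times faster than NB=80 or
NB=160.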

I also had some 500-CPU runs that showed the same thing, but I cannot
find the results.

> 
> You can look at http://x-cat.org/docs/top500-HOWTO.html for some input.

This says that NB=40 is good for the PIII, which has a larger
L1 cache than a P4 (16k data vs. 8k).  NB should be a multiple
of 32 for the P4.  I would like to try it out on a PIII; I would
think that 44 is a better value based on cache size.  I tried
all these tricks on an Alpha with a 16k L1 cache and found 88 (44*2)
best.
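
The cache-size rule of thumb behind those numbers can be written out as a
short Python sketch. It assumes 8-byte doubles and that one NB x NB block
should just fit in the L1 data cache; the helper name is hypothetical.

```python
import math

def nb_from_l1(l1_bytes, word_bytes=8):
    """Largest NB such that an NB x NB block of doubles fits in L1."""
    return math.isqrt(l1_bytes // word_bytes)

p4_nb = nb_from_l1(8 * 1024)     # P4 Xeon, 8k L1 data cache
p3_nb = nb_from_l1(16 * 1024)    # PIII or Alpha, 16k L1 data cache
print(p4_nb, p3_nb)
```

The 8k case gives 32. The 16k case gives 45, and 44 is the nearest
multiple of 4 below that.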

I am going to kick off some runs with 500 processors with
NB=64,80,128 and 160 to see if it really makes a difference.  I will post 
the results later.

Craig

> 
> > I wonder if having more memory (1 GB vs. 2 GB per node) could
> > drastically improve scaling.  Anyone know?
> 
> I would think so: less communication.
> 
> Patrick
> -- 
> ----------------------------------------------------------
> |   Patrick Geoffray, Ph.D.      patrick at myri.com 
> |   Myricom, Inc.                http://www.myri.com
> |   Cell:  865-389-8852          685 Emory Valley Rd (B)
> |   Phone: 626-821-5555          Oak Ridge, TN 37830
> ----------------------------------------------------------

-- 
Craig Tierney (ctierney at hpti.com)