Anyone have information on latest LSU beowulf?


Craig Tierney ctierney at hpti.com
Wed Oct 9 09:07:12 PDT 2002


> Hi Craig,
> 
> On Tue, 2002-10-08 at 12:54, Craig Tierney wrote:
> > >    What value of NB did they settle on ? (80 and 160 seem common choices)
> > >    any other non-default values in HPL.dat ?
> > 
> > Why are 80 and 160 common choices?  I do know that they used 160
> > for their run.  I also retested my setup at 160 and it is much
> > slower than 64.  I was told by someone at UTK that the size of
> > NB should be a multiple of the L1 cache and that double is good.
> > So NB = sqrt(8kb * 1024/8)=32 for P4 Xeon.  I tried 64 and that has 
> > been the best for a single node run.  
> 
> The block size (NB) should be a multiple of the optimal block size found
> by ATLAS. Look for this value in the DGEMM results in SUMMARY.LOG. This
> value is usually 40. Any multiple of this ATLAS block size is fine. 
> If NB is small, you will have a lot of communications but good load
> balancing. If NB is large, you have less coms but the grain is coarser.
> 160 (4*40) is a good trade-off for Myrinet cluster.

Patrick.  Here are some results for 256 cpus (128 dual nodes).

T/V                N    NB     P     Q               Time             Gflops
W01R2L6       115000    64    16    16            2007.22          5.051e+02
W01R2L6       115000    80    16    16            3026.14          3.351e+02
W01R2L6       115000   160    16    16            3020.05          3.357e+02

Here, NB=64 is much better than either multiple of 40: about 2007
seconds versus about 3020 for NB=80 and NB=160.

I had some 500 cpu runs that showed the same thing (cannot find
the results).  
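
As a sanity check on the numbers above, here is a minimal sketch that
recomputes the rate from N and the wall time. It assumes the usual LU
operation count of 2/3*N^3 + 3/2*N^2; the function name hpl_gflops is
made up for this example.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflop/s for an N x N solve, assuming 2/3*N^3 + 3/2*N^2 operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# (NB, wall time in seconds) for N = 115000 on a 16 x 16 grid, from the runs above
N = 115000
runs = [(64, 2007.22), (80, 3026.14), (160, 3020.05)]

best_time = min(t for _, t in runs)
for nb, t in runs:
    print(f"NB={nb:4d}  {hpl_gflops(N, t):7.1f} Gflop/s  "
          f"{t / best_time:.2f}x the best time")
```

The recomputed rates agree with the last column of the table, and the
NB=80 and NB=160 runs take roughly 1.5 times as long as the NB=64 run.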

> 
> You can look at http://x-cat.org/docs/top500-HOWTO.html for some input.

This says that NB=40 is good for the PIII which has a larger
L1 cache than a P4 (16k data vs. 8k).  NB should be a multiple
of 32 for the P4.  I would like to try it out on a PIII, I would
think that 44 is a better value based on cache size.  I tried
all these tricks on an Alpha with a 16k L1 cache and found 88 (44*2)
best.
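
The cache arithmetic behind those numbers can be sketched as below.
This is only the rule of thumb as I understand it (side of the largest
square block of 8-byte doubles that fits in the L1 data cache, then
doubled), not anything taken from HPL or ATLAS. The helper name
l1_block_size is made up for this example.

```python
import math


def l1_block_size(l1_bytes: int, word_bytes: int = 8) -> float:
    """Side of the largest square block of doubles that fits in L1 data cache."""
    return math.sqrt(l1_bytes / word_bytes)


# L1 data cache sizes mentioned in this thread
for name, l1_kb in [("P4 Xeon", 8), ("PIII / Alpha", 16)]:
    side = l1_block_size(l1_kb * 1024)
    print(f"{name}: {l1_kb}k L1 -> block side {side:.2f}, doubled {2 * side:.2f}")
```

For 8k this gives exactly 32 (doubled: 64). For 16k it gives about
45.25, which is not a whole number; reading 44 as that value rounded
down to a multiple of 4 is my assumption about where 44 and 88 come from.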

I am going to kick off some runs with 500 processors with
NB=64,80,128 and 160 to see if it really makes a difference.  I will post 
the results later.

Craig


> 
> > I wonder if having more memory (1 GB vs. 2 GB per node) could
> > drastically improve scaling.  Anyone know?
> 
> I would think so, less communications.
> 
> Patrick
> -- 
> ----------------------------------------------------------
> |   Patrick Geoffray, Ph.D.      patrick at myri.com 
> |   Myricom, Inc.                http://www.myri.com
> |   Cell:  865-389-8852          685 Emory Valley Rd (B)
> |   Phone: 626-821-5555          Oak Ridge, TN 37830
> ----------------------------------------------------------
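
On the memory question quoted above, a rough sketch of how the largest
problem size N grows with memory per node. It assumes 128 nodes and a
matrix of 8-byte doubles filling about 80% of total memory. The 80%
figure is a common rule of thumb, not a number from this thread, and
max_problem_size is a made-up helper name.

```python
import math


def max_problem_size(nodes: int, gib_per_node: float, fill: float = 0.8) -> int:
    """Largest N whose N x N matrix of 8-byte doubles fits in `fill` of total memory."""
    total_bytes = nodes * gib_per_node * 2**30
    return int(math.sqrt(fill * total_bytes / 8))


for gib in (1, 2):
    n = max_problem_size(128, gib)
    print(f"{gib} GiB/node on 128 nodes: N ~ {n}")
```

Doubling the memory raises N by a factor of about sqrt(2). Since the
arithmetic grows like N^3 while the data to be moved grows like N^2,
a larger N should mean relatively less time spent communicating.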

-- 
Craig Tierney (ctierney at hpti.com)


