Anyone have information on latest LSU beowulf?
Craig Tierney ctierney at hpti.com
Wed Oct 9 09:07:12 PDT 2002
> Hi Craig,
>
> On Tue, 2002-10-08 at 12:54, Craig Tierney wrote:
> > > What value of NB did they settle on ? (80 and 160 seem common choices)
> > > any other non-default values in HPL.dat ?
> >
> > Why are 80 and 160 common choices?  I do know that they used 160
> > for their run.  I also retested my setup at 160 and it is much
> > slower than 64.  I was told by someone at UTK that the size of
> > NB should be a multiple of the L1 cache and that double is good.
> > So NB = sqrt(8kb * 1024/8)=32 for P4 Xeon.  I tried 64 and that has
> > been the best for a single node run.
>
> The block size (NB) should be a multiple of the optimal block size found
> by ATLAS. Look for this value in the DGEMM results in SUMMARY.LOG. This
> value is usually 40. Any multiple of this ATLAS block size is fine.
> If NB is small, you will have a lot of communications but good load
> balancing. If NB is large, you have less coms but the grain is coarser.
> 160 (4*40) is a good trade-off for Myrinet cluster.

Patrick,

Here are some results for 256 cpus (128 dual nodes).

T/V           N    NB   P   Q      Time     Gflops
W01R2L6  115000    64  16  16   2007.22  5.051e+02
W01R2L6  115000    80  16  16   3026.14  3.351e+02
W01R2L6  115000   160  16  16   3020.05  3.357e+02

Here, 64 is much better than any multiple of 40.  I had some
500 cpu runs that showed the same thing (cannot find the results).

>
> You can look at http://x-cat.org/docs/top500-HOWTO.html for some input.

This says that NB=40 is good for the PIII, which has a larger L1 cache
than a P4 (16k data vs. 8k).  NB should be a multiple of 32 for the P4.
I would like to try it out on a PIII; I would think that 44 is a better
value based on cache size.  I tried all these tricks on an Alpha with a
16k L1 cache and found 88 (44*2) best.

I am going to kick off some runs with 500 processors with NB=64,80,128
and 160 to see if it really makes a difference.  I will post the
results later.

Craig

>
> > I wonder if having more memory (1 GB vs. 2 GB per node) could
> > drastically improve scaling.  Anyone know?
>
> I would think so, less communications.
>
> Patrick
> --
> ----------------------------------------------------------
> |   Patrick Geoffray, Ph.D.      patrick at myri.com
> |   Myricom, Inc.                http://www.myri.com
> |   Cell:  865-389-8852          685 Emory Valley Rd (B)
> |   Phone: 626-821-5555          Oak Ridge, TN 37830
> ----------------------------------------------------------

--
Craig Tierney (ctierney at hpti.com)
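The arithmetic in the message above can be sketched in a few lines of Python. This is only an illustration: the function names `nb_from_l1` and `hpl_gflops` are hypothetical and not part of HPL or ATLAS. Two things are assumed. First, the cache rule of thumb is read as "one NB x NB block of 8-byte doubles fits in the L1 data cache", and the optional rounding to a multiple of 4 is a guess at how 44 was obtained for a 16k cache. Second, the Gflops check uses the standard HPL operation count, 2/3 N^3 + 3/2 N^2.

```python
import math

def nb_from_l1(l1_bytes, multiple=1):
    """Largest NB such that an NB x NB block of doubles fits in L1.

    Assumption: 8 bytes per double; the result is rounded down to a
    multiple of `multiple` (hypothetical parameter, default no rounding).
    """
    nb = math.isqrt(l1_bytes // 8)
    return nb - nb % multiple

def hpl_gflops(n, seconds):
    """Gflops for an HPL run of order n, using the standard operation count."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9

# Cache-based estimates discussed in the thread.
print("P4 Xeon, 8k L1 :", nb_from_l1(8 * 1024))        # sqrt(8kb * 1024 / 8)
print("16k L1, mult 4 :", nb_from_l1(16 * 1024, 4))

# (NB, wall time in seconds) from the 256-cpu runs, N = 115000.
runs = [(64, 2007.22), (80, 3026.14), (160, 3020.05)]
for nb, seconds in runs:
    print(f"NB={nb:4d}  {hpl_gflops(115000, seconds):7.1f} Gflops")
```

The recomputed rates agree with the reported Gflops column to within rounding, so on this machine the NB=64 run is roughly 1.5 times faster than the NB=80 and NB=160 runs.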
More information about the Beowulf mailing list
