[Beowulf] HPL Benchmarking and Optimization
Ellis Wilson
xclski at yahoo.com
Wed Apr 2 12:23:34 PDT 2008
Ellis Wilson wrote:
> Currently I get these kinds of numbers from tested computers using the
> same environment (Gentoo, Fortran in gcc, HPL, all the same compilation
> options):
> 1 x Core2Duo (2.1 GHz/core, 2 GB RAM) - 2.3 GFlops
> 1 x Athlon 64 3500+ (2.2 GHz, 1 GB RAM) - 1.0 GFlops
> 4 x Core2Duo (2.1 GHz/core for a total of 8 cores, 2 GB RAM/node,
>   100 Mbit Ethernet interconnect) - 6.7 GFlops
Sorry to double post, all; however, I realized my issue involved running
HPL against the reference BLAS library, which is generic for every
architecture, and I didn't want to waste anyone's time. Giving Portage the
benefit of the doubt, I had failed to check that its dependencies were
best for HPL. Following an install of ATLAS and relinking to its
libraries, I've gotten the following numbers:
1 x Athlon64 3500+ (2.2 GHz, 1 GB RAM) - 3.6 GFlops
1 x Phenom 9600 quad-core (2.3 GHz/core, 2 GB RAM) - 11.9 GFlops
I'll likely try MKL soon for the Intel processors I'm
interested in.
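For reference, the relinking just meant pointing the linear algebra
section of HPL's Make.<arch> at the ATLAS libraries instead of the
reference BLAS and rebuilding; roughly like this (the LAdir path below is
a placeholder for wherever your ATLAS install landed):

    # Make.<arch>: link against ATLAS rather than the reference BLAS
    LAdir        = /usr/local/atlas/lib
    LAinc        =
    # F77 BLAS interface; use libcblas.a instead if built with HPL_CALL_CBLAS
    LAlib        = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a

followed by a "make arch=<arch> clean_arch_all" and "make arch=<arch>".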
The Phenom 9600 had previously only gotten 4.5 GFlops, and when I tested
it the second time I simply reused the environment I had compiled for the
Athlon64. Compiling ATLAS natively on the Phenom will certainly increase
the result, hopefully by something like the 3.6x gain the Athlon64 saw
(though I suspect things will get interesting due to memory bandwidth,
etc., on quad-cores).
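For perspective, a back-of-the-envelope peak calculation (assuming 2
double-precision flops/cycle on the K8 Athlon64 and 4 flops/cycle/core on
the K10 Phenom; those per-cycle figures are my assumption about the
microarchitectures):

    Athlon64 3500+: 2.2 GHz x 2 flops/cycle           =  4.4 GFlops peak
                    3.6 / 4.4                         ~  82% efficiency
    Phenom 9600:    2.3 GHz x 4 flops/cycle x 4 cores = 36.8 GFlops peak
                    11.9 / 36.8                       ~  32% efficiency

so there should still be plenty of headroom for a native ATLAS build on
the Phenom.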
Anyway, so as not to end the thread, I'm still wondering: do those of you
who run professional installations, or even just large setups where
you're unsure of the exact code that will be run on your cluster, use
compilation options such as -O3, -funroll-loops, -fomit-frame-pointer,
etc.?
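(For reference, in HPL those options live on the CCFLAGS line of
Make.<arch>; the stock makefiles ship with something close to:

    CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops

though since nearly all of the floating-point work happens inside the
BLAS, the flags ATLAS was built with matter far more than HPL's own.)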
Thanks,
Ellis