[Beowulf] HPL Benchmarking and Optimization
Ellis Wilson
xclski at yahoo.com
Wed Apr 2 12:23:34 PDT 2008
Ellis Wilson wrote:
> Currently I get these kinds of numbers from tested computers using the
> same environment (Gentoo, Fortran in gcc, HPL, all the same compilation
> options):
> 1 x Core2Duo (2.1 GHz/core, 2 GB RAM) - 2.3 GFlops
> 1 x Athlon 64 3500+ (2.2 GHz, 1 GB RAM) - 1.0 GFlops
> 4 x Core2Duo (2.1 GHz/core for a total of 8 cores, 2 GB RAM/node,
>   100 Mbit Ethernet interconnect) - 6.7 GFlops
Sorry to double post, all; however, I realized my issue involved running
HPL against the reference BLAS library, which is generic for every
architecture, and I didn't want to waste anyone's time. Giving Portage the
benefit of the doubt, I had failed to check that its dependencies were
best for HPL. Following an install of ATLAS and relinking to its
libraries, I've gotten the following numbers:
1 x Athlon64 3500+ (2.2 GHz, 1 GB RAM) - 3.6 GFlops
1 x Phenom 9600 quad-core (2.3 GHz/core, 2 GB RAM) - 11.9 GFlops
I'll likely try MKL soon for the Intel processors I'm
interested in.
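For reference, the relinking just meant pointing the linear algebra
section of HPL's Make.<arch> at the ATLAS libraries instead of the
reference BLAS and rebuilding; roughly like this (the LAdir path below is
a placeholder for wherever your ATLAS install landed):

    # Make.<arch>: link against ATLAS rather than the reference BLAS
    LAdir        = /usr/local/atlas/lib
    LAinc        =
    # F77 BLAS interface; use libcblas.a instead if built with HPL_CALL_CBLAS
    LAlib        = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a

followed by a "make arch=<arch> clean_arch_all" and "make arch=<arch>".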
The Phenom 9600 had previously only gotten 4.5 GFlops, and when I tested
it the second time I simply reused the environment I had compiled for the
Athlon64. Compiling ATLAS natively on the Phenom will certainly increase
the result, hopefully by something like the 3.6x gain the Athlon64 saw
(though I suspect things will get interesting due to memory bandwidth,
etc., on quad-cores).
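For perspective, a back-of-the-envelope peak calculation (assuming 2
double-precision flops/cycle on the K8 Athlon64 and 4 flops/cycle/core on
the K10 Phenom; those per-cycle figures are my assumption about the
microarchitectures):

    Athlon64 3500+: 2.2 GHz x 2 flops/cycle           =  4.4 GFlops peak
                    3.6 / 4.4                         ~  82% efficiency
    Phenom 9600:    2.3 GHz x 4 flops/cycle x 4 cores = 36.8 GFlops peak
                    11.9 / 36.8                       ~  32% efficiency

so there should still be plenty of headroom for a native ATLAS build on
the Phenom.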
Anyway, so as not to end the thread, I'm still wondering: do those of you
who run professional installations, or even just large setups where
you're unsure of the exact code that will be run on your cluster, use
compilation options such as -O3, -funroll-loops, -fomit-frame-pointer,
etc.?
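(For reference, in HPL those options live on the CCFLAGS line of
Make.<arch>; the stock makefiles ship with something close to:

    CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops

though since nearly all of the floating-point work happens inside the
BLAS, the flags ATLAS was built with matter far more than HPL's own.)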
Thanks,
Ellis