[Beowulf] HPL Benchmarking and Optimization
Ellis Wilson
xclski at yahoo.com
Fri Apr 4 16:48:18 PDT 2008
Tom Elken wrote:
>
>> -----Original Message-----
>> From: beowulf-bounces at beowulf.org
>> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Ellis Wilson
>>
>> I'll likely try MKL soon for the Intel processors I'm
>> interested in.
>
> Good idea.
>
> You might also want to try "Goto BLAS" (Google that to find the free
> download site). It can be compiled for a different architecture a lot
> quicker than ATLAS, and provides very good performance for both Intel
> and AMD architectures.
>
> As you may have already found, once you are using a good BLAS library
> with HPL, various compilers or compiler options won't make much
> difference in performance.
>
> -Tom
>
>
Hey Tom,
Thanks for your suggestions. I had already begun testing Goto BLAS when
I got your email, and so far it has been the most beneficial library for
my particular application, which resides on a CD-ROM. MKL proved to be
far too heavyweight (and I try to avoid closed source as often as
possible).

The only difficulties have come with compiling Goto BLAS (or anything,
for that matter) on a static system such as a LiveCD. To keep the total
size and the loaded initrd size as low as possible, I do not include
the Portage tree on my LiveCD, so nothing can be emerged. This has led
me to try a number of solutions. The first was to copy the full version
of the system directly into the tmpfs from a USB pen, uncompress it,
chroot into that environment, compile on the desired architecture, exit
the chroot, recompress, and put it back on the USB pen for later
burning. Obviously, this requires a ton of work, so I came up with an
easier fix that has interesting repercussions I'd like to hear from
this list on:

I mount an NFS directory on my system, chroot into it, compile Goto BLAS
or ATLAS there, and exit the chroot. Since the directory remains on my
development system (which does use a hard drive), I have no issues with
running out of RAM, shuffling files from one place to another, etc.
However, when compiling Goto BLAS on an older P4 without HT and with
256MB RAM, the build reported "clock-skew" warnings that I've never seen
before. Is this due to the NFS mount? And if so, will it hurt my
optimization of Goto BLAS or ATLAS? I still achieved 4 GFlops on the P4
I used that method on, which was way above my previous results with the
reference library (obviously), but I am still concerned that local
compilation might produce better optimization. Does anyone think that's
true or false?
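
One way to check whether the NFS mount is behind the clock-skew warnings
is to compare the timestamp a new file receives in the build directory
with the local clock. make prints that warning when it sees a file whose
modification time is later than the local time, and on an NFS mount the
timestamp typically comes from the server. What follows is only a rough
sketch in Python, not something from my setup: the function names and
the one-second tolerance are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def measure_clock_skew(directory):
    """Return (mtime of a fresh file - local clock) in seconds.

    On a local filesystem the result should be close to zero. On an NFS
    mount the mtime is typically stamped by the server, so a clearly
    positive value suggests the server clock is ahead of this machine.
    """
    before = time.time()
    fd, path = tempfile.mkstemp(dir=directory, prefix="skewcheck-")
    try:
        # Write a byte so the file's mtime is set by an actual write.
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"x")
        mtime = os.stat(path).st_mtime
    finally:
        os.unlink(path)
    after = time.time()
    # Compare against the midpoint of the local timestamps taken
    # around the file creation.
    return mtime - (before + after) / 2.0


def skew_is_suspicious(skew_seconds, tolerance=1.0):
    """True if the skew exceeds the tolerance (1.0 s is an assumed value)."""
    return abs(skew_seconds) > tolerance
```

Calling measure_clock_skew() on the NFS-mounted build directory and again
on a local directory such as /tmp would show whether the two differ. If
only the NFS result is large, syncing the clocks of the two machines
(for example with NTP) is the usual remedy.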
Thanks,
Ellis