[Beowulf] Parallel Programming Question
Robert G. Brown
rgb at phy.duke.edu
Tue Jul 17 13:15:54 PDT 2007
On Tue, 17 Jul 2007, Joe Landman wrote:
>> (The functions are borrowed from the library provided by "Numerical Recipes
>> in C++")
>> I'm currently calling these functions from the main loop with the lines:
> ah.... that explains it...
Um, "borrowed"? You'd better watch out for the DMCA Police -- they'll
be knocking at your door. They'll announce themselves after kicking
through the lock with "Buddy, you got a license for that?"
Numerical Recipes is not, actually, terribly well thought of by Real
Computer Scientists. It wasn't written by real computer scientists, for
one thing. Its algorithms are in at least some cases stale. It is not
at all clear that the code in it is original, but nevertheless it is
copyrighted and the licensing is QUITE restrictive, prohibiting pretty
much any sort of use that you might care to think of unless you pay them
More Money. Finally, there are significantly faster and better
alternatives, especially for linear algebra.
There is a well-known rant against NR online (one of the original ones,
and quite incisive) which summarizes the main criticisms of the book
both ways. Note that I
personally am not a mindless anti-NR ranter. I actually have some of
the early versions of the book with much more liberal licensing, and can
use at least THAT code legally pretty freely even moving it around and
putting it on multiple systems in electronic source format.
However, the right thing to do now is to use the Gnu Scientific Library
(GSL). Better code, aggressively open source (as in GPL v2 full
viral!). Or rather, DON'T use it unless you are prepared to make your
application GPL v2 as well. Noting well that if you're trying to write
a closed source application that includes stolen source code from a
copyrighted library with restrictive licensing, that way will get you
into even MORE trouble...;-)
The GSL can be used with an ATLAS-tuned linear algebra BLAS and very
likely will run 2 or more times faster than NR linear algebra anyway.
The "or more" is pretty open, by the way. Maybe as many as 10x faster
-- there are some pretty significant optimizations in ATLAS and the GSL.
>> Is there a simple way to call these functions which will cause the cluster
>> to distribute the
>> load of operations? Currently, when I run the program with a 50x50 array
>> on a single machine,
>> it takes about 5 minutes to process a single iteration through the matrix
> Huh? What machine? What compiler? What options?
> (more lack of coffee induced stupor here)
> This should take well under 1 second. O(N**3) operations (50*50*50 =
> 0.125E+6 operations) on a 1 GFLOP machine ( 1E+9 FLOPS) should be achievable
> in well under 1 second (ball park of 0.125/1000 seconds times a constant)
> Maybe I am missing something. This *should* be fast. Unless you are running
> on a Palm pilot or something like that ...
> In octave, using an interpreted LU decomposition on a hilbert matrix
> a=time ; [l,u,p] = lu(h); b=time; b-a
> ans = 0.00063705
> this is ball-park what I had estimated, within a constant factor (cough).
> This is an Opteron 242 with 4 GB ram, so it isn't a speed demon...
And octave code isn't speed demon code. But yeah, one would expect to
be able to fit 50x50 = 2500 x 8 = 20 Kbytes into L1, let alone L2. Even
at 50x that size you're still at only 1 MB, and lots of systems can fit
that into cache. And of course linear algebra operations have LOTS of
optimizations -- many CPU/compiler combinations should be able to
pipeline and perform more than one operation per cycle. So something is
indeed very odd here, even for NR.
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu