Random Number Generation. Was Re: Beowulf Questions
Robert G. Brown
rgb at phy.duke.edu
Sun Jan 5 14:50:11 PST 2003
On 4 Jan 2003, Randall Jouett wrote:
> Yep, and let's not forget that a semi-decent sized, hardware-encryption
> based cluster could be set aside to generate the initial seeds, and then
> the seeds could propagate over the entire network, further reducing the
> chance that an identical seed will rear its ugly head. Well, for a
> certain amount of time, anyway.
All of this very interesting discussion ignores one problem -- the
"intrinsic" parallel scaling of parallel uniform deviate generation.
On most hardware, generating "the next uniform deviate" costs (say) 3 to
5 or 6 operations. That is, a GHz-class CPU should be able to generate
tens to hundreds of millions of double precision uniform deviates per
second, depending on the algorithm used. On a 100 Mbps (12.5 MBps)
network, one obviously "loses" parallelizing this on even two hosts
unless one has a VERY long and complicated uniform deviate algorithm
(one that can time-dominate the serial fraction of the computation).
To put it another way, on an expensive Gbps network, packing whole
blocks of UD's into large messages for efficient delivery, I suppose that
one could conceivably "win" parallelizing uniform deviate generation on
a couple or three hosts, PROVIDED that one had an application that could
suck up the incoming stream as fast as the network DMA delivered them
without blocking. In my Monte Carlo problem it doesn't -- I have to do
a teeny bit of trig and matrix arithmetic based on the weighted outcome
produced by every uniform deviate -- at least as much computation as
generating the UD itself locally. Thus, if I worked very hard, I MIGHT
get a parallel speedup of two by pre-generating the next UD on another
node and having it totally ready to go when I need it, but -- surprise!
-- that's exactly the speedup I get from parallelizing the entire
embarrassingly parallel application on two nodes, each making its own
UD's with its own seed. Besides, real-world inefficiencies would
probably make this a losing effort anyway -- MPI or PVM or even raw
sockets have some overhead.
My own problem isn't trivial, but neither is it THAT sensitive to
uniform deviate generator choice, provided that I avoid ones with
obvious and known problems. So far, I've found it easiest to simply run
a large number of independent simulations, each with its own RNG/seed
and a decently large shuffle table. I accumulate hundreds, even
thousands of these simulations, but this still leaves the probability of
a coincident run very, very small, much smaller than the statistical
error remaining. At that, all the damage one or two coincident seeds
will do is cause the unbiased statistical error estimate to be very
slightly underestimated (as "one" independent run will be counted as
"two"). Totally negligible, as long as one doesn't make a habit of it
(and in any event, by recording the initial seeds and LOOKING, easily
eliminated).
A lot of other problems can be solved the same way. Probably MOST other
problems -- as noted above, local generation is likely to compete
favorably with remote generation unless and until uniform deviate
generation itself DOMINATES the serial fraction of the computation.
This is almost never the case -- decryption, Monte Carlo, almost
anything nontrivial requires some computation to be done WITH the
uniform deviate, and as soon as that computation time competes with
generation time, you are almost certain to lose. Note that (for
purists) I'm including all lattice-type decompositions of a problem
where UD's are generated AND USED on a node in the same category as EP
distribution of simulations -- this is just contrasting this sort of
parallelism with the proposed notion of a "UD-generating cluster".
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu