[Beowulf] Good demo applications for small, slow cluster

Max R. Dechantsreiter max at performancejones.com
Wed Aug 21 12:22:26 PDT 2013


Hi Peter,

> That's interesting, where can I read about "giant-stepping the generator"?
> The wiki article
> http://en.wikipedia.org/wiki/Linear_congruential_generator doesn't
> mention distributed processing.

The term "giant-stepping" may not be in general use....
The idea is to find an efficient way to compute an RNG state
far ahead of the current one.

For a multiplicative generator like that of the NPB:

 	x[k+1] = a * x[k] mod p

(p not necessarily prime), observe that

 	x[k+2] = a * x[k+1] mod p = a**2 * x[k] mod p

so

 	x[k+m] = (a**m mod p) * x[k] mod p

a**m mod p may be efficiently computed by Russian Peasant:

 	http://en.wikipedia.org/wiki/Ancient_Egyptian_multiplication#Russian_peasant_multiplication

Then

 	x[k+2*m] = (a**m mod p) * x[k+m] mod p

et cetera.  Single-threaded operations fill in the rest.

For LFSR-based generators, the multiplier is a bit matrix,
so powers of that matrix would need to be computed (also by
Russian Peasant).

The scheme would be particular to each generation algorithm.

Surprisingly many RNG can be parallelized this way; the big
advantage over (say) simply giving each computational thread
private RNG parameters is that with care the results of a
parallel computation could match those of a serial one.

Regards,

Max



More information about the Beowulf mailing list