[Beowulf] Good demo applications for small, slow cluster

Max R. Dechantsreiter max at performancejones.com
Wed Aug 21 12:46:15 PDT 2013


Hi Peter,

Not hardly:

 	R. R. Coveyou (Knuth "Seminumerical Algorithms" 1981, 26-27):

         	u[0] % 4 == 2

         	u[k+1] = u[k]*(u[k] + 1) % (1<<e)

Coveyou's generator is NOT parallelizable.

(It is very useful, however, for generating large seeds
from a small one, for use in another generator, thanks
to quadratic growth.)

Regards,

Max
---



On Wed, 21 Aug 2013, Peter St. John wrote:

> Max,
> Remarkable, thanks! I surely agree that in binary, doubling is fast. So you
> sort of bypass computing low powers, with an ancient method (?!) of
> computing high powers efficiently. Very cool. So, everything is
> parallelizable :-)
> Peter
>
>
> On Wed, Aug 21, 2013 at 3:22 PM, Max R. Dechantsreiter <
> max at performancejones.com> wrote:
>
>> Hi Peter,
>>
>>
>>  That's interesting, where can I read about "giant-stepping the generator"?
>>> The wiki article
>>> http://en.wikipedia.org/wiki/**Linear_congruential_generator<http://en.wikipedia.org/wiki/Linear_congruential_generator>doesn't
>>> mention distributed processing.
>>>
>>
>> The term "giant-stepping" may not be in general use....
>> The idea is to find an efficient way to compute an RNG state
>> far ahead of the current one.
>>
>> For a multiplicative generator like that of the NPB:
>>
>>         x[k+1] = a * x[k] mod p
>>
>> (p not necessarily prime), observe that
>>
>>         x[k+2] = a * x[k+1] mod p = a**2 * x[k] mod p
>>
>> so
>>
>>         x[k+m] = (a**m mod p) * x[k] mod p
>>
>> a**m mod p may be efficiently computed by Russian Peasant:
>>
>>         http://en.wikipedia.org/wiki/**Ancient_Egyptian_**
>> multiplication#Russian_**peasant_multiplication<http://en.wikipedia.org/wiki/Ancient_Egyptian_multiplication#Russian_peasant_multiplication>
>>
>> Then
>>
>>         x[k+2*m] = (a**m mod p) * x[k+m] mod p
>>
>> et cetera.  Single-threaded operations fill in the rest.
>>
>> For LFSR-based generators, the multiplier is a bit matrix,
>> so powers of that matrix would need to be computed (also by
>> Russian Peasant).
>>
>> The scheme would be particular to each generation algorithm.
>>
>> Surprisingly many RNG can be parallelized this way; the big
>> advantage over (say) simply giving each computational thread
>> private RNG parameters is that with care the results of a
>> parallel computation could match those of a serial one.
>>
>> Regards,
>>
>> Max
>>
>



More information about the Beowulf mailing list