[Beowulf] OT: public random numbers?

Wed Aug 24 18:58:52 PDT 2011

On Aug 12, 2011, at 8:35 PM, Robert G. Brown wrote:

> On Fri, 12 Aug 2011, David Mathog wrote:
>
>> Robert G. Brown wrote:
>>
>>>  Everybody must be able to obtain it
>>>> freely from a web connection.
>>>
>>>    http://www.random.org/
>>>
>>
>> Nice site.  They have something that is very close, the pregenerated
>> random files, from which a small set of digits may be extracted,  
>> and the
>> files themselves have MD5 checksums (but are not signed).
>> They also support https.  It comes up a little short on criteria 1  
>> (we
>> really don't know what is going on behind the scenes) and 6 (it is a
>> single site.)
>
> Behind the scenes is documented pretty well on the site, and the  
> guy who
> runs it is a human being, you can communicate with him to learn even
> more.  I already know him a bit, as he and I have collaborated on
> applying dieharder to test random.org datasets -- even "the"  
> random.org
> dataset as of some time ago (I have a few hundred MB of random number
> from the site in my dieharder directory).  IIRC, the numbers are
> generated continuously and fairly slowly by grabbing and filtering and
> transforming atmospheric noise.  As a source of entropy, that is
> probably excellent if (as noted) slow, but many good sources of  
> entropy
> seem to be fairly slow.  He has good reason to think that his numbers
> are theoretically "true random numbers"

Well there is another test i stumbled upon when i did do some  
analysis on
casino (which student who takes himself serious doesn't do an attempt to
write some simulations seeing whether you can win something in the  
casino
by designing some strategies?).

The simulation revealed it was rather easy to make a fortune with  
roulette
with the doubling system (first put in 1 then if you win, put in 1  
else double
and keep doublinguntil you win). Reports from guys (some of them missing
an eye, another one a hand) who actually study anything trying to make a
profit in casino's (and they also really try it in the casino's),  
revealed that
using the doubling system they never saw someone really make big profit
with it.

So there was a problem between the random generated data versus the
true random numbers generated in the casino.

Statistical analysis revealed the problem, though not so soon.

I noticed that most generated semi-random numbers with software  
generators,
had the habit to truely adress a search space of n always in O (n log  
n).

So if you draw from most software RNG's a number and do it modulo n,
with n being not too tiny, say quite some millions or even billions,  
then every
slot in your 'hashtable' will get hit at least once by the RNG,  
whereas data
in reality simply happens to not have that habit simply.

So true random numbers versus generated noise is in this manner easy
to distinguish by this. Now i didn't study literature whether some  
other chap
some long time ago already had invented this. That would be most  
interesting
to know.

In semi pseudo code, let's take an array of size a billion as an  
example,
though usually a few million is more than ok:

n = 2^30; // 2 to the power 30

Function TestNumbersForRandomness(RNG,n) {
   declare array hashtable[size n];

   guessednlogn = 2 * (log n / log 2) * n;

   for( i = 0 ; i < n ; i++ )
     hashtable[i] = FALSE;

   ndraws = filledn = 0;
   while( ndraws  < guessednlogn ) {
      randomnumber = RNG();
      r = randomnumber % n; //     randomnumber =  r  (mod n)
      if( hashtable[r] == FALSE ) {
         hashtable[r] = TRUE;
         filledn++;
         if( filledn >= n )
           break;

     }
     ndraws++;
   }

   if( filledn >= n )
      print "With high degree of certainty data generated by a RNG\n");
    else
      print "Not so sure it's a RNG\n";

}

Regards,
Vincent

> -- both unpredictable and
> flat/decorrelated at all orders, and even though there aren't really
> enough of them for my purposes, I've used them as one of the (small)
> "gold standard" sources for testing dieharder even as I test them.   
> For
> all practical purposes threefish or aes are truly random as well and
> they are a lot faster and easier to use as gold standard generators,
> though.
>
> I don't quite understand why the single site restriction is  
> important --
> this site has been up for years and I don't expect it to go away soon;
> it is quite reliable.  I don't think there is anything secret about  
> how
> the numbers are generated, and I'll certify that the numbers it  
> produces
> don't make dieharder unhappy.  So 1 is fixable with a bit of effort on
> your part; 6 I don't really understand but the guy who runs the  
> site is
> clearly willing to construct a custom feed for cash customers, if  
> there
> is enough value in whatever it is you are trying to do to pay for
> access.  If it's just a lottery, well, lord, I can think of a dozen  
> ways
> to make numbers so random that they'd be unimpeachable for any sort of
> lottery, both unpredictable and uncorrelated, and they don't any of  
> them
> require any significant amount of entropy to get started.
>
> I will add one warning -- "randomness" is a rather stringent
> mathematical criterion, and is generally tested against the null
> hypothesis.  Amateurs who want to make random number generators out of
> supposedly "random" data streams or fancy algorithms almost invariably
> fail, sometimes spectacularly so.  There are a half dozen or more
> really, really good pseudorandom number generators out there and it is
> easy to hotwire them together into an xor-based high entropy stream  
> that
> basically never repeats (feeding it a bit of real entropy now and then
> as it operates).  I would strongly counsel you against trying to take
> e.g. weather data and make something "random" out of it.  Unless you
> really know what you are doing, you will probably make something that
> isn't at all random and may not even be unpredictable.  Even most
> sources of "quantum" randomness (which is at least possibly "truly
> random", although I doubt it) aren't flat, so that they carry the
> signature of their generation process unless/until you manage to
> transform them into something flat (difficult unless you KNOW the
> distribution they are producing).  Pseudorandom number generators have
> the serious advantage of being amenable to at least some theoretical
> analysis (so you can "guarantee" flatness out to some high
> dimensionality, say) as well as empirical testing with e.g. dieharder.
>
> HTH,
>
>      rgb
>
>>
>> Thanks,
>>
>> David Mathog
>> mathog at caltech.edu
>> Manager, Sequence Analysis Facility, Biology Division, Caltech
>>
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf