[Beowulf] /dev/random entropy on stateless/headless nodes

Sun Feb 27 08:48:28 PST 2011

On Sat, 26 Feb 2011, Peter St. John wrote:

I've been following this conversation halfheartedly because of midterm
grades due last Friday, but I suppose I should chime in.

   /dev/random is, as noted, an entropy generator.  Usually it has enough
entropy after a boot and some network activity to start generating
random numbers.  However, it is always >>enormously slow<< and can only
generate a very few random numbers before it has to wait for more
entropy to be generated.  To run dieharder tests on it I had to do
things like stroke the mouse pad as though it was a kitty's neck for an
hour or so.  You can actually get it to where you can watch it making
new random numbers AS you do bursty things that it uses as part of the
entropy pool.  It is not, I repeat not, for production runs using very
large numbers of rands unless you want to die of old age before an e.g.
Monte Carlo run finishes.
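
To see how close /dev/random is to blocking before you read from it, you can look at the kernel's entropy estimate under /proc/sys/kernel/random/. The sketch below is only an illustration: the function names are hypothetical, and it assumes a Linux kernel that exposes the estimate (in bits) in the entropy_avail file.

```python
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_available(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Not Linux, no /proc, or unexpected file contents.
        return None


def safe_to_read_random(nbytes, path=ENTROPY_AVAIL):
    """True if the estimate covers nbytes, so a read is unlikely to block."""
    bits = entropy_available(path)
    return bits is not None and bits >= 8 * nbytes
```

On a freshly booted headless node the estimate is usually small, which is why asking /dev/random for more than a handful of bytes tends to stall.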

   /dev/urandom is a mix of a software generator and /dev/random.
Entropy based rands are used to "diffuse" entropy into a software
generator, keeping it from being predictable (and giving it an infinite
period, as it were).  However, you then do have to worry about the
quality of the random numbers, and it is still slow.  dieharder doesn't
reveal any overt flaws in the stream, but slow is in and of itself a
good reason not to use it for much.

The solution for nearly anyone needing large numbers of fast, high
quality random numbers is going to be:  Use a fast, high quality random
number generator from e.g. the Gnu Scientific Library, and >>seed<< it
from /dev/random, ensuring uniqueness of the seed(s) across the cluster
by any one of several means (for example, storing used seeds in a shared
table and drawing again if you hit a "birthday" collision among the
roughly four billion 32 bit seeds /dev/random can produce).  The table is
small, and the lookup occurs outside of the main loop, as there is no good
reason to reseed a good rng in the middle of a computation (and for many
generators, there is at least one good reason not to).
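
The seed-once-and-check-the-table recipe can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not dieharder's or the GSL's code: Python's random.Random (a Mersenne Twister) stands in for a GSL generator, an in-memory set stands in for the shared table (on a real cluster it would live on a shared filesystem or in a database), and all function names are hypothetical.

```python
import math
import random


def collision_probability(n_seeds, seed_bits=32):
    """Birthday estimate: chance that n_seeds random seeds are not all distinct."""
    space = 2.0 ** seed_bits
    return 1.0 - math.exp(-n_seeds * (n_seeds - 1) / (2.0 * space))


def seed_from_device(device="/dev/random", nbytes=4):
    """Read one seed from the device; may block while entropy is gathered."""
    buf = b""
    # Unbuffered, so we never pull more bytes from the pool than we need.
    with open(device, "rb", buffering=0) as f:
        while len(buf) < nbytes:
            chunk = f.read(nbytes - len(buf))
            if not chunk:
                raise OSError("no data from " + device)
            buf += chunk
    return int.from_bytes(buf, "little")


def unique_seed(used, draw=seed_from_device, max_tries=100):
    """Draw seeds until one is not already in the table `used`, then record it."""
    for _ in range(max_tries):
        seed = draw()
        if seed not in used:
            used.add(seed)
            return seed
    raise RuntimeError("could not find an unused seed")


def make_rng(used, draw=seed_from_device):
    """Seed a Mersenne Twister once, outside the main loop."""
    return random.Random(unique_seed(used, draw))
```

For scale, collision_probability(1000) comes out at roughly 1.2e-4, so collisions are rare but not impossible, and the table lookup costs next to nothing.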

So, what are the good generators?  Mersenne Twisters are good
(mt19937_1999, for example). Good and fast.  gfsr4 is pretty good,
pretty fast.  taus/taus2 are both good and fast.  ranlxd2 is good, but
not as fast.  KISS is very good and very fast (fastest).  It is not yet
in the GSL, but the current dieharder has a GSL-compatible implementation
if anybody needs it, and eventually I'll send it in to Brian for the GSL.
Any of these, seeded from /dev/random (and yes, dieharder has code to do
that too) will work just fine, if just fine is "the best you can do given
the state of the art".

You too can test generators for yourself (and learn a lot about random
number generators and null hypotheses and testing) with dieharder,
available here:

  http://www.phy.duke.edu/~rgb/General/dieharder.php

I do have a slightly more current version with kiss, "superkiss" (a
mixture of kiss generators) and a straight up supergenerator that lets
you use a mixture of GSL-based generators, each separately seeded, in a
frame that returns numbers out of a shuffle table.  If the generators it
uses are fast, it is still acceptably fast, and if nothing else it hides
and obscures any sort of flaws any single generator might have.
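
The shuffle-table idea can be illustrated with a small sketch. This is not the dieharder supergenerator itself: it assumes a Bays-Durham style shuffle, uses Python's random.Random in place of GSL generators, feeds the table round-robin, and the class name and table size are made up for the example.

```python
import random


class ShuffledMix:
    """Mix several separately seeded generators through a shuffle table."""

    def __init__(self, seeds, table_size=256):
        if not seeds:
            raise ValueError("need at least one seed")
        # One independent generator per seed.
        self.gens = [random.Random(s) for s in seeds]
        self.turn = 0
        self.table = [self._next_raw() for _ in range(table_size)]
        self.last = self._next_raw()

    def _next_raw(self):
        """Take one 32-bit word from the generators in round-robin order."""
        gen = self.gens[self.turn]
        self.turn = (self.turn + 1) % len(self.gens)
        return gen.getrandbits(32)

    def next_u32(self):
        """Return a table entry chosen by the previous output, then refill it."""
        slot = self.last % len(self.table)
        self.last = self.table[slot]
        self.table[slot] = self._next_raw()
        return self.last
```

Each output costs one draw from an underlying generator plus a table lookup, so the mix stays fast as long as its component generators are fast.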

     rgb

> After scanning a bunch of man pages (incidentally, "man urandom" explains
> the difference between random and urandom; "man random" does not) and
> experimenting to produce my reply (above), I found this when google pointed
> into wiki (sheesh):
> "It is also possible to write to /dev/random. This allows any user to mix
> random data into the pool. Non-random data is harmless, because only a
> privileged user can issue the ioctl needed to increase the entropy estimate.
> The current amount of entropy and the size of the Linux kernel entropy pool
> are available in /proc/sys/kernel/random/." 
> ( http://en.wikipedia.org/wiki//dev/random )
> 
> So, yes re Mark's:
> 
> ".. or even just have a boot script that stirs in some unique per-node data?
>  (say low-order rdtsc salted with the node's
> MAC.) ..."
> 
> So (from the wiki) piping dumb data into /dev/random is harmless since the
> entropy measure wouldn't be fooled, so then yes, just anyone can pipe some
> bytes in anytime. So yeah, the rdtsc, I just meant my 9 digits of
> nanoseconds as something easy to try at boot time, and shuffling that with
> the MAC is a good idea. (Since a light-nanosecond is about what, 30 cm? the
> lengths of cables in the server room would be enough to give every node
> different boot times, in nanoseconds, right? or no, because your cables are
> all standard lengths, but coiled as needed?). So just sticking in a script
> that flushes such stuff into /dev/random at boot time (or anytime) should be
> practicable and easy.
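
A boot-time stir along these lines might look like the sketch below. It is only an illustration with assumptions: time.time_ns() stands in for rdtsc (which Python cannot read directly), uuid.getnode() is used for the MAC (it falls back to a random 48-bit number if no MAC is found), the SHA-256 "shuffle" is an added choice, and the function names are hypothetical. Writing to /dev/random mixes the bytes into the pool without raising the entropy estimate.

```python
import hashlib
import time
import uuid


def node_salt(mac=None, nanos=None):
    """Hash the node's MAC address together with a nanosecond timestamp."""
    mac = uuid.getnode() if mac is None else mac
    nanos = time.time_ns() if nanos is None else nanos
    material = mac.to_bytes(6, "big") + nanos.to_bytes(8, "big")
    return hashlib.sha256(material).digest()


def stir(data, device="/dev/random"):
    """Write bytes into the pool; the entropy estimate is not credited."""
    with open(device, "wb") as f:
        f.write(data)
```

Calling stir(node_salt()) from a late boot script would give each node a different pool state even when every node boots from the same stateless image.
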
> 
> Peter
> 
> 
> 
> On Sat, Feb 26, 2011 at 5:32 PM, Mark Hahn <hahn at mcmaster.ca> wrote:
> > nodes using CentOS 5.  One of our users just ran into a problem with
> > /dev/random blocking due to the lack of entropy.
> 
> /dev/random should be reserved for generating non-ephemeral keys.
> 
> > Do others have this problem?  What do you do?
> 
> I've only ever heard of it on servers that do a lot of ssl transactions,
> and were configured to use /dev/random for transient keys or nonces.
> 
> > Do you modify network drivers to introduce entropy?
> > Are there other suggested methods of adding entropy to /dev/random?
> 
> good questions.  I haven't been following the state of kernel entropy
> gathering - I guess I assumed that they'd worked out a basic set of
> reasonable sources and had a (eg) /proc interface for enabling others
> that would be site-specific (such as eth).
> 
> > Are there ways to introduce entropy from the random number generator
> > on some Intel systems?  Did Intel remove this from more recent chips?
> 
> appears so:
> http://software.intel.com/en-us/forums/showthread.php?t=66236
> 
> > How reliable is /dev/urandom without initial entropy?  We boot from
> 
> my understanding is urandom is a crypto hash of the entropy pool:
> if the entropy pool never changes or is perfectly guessable,
> urandom is only as good as the hash.  since the crypto hash is not
> broken in general, I'd consider it plenty good.
> 
> 
> > stateless disk images and don't carry any entropy over from previous
> > boots.
> 
> boot scripts normally save and restore entropy, so why can't they do so
> to/from some server of yours?  or even just have a boot script that stirs
> in some unique per-node data?  (say low-order rdtsc salted with the
> node's MAC.)
> 
> this is a good question - not a burning issue I think, but something to
> not forget about.  we're starting to use NFS-root login nodes, for
> instance, which could conceivably run into entropy issues.
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu