I think the change in scale over the past 10-15 years is interesting, and especially the changes in architecture that result from this.

Going from 8-16 processors to 1000s is a big change.  Bisection bandwidth on your comm fabric becomes an issue.  How do you boot?  8 processors can be booted sequentially or simultaneously from a server.  For 1000 you need a "better way".  How do you feed files to/from a 1000 processor cluster?

Issues with checkpoint/restart/reliability.  We had a project here at JPL looking at replacing the big 70 meter dishes with an array of, say, 100 6-12 meter dishes.  Replacing the single custom box with lots of commodity things (6-12 meter antennas are stamped out by the hundreds).  Very Beowulf'y in concept.

Turns out that cryocoolers (needed to keep the receiver at a nice toasty 4 Kelvins) aren't really a mass produced item, and at the observed failure rates, you'd have a hard time keeping enough of them working to do what you needed.  A failure rate of once a month (or something.. I don't know what the actual rates are) on the 70m antenna means you can have a spare and swap it in, and then you basically have a month to fix the broken one.  With 100 antennas and a cryocooler MTBF of 0.5 years, you'll have 4 broken coolers at any given time.
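A rough sanity check on that arithmetic, as a sketch only.  The repair time is not stated above, so the one-week and one-month values below are assumptions, and the function name is made up for illustration.  Each cooler is treated as independently cycling between "up" for an average of MTBF and "down" for an average of MTTR (mean time to repair), so the expected number broken at any moment is N * MTTR / (MTBF + MTTR).

```python
def expected_broken(n_units, mtbf_years, mttr_years):
    """Steady-state expected number of failed units.

    Each unit is down for a fraction MTTR / (MTBF + MTTR) of the time,
    assuming independent failures and an unlimited repair crew.
    """
    fraction_down = mttr_years / (mtbf_years + mttr_years)
    return n_units * fraction_down


if __name__ == "__main__":
    n_antennas = 100   # from the array example above
    mtbf = 0.5         # years, from the example above
    # Repair times are ASSUMED values, not measured ones.
    for label, mttr in [("1 week", 1 / 52), ("1 month", 1 / 12)]:
        broken = expected_broken(n_antennas, mtbf, mttr)
        print(f"repair time {label}: about {broken:.1f} coolers broken")
```

Under these assumptions a figure of roughly 4 broken coolers corresponds to about a one-week turnaround per repair.  If each repair took a month, as with the single spare on the 70m antenna, the same formula gives a noticeably larger number out of service.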

The practical differences in experience between assembling a toy cluster of 4-8 processors and simulating it with VM instances on a single machine.  What you learn from the former that you don't get on the latter (the importance of labeling of cables, for instance).

Hello all,

I am giving a talk on Beowulf clustering to a local LUG and was wondering if you had some interesting themes that I could talk about.

ta for now.

