[Beowulf] The Walmart Compute Node?

jimlux at jpl.nasa.gov
Fri Nov 9 10:40:04 PST 2007


Quoting "Jeffrey B. Layton" <laytonjb at charter.net>, on Sat 10 Nov 2007  
08:49:01 AM PST:

> andrew holway wrote:
>> Sod all this tin pot stuff.
>>
>> Buying all this crap, sticking it in a rack and stringing it together
>> with wire aint difficult. Making the damn software work is the tricky
>> bit.
>>
>> Get loads of ram, vmware-server and BINGO! you have a cluster!
>>
>
> But this isn't a cluster - it's enterprise masturbation. We're talking about
> HPC, not running payrolls on a server. It's all about performance.
> Running a bunch of VM instances on a server is not really HPC to me.
> Of course, it's a GREAT way to learn and I know a bunch of people
> who use it for testing and development.

And, in fact, I contend that the grubby aspects of stringing  
wires, making netboot or sneakernet distribution work and so forth  
are exactly what future cluster builders desperately need practice with.

If your interest is parallel algorithm design, then running multiple  
VMs is a great way to go.

If your interest is understanding the practicalities of cluster  
engineering, then a stack of 50 very cheap boxes might be a suitable  
playground for learning by ordeal.

Say you have a class in cluster engineering with 20-30 students.  
You make up groups of 2-3 bodies (so they can learn social skills, if  
nothing else), and give each group a crate with 8 boxes with freshly  
wiped disks plus one head node and a box full of power cords, network  
cables, VGA cables, keyboards, all thrown in there by last semester's  
groups.  There will, of course, be 9 power cords in some crates and 7  
in others.

Have them build up a cluster and run some trivial demo.  There will be  
much learning, just getting a bootable image on all 8 machines (some  
might go the PXEboot route, some might sneakernet).

Then, tell them they have to gang all 80 machines into two clusters,  
each one with 40 machines and install a new OS.  Hand configuration  
management and sneakernet will be painful.  Then, have them swap 20 of  
the machines between clusters and do the same.  CM by hand and sneakernet  
is even MORE painful.  Heck, they can start to understand the  
differences between parallelism on machines and parallelism in bodies  
(ok, Bob, you put the boot CD in machines 1-5, Fred, you do 6-10, Ann,  
11-15, etc.).  If they're all running from one networked file server,  
they'll also learn empirically why you don't want them all to boot  
from the network simultaneously.
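
One remedy for the everybody-boots-at-once problem is to stagger the power-on times so that only a handful of machines hit the file server together. Here is a minimal sketch of such a schedule in Python. The function name, the batch size of 8 and the 60 second interval are all made up for illustration; sensible values would depend on the server and the network.

```python
def boot_schedule(nodes, batch_size=8, interval_s=60):
    """Map each node name to a power-on delay in seconds.

    Nodes are released in batches of `batch_size`, one batch every
    `interval_s` seconds, so at most `batch_size` machines pull their
    boot image from the server at the same moment.
    (batch_size and interval_s are illustrative guesses, not tuned values.)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if interval_s < 0:
        raise ValueError("interval_s must not be negative")
    return {node: (i // batch_size) * interval_s
            for i, node in enumerate(nodes)}


if __name__ == "__main__":
    # Hypothetical node names for one 40-machine cluster.
    nodes = [f"node{n:02d}" for n in range(1, 41)]
    for node, delay in boot_schedule(nodes).items():
        print(f"{node}: power on at t+{delay}s")
```

With those numbers, 40 machines come up in five waves, which is the same idea as handing Bob, Fred and Ann their own ranges of boxes.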

If you've given them cheap 5-port switches, they'll also get to learn  
about multi-tier network topologies.
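
To see why the tiers show up, here is a rough Python sketch that counts how many switches a plain tree needs. It assumes every switch below the root gives up one port as an uplink, and it ignores bandwidth and oversubscription entirely; the function name is just for illustration.

```python
import math


def switch_tiers(hosts, ports=5):
    """Return the number of switches in each tier, leaf tier first.

    Assumes a simple tree: every switch below the root spends one port
    on its uplink, leaving (ports - 1) for hosts or lower switches.
    The root needs no uplink, so it can use all of its ports.
    """
    if hosts < 1:
        raise ValueError("need at least one host")
    if ports < 3:
        raise ValueError("a tree needs switches with at least 3 ports")
    tiers = []
    links = hosts  # cables that still need a port one tier up
    while links > ports:
        switches = math.ceil(links / (ports - 1))
        tiers.append(switches)
        links = switches  # one uplink per switch in this tier
    tiers.append(1)  # the root switch takes whatever is left
    return tiers


if __name__ == "__main__":
    for n in (9, 40):
        tiers = switch_tiers(n)
        print(f"{n} hosts: tiers {tiers}, {sum(tiers)} switches in total")
```

For one crate (8 boxes plus a head node) this gives [3, 1], i.e. four switches in two tiers. For a 40-machine cluster it gives [10, 3, 1]: 14 switches in three tiers, and a lot of patch cables.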


That'll larn 'em....
(If anyone decides to do this, let me know... I'd love to watch.. I'll  
even bring suitable frosty beverages for the spectators)


Jim...



