[Beowulf] Home Beowulf Initial Startup Question

Mark Hahn hahn at physics.mcmaster.ca
Sun Feb 13 12:41:51 PST 2005

> network technologies i am building a 20 node dell optiplex 1.9 ghz 256 ram 

kinda low on ram there, but for a learning cluster, that's plenty.
(actually 20 is kinda big for such a cluster...)

> blah blah nodes wondering first off if master control is recommended to be 
> same or better than nodes and what is recommended Linux O/s redhat or 
> mandrake etc... or anyones recommendations

distros don't matter - none of them are significantly different,
and they all work.  people who care about distros are more interested
in desktop decor than getting work done ;)

admittedly, I am not a never-reinvent-the-wheel person.  
NRTW is worse than NIH, IMO.  (some wheels desperately need reinvention,
all progress comes from reinvention, etc).

>     I'm also looking for some links or resources for tools aka software like 
> parallel kernel upgrades monitor tools anything  for setting up Linux 
> beowulf to make  this go smoothly

to me, "smooth" means "no extra load per node".  I strongly prefer
net-booting, or at least net-root setups.  people will tell you that 
using NFS for this is horribly inefficient, dangerous and causes warts.
but it works extremely well, at least for clusters of <= 96 nodes,
based on my experience so far.  things might be different if you're 
doing retrocomputing based on a half-duplex 10 Mbps network or have large
IO loads.  
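
for concreteness, here is a minimal sketch (python, used only to generate the
text) of what a PXELINUX stanza for an NFS-root node might look like.  the
server address, export path, label name and kernel/initrd file names are
placeholders chosen for illustration, and it assumes the kernel or initrd
has NFS-root support.  root=/dev/nfs, nfsroot= and ip=dhcp are the standard
kernel parameters for this kind of boot.

```python
def pxelinux_stanza(server_ip, export_path,
                    kernel="vmlinuz", initrd="initrd.img"):
    """Return a PXELINUX config stanza that boots a node with an NFS root.

    All argument values are illustrative placeholders, not a known-good setup.
    """
    # Kernel command line: mount the root FS read-only over NFS and
    # configure the network interface via DHCP.
    append = " ".join([
        f"initrd={initrd}",
        "root=/dev/nfs",
        f"nfsroot={server_ip}:{export_path},ro",
        "ip=dhcp",
    ])
    return "\n".join([
        "DEFAULT diskless",
        "LABEL diskless",
        f"  KERNEL {kernel}",
        f"  APPEND {append}",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical master node address and export path.
    print(pxelinux_stanza("192.168.1.1", "/export/noderoot"), end="")
```

since every node gets the same stanza, adding a node means adding a DHCP
entry and nothing else.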

the benefit is that your cluster acts like you have just one slave node.
the cost is that you have to do a pretty minor amount of work to hack 
something like Fedora to boot diskless (small changes to the initrd.)
and of course, it does mean that "incidental" file IO will cause network
traffic.  it's not clear to me that this is a problem, though, since:

	- nodes are normally configured to be fairly minimal - 
	you don't have 30 user logins on each one, with people running
	ls/bash/netscape/gcc all the time.

	- NFS is not that bad at caching, and you can help this out by
	upping the per-mount cache parameters a bit.

	- it's awfully nice to have a nearly fully functional node
	even after its disk dies.

	- my "diskless" nodes actually do have local swap and /tmp.
	disks are cheap and handy, just don't *depend* on them.

	- you can easily imagine a hybrid system that boots somehow
	(PXE or from disk), and does an rsync or rpm/yum/systemimager
	equivalent.  I don't really see the point though.

	- having your root FS exported read-only is also kind of nice:
	good security is layered security...
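
the read-only export and the cache tuning mentioned above can be sketched
roughly as follows.  this is an illustration under assumptions, not a tested
config: the export path, client subnet and the raised timeout values are
made up.  the option names themselves (ro, sync, no_root_squash on the
server; acregmin/acregmax/acdirmin/acdirmax on the client, in seconds) are
the standard ones from exports(5) and nfs(5), and the function defaults
below are the usual nfs(5) defaults.

```python
def exports_line(path, clients, read_only=True):
    """Build one /etc/exports line for the shared node root."""
    # "ro" keeps nodes from modifying the shared image;
    # no_root_squash lets root on the nodes read root-owned files.
    options = ["ro" if read_only else "rw", "sync", "no_root_squash"]
    return f"{path} {clients}({','.join(options)})"


def nfs_mount_options(acregmin=3, acregmax=60, acdirmin=30, acdirmax=60):
    """Build a client-side NFS option string.

    The ac* values are attribute-cache timeouts in seconds; raising them
    means fewer attribute revalidations against the server.
    """
    return ",".join([
        "ro",
        f"acregmin={acregmin}",
        f"acregmax={acregmax}",
        f"acdirmin={acdirmin}",
        f"acdirmax={acdirmax}",
    ])


if __name__ == "__main__":
    # Hypothetical export path and cluster subnet.
    print(exports_line("/export/noderoot", "192.168.1.0/24"))
    # Hypothetical "upped" timeouts for a root FS that rarely changes.
    print(nfs_mount_options(acregmin=30, acregmax=600,
                            acdirmin=60, acdirmax=600))
```

longer attribute-cache timeouts are a reasonable trade on a root image that
almost never changes; the cost is that nodes notice updates to it more slowly.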
