[Beowulf] Application Deployment

Mark Hahn hahn at physics.mcmaster.ca
Sun Oct 10 08:43:11 PDT 2004


> RPMs or Debian.  With Red Hat and descendants (Fedora, CentOS) you can
> use kickstart, which is a lovely tool for installing clusters.
> Kickstart run on top of PXE and DHCP makes installing most systems a
> matter of turning them on (after making a single host specific MAC

don't forget the zero-install approach - nothing installed on nodes.
just export the nodes' root filesystem from a fileserver, and you never
have to do anything per-node.  yum and rpm both let you install within
a separate tree, so the fileserver doesn't need to be running the same 
config as the nodes.
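
a minimal sketch of the separate-tree install described above, assuming a node root at /export/nodes/root and a 10.0.0.0/24 cluster subnet (both hypothetical, as are the package names and export options).  it only builds the command lines and the /etc/exports entry, it does not run anything.  `yum --installroot` and `rpm --root` are the standard options for installing into another tree.

```python
import shlex


def install_tree_commands(root, packages, rpm_files=()):
    """Build (but do not run) commands that install into a separate root tree."""
    # yum resolves dependencies and installs everything under `root`.
    cmds = [["yum", f"--installroot={root}", "-y", "install", *packages]]
    # Individual package files can be added with rpm's --root option.
    for rpm_file in rpm_files:
        cmds.append(["rpm", "--root", root, "-ivh", rpm_file])
    return cmds


def exports_line(root, clients, options="ro,no_root_squash,sync"):
    """Format an /etc/exports entry sharing the node root with the cluster."""
    return f"{root} {clients}({options})"


if __name__ == "__main__":
    # Hypothetical path, package and subnet, for illustration only.
    for cmd in install_tree_commands("/export/nodes/root", ["openssh-server"]):
        print(shlex.join(cmd))
    print(exports_line("/export/nodes/root", "10.0.0.0/24"))
```

a read-only export is only one possible choice here; nodes then need writable space (tmpfs or a per-node export) for things like /var and /tmp.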

obviously, this results in a certain amount of NFS traffic, as opposed 
to having those files installed on the node's disk.  issues:

	- diskless nodes are very attractive in many contexts:
	reliability, price, maintainability, etc.

	- running NFS-root is a way of tolerating local disk faults;
	lack of swap may or may not be a problem.

	- NFS can easily be faster than local disk IO.

	- in aggregate, a bunch of diskless nodes will, in the worst case,
	create much more traffic than your net and fileserver can handle.

	- my experience so far with 50-100-node clusters is that a 
	single NFS-connected fileserver is actually pretty good.
	(our nodes have a local disk used for things like checkpoints
	of big parallel applications.)

	- for big MPI clusters, it's extremely attractive to put
	fileservers directly onto the MPI fabric.  suddenly, gigabit
	is no longer a limiter for file IO and systems like Lustre
	can give some pretty impressive data rates.

	- this scheme is probably optimal for very heterogeneous 
	datacenters as well, where you might boot a node in some 
	random OS purely for a particular user/app.  that kind of thing
	seems very dubious to me, but it would only take a few minutes
	of perl scripting to write a web frontend to select things 
	like IP, distro, kernel, server, etc for a particular node,
	and propagate the changes.
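
the worst-case point in the list above can be put in rough numbers.  this is a back-of-envelope sketch, not a measurement: the per-node demand figures are made up for illustration, and the link capacity ignores protocol overhead.

```python
def link_capacity_mb_s(gigabits=1.0):
    """Raw link capacity in MB/s: 1 Gbit/s = 1000 Mbit/s = 125 MB/s."""
    return gigabits * 1000 / 8


def aggregate_demand_mb_s(nodes, per_node_mb_s):
    """Total demand if every node reads from the fileserver at once."""
    return nodes * per_node_mb_s


def oversubscription(nodes, per_node_mb_s, gigabits=1.0):
    """Ratio of worst-case demand to the fileserver's link capacity."""
    return aggregate_demand_mb_s(nodes, per_node_mb_s) / link_capacity_mb_s(gigabits)


if __name__ == "__main__":
    # Hypothetical loads: 100 nodes each pulling 10 MB/s vs 1 MB/s.
    print(oversubscription(100, 10))  # 8.0: demand is 8x a single gigabit link
    print(oversubscription(100, 1))   # 0.8: fits, with little headroom
```

a ratio above 1 means the nodes can collectively ask for more than one gigabit link delivers, which is the case where local disk or a faster fabric starts to matter.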

I think that for a small cluster, I'd consider having the nodes
with full installs on them.  for anything larger than say 4 nodes, 
I definitely prefer the root-on-fileserver approach with "ephemeral" nodes.
it's also pretty sexy to take a node out of the box, plug it in and have it 
accept jobs in a minute or so with no manual intervention.
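
a rough sketch of the per-node selection mentioned above (IP, kernel, server), in python rather than perl, and assuming the nodes boot via PXELINUX with an ISC dhcpd server; neither is stated in this message.  all names, addresses and paths are hypothetical, and a real NFS-root setup will usually also need an initrd or a kernel with NFS-root support built in.

```python
def pxelinux_filename(mac):
    """PXELINUX looks for a per-node config named 01-<mac, dash separated>."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_config(kernel, nfs_server, nfs_root):
    """Boot entry that mounts the node's root filesystem over NFS."""
    return (
        "DEFAULT node\n"
        "LABEL node\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND root=/dev/nfs nfsroot={nfs_server}:{nfs_root} ip=dhcp ro\n"
    )


def dhcpd_host(name, mac, ip, tftp_server):
    """ISC dhcpd host stanza pinning an IP and boot server to one MAC."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {tftp_server};\n"
        '  filename "pxelinux.0";\n'
        "}\n"
    )


if __name__ == "__main__":
    # Hypothetical node; the MAC is the only per-node input.
    mac = "00:11:22:AA:BB:CC"
    print(pxelinux_filename(mac))
    print(pxelinux_config("vmlinuz-node", "10.0.0.1", "/export/nodes/root"))
    print(dhcpd_host("node01", mac, "10.0.0.11", "10.0.0.1"))
```

a frontend of the kind described would write these two snippets out for the selected node and reload dhcpd; switching a node to another distro or kernel is then a matter of regenerating its PXELINUX file.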

> course, require knowledge, experience, wisdom, and time to do right,
> which is why sysadmins get paid and are worth a very decent salary.

hmm.  anyone for a cluster-admin salary survey?

regards, mark hahn.



