[Beowulf] Help req: Building a disked Beowulf

Mark Hahn hahn at physics.mcmaster.ca
Thu Aug 25 06:06:12 PDT 2005

> I am doing my research in Molecular Dynamics and we have very badly
> running Beowulf with 10 nodes in our lab. The position of the cluster

what is badly running about it?  personally, I would probably modify
your existing cluster incrementally.  first get parallel jobs running
(the problem is probably rsh/ssh configuration).  then start unifying 
the OS installed on slaves.
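
a quick way to test the rsh/ssh guess is to check that the master can
log in to every slave without a password prompt, since that is what most
MPI launchers need.  the following is only a rough sketch: the node names
are made up, and check_nodes/ssh_check_command are hypothetical helpers,
not part of any cluster package.

```python
import subprocess

# hypothetical node names; substitute whatever your slaves are called
NODES = ["node%02d" % i for i in range(1, 11)]

def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]

def check_nodes(nodes, run=subprocess.run):
    """Return the nodes where non-interactive ssh did not succeed."""
    failed = []
    for host in nodes:
        try:
            result = run(ssh_check_command(host), capture_output=True, timeout=15)
            ok = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            ok = False
        if not ok:
            failed.append(host)
    return failed

# usage (run on the master):
#   bad = check_nodes(NODES)
#   print("fix ssh on: " + ", ".join(bad) if bad else "all nodes ok")
```

any node this reports is one where parallel job startup will likely hang
or fail, so fix its keys or host-based authentication before looking at
anything else.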

> 1 To get the cluster up and running parallel jobs.
> 2 The way I intend to do 1 is this. Install the OS (SuSE 9.3 Pro) on
> the master and install barebones ( I am not sure, but may be something
> like kernel, NFS and/or NIS, SSH, etc) on the rest of the nodes so
> that I am able to run parallel jobs as well as serial jobs on the
> nodes. Will require help on this.

I strongly prefer compute nodes to be entirely net-booted
(to have nothing installed on them).  the real value of this is that 
it means they're completely ephemeral - not only are they all instantly
updated when you change their (shared) root filesystem, but if a disk
dies, no one cares.  it's even handy that you can plug random other 
machines into the cluster and not affect anything that might be installed
on that machine (unless you want local swap or something.)

> 3 Whatever software I install on the master should be available on the
> nodes too (I guess this is possible either with NIS or NFS). Here too
> some help!

my procedure is to install the master, then create a /cluster export.
rpm/yum/etc can quite happily install a _whole_linux_system_ onto that
subtree.  then you simply boot the nodes with that as their nfsroot.
(yes, you will probably need to adjust the initrd and customize some 
of the /etc/rc.d files for slave nodes.)
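
to make the two pieces concrete, here is a small sketch that prints the
/etc/exports line for the master and the kernel command line a slave would
boot with.  the subnet and master address are assumptions for illustration
only, and exports_line/nfsroot_cmdline are hypothetical helper names; the
export options and the root=/dev/nfs, nfsroot= and ip= kernel parameters
are the standard ones, but check them against your own kernel and NFS
server.

```python
def exports_line(path, subnet, options=("ro", "no_root_squash", "sync")):
    """One /etc/exports entry sharing `path` with every host in `subnet`."""
    return "%s %s(%s)" % (path, subnet, ",".join(options))

def nfsroot_cmdline(server_ip, path):
    """Kernel command line for a node that mounts its root over NFS."""
    return "root=/dev/nfs nfsroot=%s:%s ip=dhcp ro" % (server_ip, path)

# assumed addresses, purely for illustration
print(exports_line("/cluster", "192.168.1.0/24"))
# -> /cluster 192.168.1.0/24(ro,no_root_squash,sync)
print(nfsroot_cmdline("192.168.1.1", "/cluster"))
# -> root=/dev/nfs nfsroot=192.168.1.1:/cluster ip=dhcp ro
```

a read-only shared root like this is one reason the rc.d scripts need
adjusting: anything a node writes (logs, /tmp, /var) has to go to a
ramdisk or a per-node directory instead.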

> 4 I should have no need to propagate my executable to all the nodes
> manually to run a parallel job. I guess it should be possible if 3 is
> possible.

it is most convenient if slaves all have /home and other relevant 
filesystems mounted.  I put packages (add-on compilers, libraries etc
including MPI) in /opt.
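
as an illustration, the matching /etc/fstab entries in the slaves' shared
root could be generated like this.  the hostname "master" and the mount
options are assumptions, and fstab_line is a hypothetical helper.

```python
def fstab_line(server, path, options="rw,hard,intr"):
    """fstab entry mounting server:path at the same path on the node."""
    return "%s:%s %s nfs %s 0 0" % (server, path, path, options)

# filesystems every slave should see, served by an assumed host "master"
for fs in ("/home", "/opt"):
    print(fstab_line("master", fs))
# -> master:/home /home nfs rw,hard,intr 0 0
# -> master:/opt /opt nfs rw,hard,intr 0 0
```

with the same paths on every node, an executable built in /home on the
master is visible at the same location everywhere, so nothing has to be
copied by hand before a parallel run.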

> 5 All the nodes should be able to store data on the drives attached to
> them Storage is very important.

the problem with using storage on slave nodes is that any IO activity 
will interfere with compute activity.  that may not be a big concern for you.
personally, with a small cluster like this, I'd probably just have each 
of the slave disks NFS exported and auto-mountable by the others.  it's 
less convenient to have something like /data7, but Lustre/PVFS seems like 
overkill here.
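
one way to get the /data7-style layout is an autofs direct map, so each
node's disk is mounted on demand wherever it is referenced.  this sketch
just prints the map entries; it assumes nodes are named node1..node10 and
that each one exports its local disk as /data, neither of which comes from
your description.  direct_map_entries is a hypothetical helper.

```python
def direct_map_entries(node_numbers, export="/data", options="rw,hard,intr"):
    """autofs direct-map lines: /dataN is mounted from nodeN on first access."""
    return [
        "/data%d -%s node%d:%s" % (n, options, n, export)
        for n in node_numbers
    ]

# entries for a 10-node cluster; these would go in a map file that the
# autofs master map references with a "/-" mount point
for line in direct_map_entries(range(1, 11)):
    print(line)
# first line printed: /data1 -rw,hard,intr node1:/data
```

each slave still needs its own /etc/exports entry for /data, and jobs that
care about speed should write to the disk of the node they are running on.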
