Installing Scyld: some questions

Wed Apr 11 07:44:01 PDT 2001

On Wed, 11 Apr 2001, Bruno Barberi Gnecco wrote:

>         I'm trying to install Scyld on a 6 machine cluster. All the 
> computers are dual P3-850, 1GB RAM, two 36GB SCSI HDs, Intel Optical
> Gigabit. I installed the server easily, and made boot disks using 
> Intel drivers; also created new phase 2 and 3 images. I don't know
> if it matters, but I'm installing only one slave while I try to
> figure out how to do it, but I already assigned the IP range for the
> other 4 in the master. However:
> 
> a) the install process seems to hang in phase 3. After some kernel 
> messages, nothing happens (the last messages recognize the 2 HDs). I 
> can bpsh and reboot it from the master, set it as up or unavailable,
> but not telnet or ftp. Is it normal?

Sounds like it booted fine to me.  The slave nodes never give a login
prompt or anything like that.  The only way to really access them is
through the BProc tools like bpsh.

You can't telnet or ftp to them because a) you shouldn't have to, and b)
there are no binaries on the slave nodes, so there's no way for them to
run the daemons, or the shells.
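
As an illustration, a quick check from the master could look like the
sketch below.  It assumes node 0 is up and uses the usual
'bpsh <node> <command>' form.  The node_cmd helper name is made up for
this example, and it only prints the command when bpsh is not on the
PATH, so it can be tried on a machine that is not a Scyld master.

```shell
#!/bin/sh
# Hypothetical helper (not part of Scyld): run a command on a slave node
# through bpsh.  If bpsh is not installed, print the command instead.
node_cmd() {
    node=$1
    shift
    if command -v bpsh >/dev/null 2>&1; then
        bpsh "$node" "$@"
    else
        echo "would run: bpsh $node $*"
    fi
}

# The binary is taken from the master and run on node 0.
node_cmd 0 uname -a
```

If this prints the node's kernel information, the node is up and
reachable through BProc, even though telnet and ftp do not work.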

> 
> b) Will I have to use the boot disk forever? I was hoping that the
> installation process would make it automatically, which is something
> else that tipped me that things are not all right. If the current
> behavior is correct, how can I do it? My idea is to create a small
> partition, but what to dump there?

(see the next answer)

> 
> c) beofdisk: where do I create the partitions?

The best thing to do is run 'beofdisk -q' to get the current partition
information from the slave node harddrive, then run 'beofdisk -d'.  The
latter looks at the existing harddrive setup and 'suggests' a
partitioning scheme that it thinks is appropriate.  This suggested
scheme includes something called a beoboot partition.

The queried settings are written to files in /etc/beowulf/fdisk/.  The
filename describes the harddrive geometry.  When it suggests a
partitioning scheme, it overwrites the queried settings with its
suggestion.  You can now tweak that setup by hand, or just leave it.
Next you run 'beofdisk -w' to write the partition setup described in the
files to the actual harddrives.
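
The three steps above can be sketched as a small script.  This is only
a sketch: the run and partition_slaves names are invented for the
example, and by default it prints each command rather than executing
it.  Only the -q, -d and -w flags mentioned above are used.

```shell
#!/bin/sh
# Dry run by default: each step is printed instead of executed.
# Set DRY_RUN=0 on the master to really run the commands.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper for the query / suggest / write sequence.
partition_slaves() {
    run beofdisk -q || return 1   # query current partition info from the slaves
    run beofdisk -d || return 1   # overwrite the queried files with the suggestion
    # Optionally hand-edit the files in /etc/beowulf/fdisk/ at this point.
    run beofdisk -w || return 1   # write the described setup to the actual disks
}

partition_slaves
```

Stopping at the first failing step keeps 'beofdisk -w' from writing a
partition setup that was never queried or suggested properly.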

Next, you want to use beoboot-install.  If your harddrive is /dev/hda,
then you can do 'beoboot-install -a /dev/hda' to set up harddrive booting
for all the slave nodes, or 'beoboot-install 0 /dev/hda' to set it up
for just node 0, and that can be done for any node.  What
beoboot-install does is copy the kernel and initrd from the floppy drive
to the beoboot partition on the harddrive.  It then installs lilo to
the MBR so that you can now boot your slave nodes off the harddrive.
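
A minimal sketch of the two forms follows.  The install_beoboot wrapper
is a made-up name, and it prints the command unless DRY_RUN=0 is set.
One assumption to flag: the nodes in the question have SCSI disks, so
the device there is more likely /dev/sda than /dev/hda.

```shell
#!/bin/sh
# Hypothetical wrapper around the two beoboot-install forms.
# Prints the command by default; set DRY_RUN=0 to execute it.
install_beoboot() {
    target=$1   # "-a" for all slave nodes, or a node number such as 0
    disk=$2     # boot disk on the slave: /dev/hda for IDE, /dev/sda for SCSI
    set -- beoboot-install "$target" "$disk"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Try a single node first, then every node once that one boots from disk.
install_beoboot 0 /dev/sda
install_beoboot -a /dev/sda
```

Doing node 0 alone first means a wrong device name or a bad partition
setup affects one machine instead of the whole cluster.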

> 
> d) I intend to run X on the slaves. Is it a problem?

Hrm.. is there a particular reason you want to run X on the slave nodes?
I'm not sure if it's possible or not, but if it is, it will be a /LOT/ of
work to set up.

Running X on the slaves doesn't make a whole lot of sense to me.  With
the beowulf setup, you want to just log into the head node and not have
to worry about logging into the slave nodes.  This is even more true of
the Scyld setup, where the slave nodes don't have any binaries on them.
Instead, you run processes on them by using BProc to propagate the
process from the head node to the slave nodes.

> 
> e) Is there a way to bypass the requirement of 2 net boards on the 
> master? I've been fooling it by using an onboard ethernet (that will 
> not be used) as eth0, and disabling it by ifconfig. But eth1 only allows
> access to the slave nodes. The reason to do it is lack of ports in our
> switches.

I really wouldn't advise that.  I'd like to point you to
http://bproc.sourceforge.net/bproc_3.html#SEC13, the Security section in
particular.  Scyld is based on BProc, and this section gives a very
brief overview of why you would never want to use BProc on anything
except its own private network.

Compared to the cost of the machines in your cluster, a dedicated switch
for your cluster is really inexpensive.  I highly recommend you get a
dedicated switch for your cluster.  This is for security as well as for
keeping things like broadcasts from slowing down your cluster.  The way
Scyld is designed to work is to have eth0 on the head node connect to
the outside world (or private company network, etc), and to have eth1
connect to a switch that has all of your slave nodes on it, and no other
machines.

Hope this helps,

Jag