Installing Linux (without CD/floppies)

Robert G. Brown rgb at phy.duke.edu
Tue Feb 18 07:18:38 PST 2003


On 18 Feb 2003, Ashley Pittman wrote:

> On Tue, 2003-02-18 at 01:55, Mark Hahn wrote:
> > > >What's the best way to install Linux RedHat in a 40-node
> > > >Beowulf whose nodes have neither floppy nor
> > 
> > why *install* at all?  so far at least, I'm booting my new cluster
> > entirely diskless.  each node actually has a disk, but nothing is 
> > installed on it - everything is PXE and NFS.  obviously, this is 
> > more network-intensive, but is it really an issue if your jobs last
> > for more than a couple minutes?
> 
> I've often wondered that, at the very least if you are going to install
> then you should just download a tgz of the root partition rather than
> the hassle of automagically installing/configuring packages.

With kickstart it is actually easier to do the latter (a package-based
install).  A single kickstart script can typically install all of the
nodes, and each node needs only a boilerplate DHCP entry.  It reduces
an install (once set up) to:

  a) boot via PXE to a set of bootable kernel images, with a timeout to
     the primary hard disk image (e.g. grub), if there is one (a config
     sketch follows the list).
  b) select the network/kickstart install kernel, set up to
     automatically default straight into dhcp/kickstart.
  c) in four to five minutes the system reboots itself (in the %post
     step) into operational mode, ready to run.
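
For concreteness, the PXE end of steps a) and b) can be handled with
something like the following pxelinux.cfg/default (a sketch only, with
made-up kernel/initrd names and kickstart path):

  # fall through to the local disk (grub) after the timeout
  default local
  prompt 1
  timeout 100                    # tenths of a second

  label local
      localboot 0                # boot the primary hard disk image

  label ks
      kernel vmlinuz-install
      append initrd=initrd-install.img ks=nfs:installserver:/ks/node-ks.cfg

Selecting "ks" at the boot prompt (or making it the default) drops the
node straight into the dhcp/kickstart install; the %post section of the
kickstart file itself handles the final reboot in step c).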

At ANY TIME you can reinstall this way, or you can use grub to make the
install kernel the default and reboot a node directly into a reinstall
from anywhere, anytime, over the network.  You never "fix" a node's
installation if it is corrupted or being upgraded; you just reinstall
it.
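
The grub end of that trick is just a second stanza in grub.conf that
points at an install kernel kept on the local disk; flipping "default"
(by hand or from a script) turns the next reboot into a reinstall.  The
kernel names, paths and kickstart location below are placeholders, not
a recipe:

  default=0        # set to 1 to make the next boot a reinstall
  timeout=5

  title Production kernel
      root (hd0,0)
      kernel /vmlinuz ro root=/dev/hda2
      initrd /initrd.img

  title Reinstall via kickstart
      root (hd0,0)
      kernel /vmlinuz-install ks=nfs:installserver:/ks/node-ks.cfg
      initrd /initrd-install.img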

Package-based installs are also trivial to update or to add new
packages to, and timely updates are ESSENTIAL to cluster (or LAN)
management.  See, for example, the "yum" tool, which pretty much
totally automates package management, including the installation of new
packages and their fully automatic updating from a designated install
server.  Anything from a new kernel to a new package to a security
update in an existing package automagically propagates to the nodes
once it is placed in the install server tree.  With a tgz-based
installation, you might as well just rebuild the tgz image and do a
full reinstall.  That is enough work that (of course) it will get put
off, which can be disastrous with security updates.
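
To give the flavor of it (a rough sketch with a made-up server name,
not a drop-in configuration): each node carries a small /etc/yum.conf
pointing back at the install server, plus a nightly cron job that pulls
down whatever has changed in the server tree.

  [main]
  cachedir=/var/cache/yum
  logfile=/var/log/yum.log

  [base]
  name=Red Hat Linux base packages
  baseurl=http://installserver/yum/base

  [updates]
  name=Updates and local additions
  baseurl=http://installserver/yum/updates

and something like /etc/cron.daily/yum.cron on each node:

  #!/bin/sh
  # pick up any new or updated packages from the install server
  /usr/bin/yum -y update

Drop a new RPM into the updates tree on the server (and regenerate the
headers there) and every node has it by the next morning.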

> I guess as long as it works it's worth taking the path of least
> resistance for installing nodes, it's not like you do it every week
> anyway so if it takes an extra twenty minutes who cares?

It is a question of scaling.  An extra twenty minutes of human time per
node is generally unacceptable, because if one has 128 nodes to manage
that is over an FTE-week of additional work, and folks care very much
indeed!  An extra twenty minutes of system time per node is "bad", as
it costs you a work-week of node productivity.  An extra twenty minutes
(or even eight hours) setting UP to do an install isn't such a bad
thing, as that time is DIVIDED by the number of nodes and not
multiplied.  It is a very GOOD thing if the eight hours invested in
setting up a kickstart-based install reduces the time spent per node in
future installs to sixty seconds of systems person time plus five
minutes for the actual install (running in parallel on many nodes at
once), so that all 128 nodes can be reinstalled in two or three hours
of work instead of forty.

A similar argument holds for diskless installs, BTW.  One may put "a
lot of energy" into maintaining the diskless image, but that energy is
divided among all the diskless nodes and can significantly reduce the
overall time required to update or manage a cluster.  I honestly think
that kickstart/package-based installs are competitive in efficiency,
and local installs arguably scale slightly better in certain aspects of
performance (lower overall network load, smaller memory footprint,
avoidance of access bottlenecks when all the nodes try to load the same
library at the same time).  However, diskless nodes are cheaper, and
disks are one of the most common sources of hardware failure and hence
cost a lot of management time on top of their up-front cost.

> I really think diskless is underrated though, particularly for small
> clusters.
> 
> > > As long as your network cards support PXE (most do nowadays), then it is
> > 
> > is that really true?  great if so!  I haven't looked, but tend to only
> > expect builtin eth to support PXE...
> > 
> > > The only potentially laborious step is collecting the MAC address first and
> > > tweaking each BIOS to enable PXE booting.
> > 
> > dhcpd can allocate IPs from a pool; if you don't need stable IPs,
> > you don't need to collect the MAC...
> 
> You can of course have a pool of IPs for unknown MACs and then when a
> node first boots it can work out its network position and create a
> static entry in the dhcp config file for itself.

And in any event, it is only one minute of admin time per node even if
done by hand, done once per node/NIC.  Not QUITE trivial, but not even
worth automating for small clusters.
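
For what it's worth, the dhcpd.conf for either approach is short; a
sketch with entirely made-up addresses and MACs:

  subnet 192.168.1.0 netmask 255.255.255.0 {
      next-server 192.168.1.1;        # TFTP server holding pxelinux.0
      filename "pxelinux.0";
      # anonymous pool for nodes whose MACs haven't been recorded yet
      range 192.168.1.200 192.168.1.250;
  }

  # one fixed entry per node/NIC, added once the MAC is known
  host node01 {
      hardware ethernet 00:50:56:00:00:01;
      fixed-address 192.168.1.101;
  }

Adding one host stanza like the last one is the "minute per node" step;
everything else is written once.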

   rgb

> 
> Ashley,
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





