Node cloning

Thu Apr 5 18:40:32 PDT 2001

     Since cloning continues to be a fertile topic, I'll jump right in...
if you're not interested in node installation or cloning, skip this note...

     I feel that the node installation problem using the NFS root/tftp/PXE
boot approach has been solved by LUI (oss.software.ibm.com/lui) and others.
I don't see why anyone would need to roll their own solution.  When one
defines a node to LUI, LUI creates a custom remote NFS root and updates
dhcpd.conf or /etc/bootptab with an entry for the node.  One chooses a set
of resources to install on the node, and creates a disk partition table.
Resources are lists of RPMs or tar files, custom kernels, or individual
files (/etc/hosts, /etc/resolv.conf, /etc/shadow would be good examples).
You either PXE boot the node, or boot from diskette using etherboot
technology.  The node boots, gets a custom boot kernel over the network via
tftp, and transfers control.  The kernel mounts the remote root, and reads
the list of allocated resources.  Based on the resources, the node
partitions the hard drive, creates filesystems, installs RPMs or tar files,
copies any specified files, installs a custom kernel, and so on.  The
software configures the eth0 device based on the IP info for that
particular node, assigns a default route and runs lilo to make the node
ready to boot.  If you allow rsh, LUI will also remove the /etc/bootptab
entry, and optionally reboot the node.  It keeps a log of all activity
during install.
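
To make the "entry for the node" step concrete, here is a minimal sketch
of generating an ISC dhcpd.conf host stanza per node.  This is not LUI's
code or its actual output; the node names, MAC and IP addresses, the
next-server address and the boot file name "pxelinux.0" are all
assumptions chosen for illustration.

```python
def dhcpd_host_entry(name, mac, ip, next_server, boot_file="pxelinux.0"):
    """Return an ISC dhcpd.conf host stanza for one cluster node."""
    return (
        f"host {name} {{\n"
        f"    hardware ethernet {mac};\n"   # match the node by its MAC address
        f"    fixed-address {ip};\n"        # always hand out the same IP
        f"    next-server {next_server};\n" # tftp server holding the boot kernel
        f'    filename "{boot_file}";\n'    # file the node fetches over tftp
        f"}}\n"
    )


# Hypothetical node definitions: (name, MAC address, IP address).
NODES = [
    ("node01", "00:50:56:00:00:01", "192.168.1.101"),
    ("node02", "00:50:56:00:00:02", "192.168.1.102"),
]


def render_all(nodes, next_server="192.168.1.1"):
    """Join the stanzas for every node into one dhcpd.conf fragment."""
    return "\n".join(
        dhcpd_host_entry(name, mac, ip, next_server) for name, mac, ip in nodes
    )


if __name__ == "__main__":
    print(render_all(NODES))
```

The point of generating the stanza from a node list is that adding or
removing a node becomes a one-line change rather than hand editing.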

     The goal of the LUI project is to install any distro on any
architecture (ia-32, Itanium, PowerPC and Alpha).  So far RedHat and ia-32
are supported; Suse and PowerPC are in test but not ready for prime
time.  It's an open source project, and open to contributors.  Since LUI is
resource based, and resources are reusable, it's perfect for heterogeneous
clusters, that is, clusters where nodes have different requirements.  Many
people have said that the NFS/tftp/PXE solution doesn't scale and should be
abandoned.  Well, users have installed 80-way clusters using LUI, and while
that's not huge, it's not dog meat either.

     Simple cloning, basically copying an image from one golden node to
another, changing some rudimentary info along the way, is performed today
by SystemImager, based on rsync technology.  rsync is superior to simple
copy in that you can easily exclude files or directories (/var for example)
and it can be used for maintenance as well.  rsync does intelligent copying
for maintenance -- it copies only files that are different on the source
and target systems, and copies only the parts of the file that have
changed.  SystemImager and rsync are good solutions when the nodes in your
cluster are basically the same, except for IP info and disk size.
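
As a rough illustration of the exclude-and-mirror idea, the sketch below
only builds an rsync command line; it does not run it.  It is not how
SystemImager itself invokes rsync.  The host name "golden", the target
mount point and the exclude list are assumptions; -a, --delete, --dry-run
and --exclude are standard rsync options.

```python
def rsync_clone_command(source, target, excludes=(), dry_run=False):
    """Build an rsync argv that mirrors `source` onto `target`."""
    # -a preserves permissions, ownership, times and symlinks;
    # --delete removes files on the target that are gone from the source.
    argv = ["rsync", "-a", "--delete"]
    if dry_run:
        # Report what would change without touching the target.
        argv.append("--dry-run")
    # One --exclude per path that should stay node-local.
    argv += [f"--exclude={pattern}" for pattern in excludes]
    argv += [source, target]
    return argv


if __name__ == "__main__":
    # Hypothetical golden node and target mount point.
    cmd = rsync_clone_command(
        "golden:/", "/mnt/target/", excludes=["/var/", "/proc/"], dry_run=True
    )
    print(" ".join(cmd))
```

Running the same command again later, once the golden node has been
updated, is what makes this usable for maintenance as well as for the
first install.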

     Then there's kickstart.  Well, it's ok if you do RedHat.

     I think the real burning issue is not how to install nodes, but
*whether* to install nodes or embrace the beowulf 2 technology from SCYLD.
I think SCYLD is close to becoming the linux beowulf appliance, a turnkey
commodity supercomputer.   It will be interesting to see how many new
clusters adopt traditional beowulf solutions, and how many adopt beowulf
2...

the view from here, Rich

Richard Ferri
IBM Linux Technology Center
rcferri at us.ibm.com
845.433.7920

"Robert G. Brown" <rgb at phy.duke.edu>@beowulf.org on 04/05/2001 06:47:46 PM

Sent by:  beowulf-admin at beowulf.org

To:   Giovanni Scalmani <Giovanni at lsdm.dichi.unina.it>
cc:   <beowulf at beowulf.org>
Subject:  Re: Node cloning

On Thu, 5 Apr 2001, Giovanni Scalmani wrote:

>
> Hi!
>
> On Thu, 5 Apr 2001, Oscar Roberto López Bonilla wrote:
>
> > And then use the command (this will take long, so you can do it
> > overnight)
> >          cp /dev/hda /dev/hdb ; cp /dev/hda /dev/hdc ; cp /dev/hda /dev/hdd
>
>   I also did it this way for my cluster, BUT I've experienced instability
> on some nodes (3 or 4 out of 20). My guess was that "cp /dev/hda /dev/hdb"
> also copied the bad-blocks list of hda onto hdb, and this looks wrong
> to me. So I partitioned and made the filesystems on each node and then
> cloned the content of each filesystem. Those nodes are now stable.
>
> A question to the 'cp gurus' out there: is my guess correct about
> the bad blocks list?

One of many possible problems, actually.  This approach to cloning
makes me shudder -- things like the devices in /dev generally have to
be built, not copied, and there are issues with the boot blocks, the bad
block lists, and the bad blocks themselves on both target and host.  Raw
devices are dangerous things to use as if they were flat files.

Tarpipes (with tar configured the same way it would be for a
backup|restore but writing/reading stdout) are a much safer way to
proceed.  Or dump/restore pipes on systems that have them -- either one is
equivalent to making a backup and restoring it onto the target disk.
One reason I gave up cloning was that just cloning isn't enough.  (I had
invested many months writing a first generation cloning tool for nodes,
which booted a diskless configuration, formatted a local disk, and cloned
itself onto the local disk, and had started a second generation GUI-driven
one.)  There is all sorts of stuff that needs to be done to the clones to
give them a unique identity (even something as simple as their own ssh
keys), one needs to rerun lilo, and it requires that you keep one
"pristine" host to use as the master to clone, or you get the very host
configuration creep you set out to avoid.  Either way you end up
inevitably having to upgrade all the nodes or install security or
functionality updates.
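
The tarpipe mentioned at the top of this paragraph can be sketched as
follows.  This is only an illustration under stated assumptions, not the
cloning tool described above: it assumes a tar binary on the PATH and a
target filesystem that has already been created and mounted at dst_dir.

```python
import subprocess


def tar_pipe(src_dir, dst_dir):
    """Copy the tree under src_dir into dst_dir through a tar | tar pipe."""
    # Producer: archive the contents of src_dir to stdout ("-f -").
    producer = subprocess.Popen(
        ["tar", "-C", src_dir, "-cf", "-", "."],
        stdout=subprocess.PIPE,
    )
    # Consumer: unpack from stdin into dst_dir, preserving permissions (-p).
    consumer = subprocess.Popen(
        ["tar", "-C", dst_dir, "-xpf", "-"],
        stdin=producer.stdout,
    )
    # Close our copy of the pipe so the producer is signalled if the
    # consumer exits early.
    producer.stdout.close()
    consumer_status = consumer.wait()
    producer_status = producer.wait()
    if producer_status != 0 or consumer_status != 0:
        raise RuntimeError(
            f"tar pipe failed: create={producer_status}, extract={consumer_status}"
        )
```

Because tar recreates files inside a freshly made filesystem, nothing
below the filesystem level (boot blocks, bad block lists) is carried over
from the master.  When copying a live root with GNU tar, an option such as
--one-file-system would probably also be wanted to keep /proc and other
mounts out of the archive.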

These days there are just better ways (in my opinion) to proceed if your
goal is simple installation and easy upgrade/update and low maintenance.
Cloning is also very nearly an irreversible decision -- if you adopt
clone methods it can get increasingly difficult to maintain your cluster
without ALSO developing tools and methods that could just as easily have
been used to install and clean things up post-install.
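
As an example of the kind of post-clone fix-up tool this implies, here is
a minimal sketch that gives a cloned root filesystem its own hostname,
eth0 address and ssh host keys.  It assumes a RedHat-style layout
(/etc/sysconfig/network and network-scripts/ifcfg-eth0) and assumes that
missing host keys are regenerated on the node afterwards (for example
with ssh-keygen).  The function name and its parameters are hypothetical,
and rerunning lilo against the new root is not shown.

```python
import glob
import os


def personalize_clone(root, hostname, ip, netmask="255.255.255.0"):
    """Give a freshly cloned root filesystem under `root` its own identity."""
    sysconfig = os.path.join(root, "etc", "sysconfig")
    scripts = os.path.join(sysconfig, "network-scripts")
    os.makedirs(scripts, exist_ok=True)

    # Per-node hostname (RedHat-style location, an assumption).
    with open(os.path.join(sysconfig, "network"), "w") as f:
        f.write(f"NETWORKING=yes\nHOSTNAME={hostname}\n")

    # Static address for eth0.
    with open(os.path.join(scripts, "ifcfg-eth0"), "w") as f:
        f.write(
            "DEVICE=eth0\nBOOTPROTO=static\nONBOOT=yes\n"
            f"IPADDR={ip}\nNETMASK={netmask}\n"
        )

    # Remove ssh host keys copied from the master so that the clone ends
    # up with its own keys once they are regenerated.
    removed = []
    for key in glob.glob(os.path.join(root, "etc", "ssh", "ssh_host_*key*")):
        os.remove(key)
        removed.append(os.path.basename(key))
    return sorted(removed)
```

A call such as personalize_clone("/mnt/target", "node07", "192.168.1.107")
would be run once per node, right after the copy and before the first
boot from the local disk.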

Even so, if you are going to clone, I think that the diskless->local
clone is a very good way to proceed, because it facilitates
reinstallation and emergency operation of a node even if a hard disk
crashes (you can run it diskless while getting a replacement).  It does
require either a floppy drive ($15) or a PXE chip, but this is a pretty
trivial investment per node.

   rgb

--
Robert G. Brown                            http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf