Node cloning

Fri Apr 6 00:39:59 PDT 2001

On Thu, 5 Apr 2001, Robert G. Brown wrote:
[Copying /dev/hda to /dev/hd?]
> One of many possible problems, actually.  This approach to cloning
> makes me shudder -- things like the devices in /dev generally have to
> be built, not copied, there are issues with the boot blocks and bad block
> lists and the bad blocks themselves on both target and host.  raw
> devices are dangerous things to use as if they were flatfiles.

Unfortunately I'm not an expert in disk technology, so I might be
wrong here... but I thought that the bad block lists were maintained
by the disks themselves and are not visible to the OS.

In any case: We have not had any instability issues due to cloning in
the last few years.

[...]
> One reason I gave up cloning (after investing many months writing a
> first generation cloning tool for nodes (which booted a diskless
> configuration, formatted a local disk, and cloned itself onto the local
> disk) and started a second generation GUI-driven one) was that just
> cloning isn't enough.  There is all sorts of stuff that needs to be done
> to the clones to give them a unique identity (even something as simple
> as their own ssh keys), one needs to rerun lilo, it requires that you
> keep one "pristine" host to use as the master to clone or you have the
> very host configuration creep you set out to avoid.  Either way you end
> up inevitably having to upgrade all the nodes or install security or
> functionality updates.

Let me just add a few insights from our years of experience here:
- We use DHCP to assign (fixed) IP addresses to nodes. The only
  problem here is to get the list of all MAC addresses in the first
  place.
- We use the same SSH hostkey for all nodes in our cluster (not for
  the server and our personal workstations though).
- When we clone whole disks or whole partitions, we don't need to run
  lilo, fdisk or whatever. The disks are identical after the clone,
  including partition tables and boot sectors.
- An additional boot script called "personalize" personalizes the
  machines during the first boot-up. Based on the hostname the script
  mounts additional external disk drives, configures additional
  network interfaces etc.
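
The DHCP and "personalize" items above can be sketched roughly as
follows. This is only an illustration, not the scripts used on the
cluster described here. It assumes an ISC dhcpd server (whose "host"
declarations take "hardware ethernet" and "fixed-address" statements).
All hostnames, MAC and IP addresses, device names, commands and function
names below are hypothetical.

```python
import socket

# Hypothetical node inventory: (hostname, MAC address, fixed IP).
# Collecting the real MAC addresses is the manual step mentioned above.
NODES = [
    ("node01", "00:50:56:00:00:01", "192.168.1.101"),
    ("node02", "00:50:56:00:00:02", "192.168.1.102"),
]


def dhcp_host_entries(nodes):
    """Return ISC dhcpd 'host' declarations pinning each MAC to a fixed IP."""
    entries = []
    for name, mac, ip in nodes:
        entries.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}\n"
        )
    return "\n".join(entries)


# Hypothetical per-host extras, keyed by hostname. Every clone carries
# the same table; only the hostname decides which entry applies.
EXTRAS = {
    "node01": ["mount /dev/hdb1 /scratch"],
    "node02": ["ifconfig eth1 10.0.0.2 netmask 255.255.255.0 up"],
}


def personalize_commands(hostname, extras=EXTRAS):
    """Return the first-boot commands for this host (empty if none)."""
    return list(extras.get(hostname, []))


if __name__ == "__main__":
    # Print the dhcpd.conf fragment for the server.
    print(dhcp_host_entries(NODES))
    # On a node: only show what a personalize step would run.
    for cmd in personalize_commands(socket.gethostname()):
        print("would run:", cmd)
```

The sketch only prints the commands instead of running them, so it is
safe to try on any machine. Because the lookup table is part of the
cloned image, the clones stay bit-identical and differ only in what they
do at first boot.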

To conclude: If we want to update our cluster, then we update a master
machine, boot all machines into a small maintenance Linux via PXE, run
Dolly on all machines to clone them, reboot, done. There are no
post-cloning operations required, but as usual, YMMV.

Of course there might be better ways to install your cluster,
depending on your needs, configuration, experience, etc. For
(mostly) homogeneous mid-sized clusters (we have 16--24 nodes in our
clusters), cloning works well.

- Felix
-- 
Felix Rauch                      | Email: rauch at inf.ethz.ch
Institute for Computer Systems   | Homepage: http://www.cs.inf.ethz.ch/~rauch/
ETH Zentrum / RZ H18             | Phone: ++41 1 632 7489
CH - 8092 Zuerich / Switzerland  | Fax:   ++41 1 632 1307