Fwd: [SSI] RFC: Etherboot/PXE to simplify installation and management

Rayson Ho raysonlogin at yahoo.com
Fri Nov 9 08:48:47 PST 2001


FYI,

Rayson

--- "Brian J. Watson" <Brian.J.Watson at compaq.com> wrote:
> In an SSI cluster, it should only be necessary to install software 
> on a single node. Most other nodes can be thin clients, using 
> Etherboot or PXE to load their kernel and ramdisk from the 
> CLMS master. A potential CLMS master node needs to have its kernel
> and ramdisk stored locally on a SCSI or IDE disk, in case it's
> the first node booted in the cluster. Even a potential CLMS master, 
> however, can initially get its kernel and ramdisk via Etherboot/PXE 
> and install them onto its hard disk with minimal sysadmin
> involvement.
> 
> Etherboot is an open-source software package for creating ROM images 
> that allow a computer to boot off the network using DHCP or BOOTP. 
> For those who cannot or will not flash their ROM with one of these
> images, Etherboot includes a special boot block for loading the image
> from a floppy or hard drive. Etherboot appears to support about
> a hundred different NIC models. Unfortunately, it only supports
> the x86 platform right now.
> 
> For more information, visit the Etherboot website:
>         http://etherboot.sourceforge.net/
> 
> PXE (Preboot Execution Environment) is an Intel specification for
> doing pretty much the same thing. An advantage is that PXE images
> come pre-loaded on certain NICs, but I suspect most PXE images are
> closed source.
> 
> To read Intel's PXE spec:
>         ftp://download.intel.com/ial/wfm/pxespec.pdf
> 
> To support this new dependent node booting model, changes to initial 
> node installation would include:
>   - Making sure dhcpd and tftpd are installed as part of the base 
>     Linux distribution.
>   - Installing mknbi (part of Etherboot) on the shared root for 
>     building a tagged image of the kernel and ramdisk.
>   - Adding an /etc/ssitab file for specifying the MAC address, 
>     IP address, node number, and local boot flag for each node
>     allowed to join the cluster. For each node with the local boot
>     flag set, a device for the boot partition must also be specified.
>     The local boot flag should only be set for potential CLMS master 
>     nodes on the x86 platform. For platforms not supported by 
>     Etherboot/PXE, such as Alpha, _all_ nodes should have the local 
>     boot flag set.
>   - Eliminating /etc/cluster.conf, which is obsoleted by /etc/ssitab.
>   - Installing a new mkdhcpd.ssi command that builds /etc/dhcpd.conf
>     from the data in /etc/ssitab. To support non-SSI uses of DHCP,
>     it copies anything it finds in /etc/dhcpd.proto before appending 
>     the generated lines.
>   - Installing a new lilo.ssi command that does the following:
>       * reads /etc/lilo.conf and /etc/ssitab, and uses onnode and
>         lilo to sync the default kernel and ramdisk out to all
>         potential nodes that are up with the local boot flag set
>       * runs mknbi to generate a tagged image of the default kernel 
>         and ramdisk in /tftpboot/, so that dependent nodes can 
>         download it while booting
> 
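A rough sketch of what mkdhcpd.ssi might do, in Python. The proposal does not define the /etc/ssitab layout, so the one-record-per-line format used here (MAC, IP, node number, local boot flag, optional boot device, whitespace-separated, with `#` comments) is purely an assumption, as are the names `Node`, `parse_ssitab`, `render_dhcpd_conf`, and the `nodeN` host labels. The `host` / `hardware ethernet` / `fixed-address` statements are standard ISC dhcpd.conf syntax; any subnet declaration they need is assumed to come from the /etc/dhcpd.proto text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    mac: str
    ip: str
    number: int
    local_boot: bool
    boot_device: Optional[str] = None


def parse_ssitab(text: str) -> List[Node]:
    """Parse the hypothetical whitespace-separated ssitab layout."""
    nodes = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # allow comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) not in (4, 5):
            raise ValueError(f"bad ssitab line: {raw!r}")
        mac, ip, number, flag = fields[:4]
        local_boot = flag == "1"
        device = fields[4] if len(fields) == 5 else None
        # The proposal requires a boot device for every local-boot node.
        if local_boot and device is None:
            raise ValueError(f"node {number}: local boot needs a boot device")
        nodes.append(Node(mac.lower(), ip, int(number), local_boot, device))
    return nodes


def render_dhcpd_conf(nodes: List[Node], proto: str = "") -> str:
    """Copy the prototype text first, then append one host entry per node."""
    parts = [proto.rstrip("\n")] if proto.strip() else []
    for n in nodes:
        parts.append(
            f"host node{n.number} {{\n"
            f"    hardware ethernet {n.mac};\n"
            f"    fixed-address {n.ip};\n"
            f"}}"
        )
    return "\n".join(parts) + "\n"
```

The same parsed records could serve /linuxrc, which only needs to find the entry whose MAC matches the local card to learn its IP address and node number.
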
> In addition, changes will have to be made to the ramdisk, which means
> changes to the mkinitrd.ssi script:
>   - Copy /etc/ssitab into the ramdisk.
>   - Enhance /linuxrc to match a local MAC address to an entry in 
>     /etc/ssitab to determine the local IP address and node number.
>   - If the local boot flag is set, then /linuxrc compares the default
>     kernel and ramdisk on the shared root to those on the local disk.
>     If they differ, it runs lilo.ssi with a special flag to just sync
>     the local disk.
>   - The hack in VI.3 of the installation instructions will go away. 
>     Dave Zafman and I cooked up a scheme for /linuxrc to read 
>     /proc/partitions and make all the devices it finds there.
>     That removes the need for the sysadmin to figure out the local 
>     device names of the two GFS partitions.
>   - As well as building the ramdisk, mkinitrd.ssi also runs 
>     mkdhcpd.ssi, since the sysadmin likely changed /etc/ssitab.
> 
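The /proc/partitions idea can be illustrated with a small parser. This is only a sketch in Python; the real /linuxrc would more likely be a shell script, and the function name `parse_partitions` is made up for illustration. It relies on the first four columns of /proc/partitions being major, minor, #blocks, and name, and ignores any further columns.

```python
from typing import List, Tuple


def parse_partitions(text: str) -> List[Tuple[str, int, int]]:
    """Return (device path, major, minor) for each row of /proc/partitions."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the blank line and the "major minor #blocks name" header.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        major, minor, name = int(fields[0]), int(fields[1]), fields[3]
        devices.append((f"/dev/{name}", major, minor))
    return devices
```

Each tuple carries what is needed to create the block device node, for example with `os.mknod(path, stat.S_IFBLK | 0o600, os.makedev(major, minor))`, so the sysadmin never has to spell out device names by hand.
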
> Adding new nodes -- this is the beautiful part:
>   - Make sure there are enough available journals for the new nodes 
>     on the GFS shared root. Note that the Cluster Filesystem (CFS) 
>     that Dave is porting doesn't have this requirement, which makes 
>     it better suited for large clusters.
>   - Edit /etc/ssitab to add records for each new node. The MAC 
>     address can be determined by booting the new node with an 
>     Etherboot floppy or ROM image. Although the DHCP server will 
>     not respond to this unknown MAC address just yet, the node will 
>     display on its console the MAC address of the card it discovered.
>   - Run mkinitrd.ssi to rebuild the SSI ramdisk and /etc/dhcpd.conf.
>   - Run lilo.ssi to distribute the new ramdisk to all nodes that are
>     up with the local boot flag set, and to rebuild the tagged image
>     in /tftpboot/.
>   - If a new node does not have the local boot flag set, just boot it
>     with the appropriate Etherboot/PXE ROM image or floppy. Like
>     magic, it'll join the cluster.
>   - If the local boot flag is set, and the platform is x86, boot it 
>     with the ROM image or floppy. While running /linuxrc, it'll sync 
>     the local disk if the boot partition has already been created.
>   - If the boot partition has not been created, /linuxrc will proceed
>     with joining the cluster. Once it has joined, run fdisk and mkfs
>     to set up the boot partition. Then reboot the node one more time 
>     with the ROM image or floppy, so it can sync the local disk the 
>     next time it joins.
>   - On a platform that does not support Etherboot/PXE, the PITA
>     factor is a bit higher for adding a new node (which must have
>     the local boot flag set). To avoid needless installation of the
>     base OS, try booting off a distribution CD into rescue mode. Use
>     fdisk and mkfs to set up the boot partition. Mount it. Either use
>     a floppy or set up networking to copy the default kernel and
>     ramdisk from the cluster to the boot partition. Also, copy the
>     appropriate stanza for your bootloader (e.g., aboot), and run it
>     to install the boot block. Now it's ready to join the cluster.
>     Finally, consider adding support for your platform to Etherboot
>     or an equivalent software package.
> 
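The per-node branches in the list above can be summarized as a small decision function. This is just a reading aid in Python; the function name and the three boolean parameters are invented here, and the common preparation (journals, /etc/ssitab, mkinitrd.ssi, lilo.ssi) is assumed to be done already.

```python
from typing import List


def steps_for_new_node(local_boot: bool,
                       netboot_supported: bool,
                       boot_partition_exists: bool) -> List[str]:
    """List the node-specific steps left after the cluster-side preparation."""
    if not netboot_supported:
        # Platforms without Etherboot/PXE can only boot from local disk.
        if not local_boot:
            raise ValueError("without Etherboot/PXE the local boot flag must be set")
        return [
            "boot a distribution CD into rescue mode",
            "create the boot partition with fdisk and mkfs, then mount it",
            "copy the default kernel and ramdisk from the cluster",
            "copy the bootloader stanza and install the boot block",
            "boot from the local disk to join the cluster",
        ]
    steps = ["boot with the Etherboot/PXE ROM image or floppy"]
    if not local_boot:
        return steps + ["node joins the cluster"]
    if boot_partition_exists:
        return steps + ["/linuxrc syncs the local disk", "node joins the cluster"]
    return steps + [
        "node joins the cluster without syncing",
        "create the boot partition with fdisk and mkfs",
        "reboot with the ROM image or floppy so /linuxrc can sync the local disk",
    ]
```
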
> Some weaknesses in this proposal are support for non-x86 platforms,
> to which I've given some thought, and support for User Mode Linux,
> to which I've given very little thought. There are probably other
> weaknesses, but overall I think this improves the installation and 
> management of OpenSSI on the x86 platform.
> 
> Suggestions are definitely welcome, especially since I haven't 
> started the implementation yet. ;)
> 
> -- 
> Brian Watson                | "Now I don't know, but I been told it's
> Linux Kernel Developer      |  hard to run with the weight of gold,
> Open SSI Clustering Project |  Other hand I heard it said, it's
> Compaq Computer Corp        |  just as hard with the weight of lead."
> Los Angeles, CA             |     -Robert Hunter, 1970
> 
> mailto:Brian.J.Watson at compaq.com
> http://opensource.compaq.com/
> 
> _______________________________________________
> ssic-linux-devel mailing list
> ssic-linux-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ssic-linux-devel

