[Beowulf] Re: motherboards for diskless nodes
dld at cmb.usc.edu
Thu Feb 24 17:00:53 PST 2005
On Thu, Feb 24, 2005 at 06:20:21PM -0500, Jamie Rollins wrote:
> How netboot-capable are modern motherboards with on-board nics? I
> have experience with a couple that support PXE. However, I have
> been having a hard time finding information on-line stating explicitly that
> a given motherboard and/or bios supports netbooting. The only thing I've
> been able to find so far is the Tyan K8SR that uses the AMI BIOS 8.0. I
> get the impression that most MBs that have gigabit probably support PXE
> booting, but I was curious what others' impressions are.
You probably want to buy one in advance to test how reliable it is when
PXE booting. We have a 64-node cluster whose nodes have local disks but no
CD-ROM or floppy drives, and we do maintenance and installs by net booting.
It isn't reliable: when reinstalling a node, we have to reboot it several
times before it hears the PXE/DHCP replies and boots the pxelinux.0 image.
The motherboard is the MSI K8D Master with 2x Broadcom tg3 gigE.
Many computers do netboot reliably in our environment (including laptops,
older P-III Tyan boards, new Xeon Supermicro/e1000 boards, etc). Not all
motherboard PXE/DHCP boot implementations are equal, and some are not up to
the task of completely diskless use. If you switch to a slightly newer
motherboard at deployment time, all bets are off again (yes, I made that
mistake once, but we had a friendly supplier who let us exchange parts until
it all worked).
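Once a node does hear the DHCP reply, pxelinux.0 still has to find its configuration, and that file is what you edit to point a particular node at an installer. As documented for PXELINUX (part of SYSLINUX), it asks the TFTP server for files under pxelinux.cfg/ in a fixed order: a name built from the MAC address, then the IP address in uppercase hex with one digit dropped at a time, then "default". The following Python is a minimal sketch of that search order only. The MAC and IP are made-up examples, the UUID-based name that newer PXELINUX versions try first is left out, and none of this helps with a flaky PXE ROM, which is a firmware problem.

```python
import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """Return the pxelinux.cfg/ file names PXELINUX tries, in order.

    Covers the MAC-based name, the progressively shortened hex-IP names,
    and the final "default". The UUID-based name tried first by newer
    PXELINUX versions is deliberately omitted from this sketch.
    """
    # ARP hardware type 1 (Ethernet) gives the "01-" prefix; the MAC
    # bytes are lowercase hex separated by dashes.
    names = ["01-" + mac.replace(":", "-").lower()]
    # IPv4 address as 8 uppercase hex digits, then drop one digit at a time.
    hex_ip = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names += [hex_ip[:n] for n in range(len(hex_ip), 0, -1)]
    names.append("default")
    return names


if __name__ == "__main__":
    # Hypothetical node: MAC and address are examples only.
    for name in pxelinux_config_candidates("00:0E:0C:3B:8F:2A", "192.168.2.91"):
        print(name)
```

A common use of this order is to drop a per-node file (for example under the node's MAC-based name) pointing at an installer kernel. The other nodes keep falling through to "default", which can boot from the local disk.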
If you deal with temporary files, want to suspend and swap out large
low-priority jobs, etc., you probably want a local disk on each node anyway.
Spending a couple of gigs of that on a locally installed O/S isn't much of a
drama, especially on ~16 nodes. It also makes updates more reliable:
libraries and binaries that are in use remain on local disk even when dpkg
replaces them, and only get deleted once nothing is using them any more. NFS
(being stateless) doesn't behave this way, so after an update you may
occasionally have jobs/daemons crash when they try to page in a file that
has already been replaced.
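The local-disk behavior described here follows from POSIX file semantics. Replacing a file removes only its directory entry, and the old data stays readable through any descriptor that was already open until the last one is closed. dpkg, roughly speaking, writes the new copy next to the old one and renames it into place. The following Python sketch reproduces that on a local filesystem. It assumes a POSIX system such as Linux, because Windows refuses to replace an open file. The file name is made up.

```python
import os
import tempfile


def read_after_replace() -> tuple[bytes, bytes]:
    """Show that an already-open file keeps its old contents after replacement.

    Mimics an updater that writes a new copy and renames it over the old
    path. Returns (what the old holder reads, what a fresh open reads).
    """
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "libdemo.so")  # hypothetical library name
        with open(path, "wb") as f:
            f.write(b"old version")
        # A running process holds the old file open (like a mapped library).
        with open(path, "rb") as held:
            new_path = path + ".new"
            with open(new_path, "wb") as f:
                f.write(b"new version")
            # Atomically swap the new file into place. The old inode survives
            # until the last descriptor referring to it is closed.
            os.replace(new_path, path)
            old_seen = held.read()
            with open(path, "rb") as fresh:
                new_seen = fresh.read()
    return old_seen, new_seen


if __name__ == "__main__":
    print(read_after_replace())
```

On a local disk the held descriptor still sees the old bytes. The point of the paragraph is that NFS does not give the same guarantee when the file is replaced from elsewhere.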
If you don't have a central fileserver yet, you can also spread your
users' home directories among the disks on the nodes to avoid NFS contention
(though this means no RAID unless you buy two disks per node). If one user
launches some sort of cluster-wide NFS bomb, they only take out themselves
and the job running on the node with their home directory. Users do this:
they launch a stack of simultaneous jobs that all load lots of data off the
filesystem, flattening whichever fileserver has their home directory.
Building a single high-end fileserver that can survive the same load without
severely impacting all other users is expensive and tough.
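One way to do the spreading is a plain round-robin assignment of users to nodes, published as an automounter map so that every node mounts each home directory from whichever node holds it. The following Python is a minimal sketch under stated assumptions. It emits autofs-style indirect map entries (key, mount options, host:path). The node names and the /export/home path are hypothetical. A real setup would also need each node to export its share over NFS.

```python
def assign_homes(users: list[str], nodes: list[str]) -> dict[str, str]:
    """Round-robin each user's home directory onto one node's local disk."""
    if not nodes:
        raise ValueError("need at least one node")
    # Sort so the assignment is stable no matter what order users arrive in.
    return {user: nodes[i % len(nodes)] for i, user in enumerate(sorted(users))}


def autofs_lines(assignment: dict[str, str],
                 export: str = "/export/home") -> list[str]:
    """Format autofs indirect-map entries: key, options, host:path."""
    return [f"{user}\t-rw,hard\t{node}:{export}/{user}"
            for user, node in sorted(assignment.items())]


if __name__ == "__main__":
    # Hypothetical users and node names.
    homes = assign_homes(["carol", "alice", "bob"], ["node01", "node02"])
    print("\n".join(autofs_lines(homes)))
```

Round-robin is only the simplest balancing choice. With roughly one user per node, a user who overloads their home disk mostly hurts the node that holds it, which is the containment this paragraph relies on.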
> Something else that we're looking for that I believe is far more esoteric
> and has been equally hard to find information about is BIOS serial console
> redirect, i.e. being able to control the BIOS from the serial port. I've
> been getting more and more into accessing machines through the serial
> port. The only thing holding me back from throwing out the video and
> keyboard entirely is being able to access and control the BIOS through the
> serial port as well.
We're using an Appro blade rack with devices that bring the critical
ports (serial, power, and reset) back to the "blade control center" (BCC).
It does more than just a serial console, and has a much smaller cable
footprint. To be honest, though, I never use it. We power down
automatically using ACPI when the A/C fails (far too often), rather than
attempting to coordinate the BCC pulling the power plug after a node shuts
down. For console/BIOS access, I prefer to just use a long monitor/keyboard
cable and plug it directly into the node with the problem. A 64-way KVM
switch with 64 cables would block airflow and access to the nodes, as well
as being expensive. The long cable also works for all the other non-BCC
nodes in the adjacent racks, since it can reach the whole row. There
aren't really that many motherboards out there without onboard video, and
anything works for a text console.
Don't be afraid to be a Luddite if it works. :) You can spend the money
saved on serial cables and KVM switching hardware on noise cancelling