[Beowulf] Building my own highend cluster

Mark Hahn hahn at physics.mcmaster.ca
Wed Jun 28 16:32:51 PDT 2006


> Have a FAQ which describes how to boot nodes diskless?

PXE basically means "DHCP then TFTP".  what you get via TFTP is 
much like booting from disk: first you get a bootsector (pxelinux),
which then fetches a config file (again, via TFTP), which tells it
how to boot.  that may be to boot from the HD, but usually means
to fetch yet another file or two via TFTP (kernel and initrd).
IIRC, pxelinux jumps to the kernel, which uses the initrd 
as its initial disk image.  you can stay in the initrd, or you can 
switch to a more full-fledged root, such as over NFS.  or even boot directly
into an NFS root with no initrd.  there are somewhat tricky bits 
in figuring out which subtrees can stay in a tmpfs/ramdisk, 
and which can be NFS mounted, and how you assemble the whole into 
the sort of tree that normal tools are looking for.  (you don't 
need a per-node NFS export, necessarily.)
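
To make the "fetches a config file, which tells it how to boot" step concrete, here is a minimal Python sketch of it. It assumes the usual pxelinux convention of looking under pxelinux.cfg/ for a MAC-based name, then progressively shorter hex forms of the client IP, then "default" (a client UUID lookup that can come first is left out). The kernel/initrd file names, the server address and the export path are hypothetical placeholders, not anything from the setup described above.

```python
import ipaddress


def pxelinux_config_candidates(mac, ip):
    """Config file names pxelinux tries over TFTP, in order (UUID lookup omitted)."""
    # "01-" is the ARP hardware type for Ethernet; the MAC is lowercase, dash-separated.
    mac_name = "01-" + mac.lower().replace(":", "-")
    # The client IPv4 address as 8 uppercase hex digits, e.g. 192.168.2.91 -> C0A8025B.
    ip_hex = "{:08X}".format(int(ipaddress.IPv4Address(ip)))
    names = [mac_name]
    # Drop one trailing hex digit at a time: C0A8025B, C0A8025, ..., C.
    names += [ip_hex[:n] for n in range(8, 0, -1)]
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]


def nfsroot_config(kernel, initrd, nfs_server, export):
    """Render a pxelinux entry that boots a kernel + initrd with its root on NFS."""
    append = "initrd={} root=/dev/nfs nfsroot={}:{} ip=dhcp ro".format(
        initrd, nfs_server, export
    )
    return "\n".join([
        "DEFAULT nfsroot",
        "LABEL nfsroot",
        "  KERNEL " + kernel,
        "  APPEND " + append,
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical node: MAC, IP, file names and export path are placeholders.
    for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
        print(name)
    print(nfsroot_config("vmlinuz", "initrd.img", "10.0.0.1", "/export/root"))
```

Serving one shared "default" file to every node is what keeps this from needing any per-node configuration; the MAC- and IP-based names only matter when some nodes should boot differently.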

but there's no clear right or wrong.  there are also a lot of modern
linux/filesystem developments that might be useful.  I find tmpfs a lot
nicer than ramdisks, for instance.  and I like read-only NFS mounts,
with attribute cache times tweaked up a bit.  you have to ignore most 
of the "conventional wisdom" about how terrible NFS is in a cluster - 
it's not great, especially for large systems (hundreds of nodes).  but 
for smallish systems, it's fine (I have a 100-node NFS-root cluster which
works great).  I'd advocate scaling by replicating NFS servers (why not
one per rack?), rather than immediately jumping to disk-ful configurations.
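
As a rough illustration of "one NFS server per rack" plus read-only mounts with longer attribute caching, the sketch below maps a node to its rack's server and builds an fstab-style line. The rack size, the "nfs-rack{rack}" naming scheme, the export path and the 600-second actimeo value are all assumptions picked for the example; ro, nolock and actimeo are standard NFS mount options, but the right values depend on the site.

```python
def nfs_server_for(node_index, nodes_per_rack=32, server_pattern="nfs-rack{rack}"):
    """Pick the NFS server for a node, assuming nodes are numbered rack by rack."""
    rack = node_index // nodes_per_rack
    return server_pattern.format(rack=rack)


def fstab_line(server, export="/export/usr", mountpoint="/usr", actimeo=600):
    """Build a read-only NFS fstab entry with a raised attribute cache timeout."""
    # actimeo sets all four attribute-cache timers (acregmin/max, acdirmin/max) at once.
    options = "ro,nolock,actimeo={}".format(actimeo)
    return "{}:{} {} nfs {} 0 0".format(server, export, mountpoint, options)


if __name__ == "__main__":
    # Hypothetical 100-node cluster, 32 nodes per rack.
    for node in (0, 31, 32, 99):
        print(node, fstab_line(nfs_server_for(node)))
```

Since every node in a rack mounts the same read-only export, adding a rack only means cloning one server image rather than touching the nodes.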

> That's too expensive however and eating unnecessary power.

disks do not dissipate much, especially compared to other components.

> 14 disks is $$$ but more importantly also eat effectively nearly 30 watt a 
> disk from the power

current *ata disks peak at under 15W, and that's if you're 100% seek/write.
in a typical cluster I'd expect to average under 10W.

> (maxtors are like 22 watt and that's *after* the psu lost a lot of 
> power!!!!).

nah.  or are you talking about big 15K rpm SCSI disks?  besides, decent
PSUs are 70-80% efficient.  it's easy to blow (hah) over 100W on fans
in a 1U server.  that's a lot more significant...
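
The numbers in this exchange are easy to check with a few lines of Python. The sketch below takes the 14 disks from the quoted message and compares the claimed 30W per disk with the 15W peak and 10W typical figures given above (both stated as upper bounds), then converts each DC load to wall power for 70% and 80% efficient PSUs. The function name and the choice to treat the upper bounds as exact values are assumptions made for the example.

```python
def wall_watts(dc_watts, psu_efficiency):
    """Power drawn at the wall for a given DC load and PSU efficiency (0..1)."""
    return dc_watts / psu_efficiency


DISKS = 14  # disk count from the quoted message

# Per-disk DC power in watts: the quoted claim vs. the figures in the reply.
PER_DISK = [("claimed", 30.0), ("peak", 15.0), ("typical", 10.0)]

if __name__ == "__main__":
    for label, watts in PER_DISK:
        dc_total = DISKS * watts
        low = wall_watts(dc_total, 0.80)   # best case: 80% efficient PSU
        high = wall_watts(dc_total, 0.70)  # worst case: 70% efficient PSU
        print("{:8s} {:5.0f}W DC, {:.0f}-{:.0f}W at the wall".format(
            label, dc_total, low, high))
```

On the DC side that is 420W claimed against 210W peak and 140W typical for all 14 disks, so even the typical total is in the same range as the 100W of fans in a single 1U server.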



