[Beowulf] SATA II - PXE+NFS - diskless compute nodes

Buccaneer for Hire. buccaneer at rocketmail.com
Sat Dec 9 12:03:27 PST 2006

Thank you for writing...

> With 2000+ nodes you should definitely look at remote power control, and 
> remote serial console access.

We already have that in place, along with remote monitoring.

> Also you might think of separate install servers for each (say) 500 
> machines. Mirror them up to each other of course.

We currently have 5 kickstart servers (one web server); the kickstart file is dynamically altered
to reflect the assigned server.
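The post doesn't say how the kickstart file is altered. As a rough illustration only, here is a Python sketch that deterministically maps each node name to one of five install servers and fills that server into the kickstart `url --url=` line. The server hostnames (`ks1.cluster.local` and so on), the install tree path, and the helper names `assign_server` and `render_kickstart` are all hypothetical.

```python
import zlib

# Hypothetical install-server hostnames; the post only says there are 5 servers.
KICKSTART_SERVERS = [f"ks{i}.cluster.local" for i in range(1, 6)]

# Minimal kickstart fragment; the install tree path is an assumption.
KS_TEMPLATE = """install
url --url=http://{server}/centos/os/x86_64
network --bootproto=dhcp --hostname={node}
"""


def assign_server(node: str, servers=KICKSTART_SERVERS) -> str:
    """Deterministically map a node name to one install server.

    zlib.crc32 is used instead of hash() because Python's str hash is
    randomized per process, which would make assignments unstable.
    """
    return servers[zlib.crc32(node.encode()) % len(servers)]


def render_kickstart(node: str) -> str:
    """Fill the kickstart template with the node's assigned server."""
    return KS_TEMPLATE.format(server=assign_server(node), node=node)


if __name__ == "__main__":
    for n in ("node0001", "node0002", "node1999"):
        print(n, "->", assign_server(n))
```

Hashing the node name, rather than assigning servers round-robin at request time, means a node always reinstalls from the same server unless the server list changes.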

> It's unlikely that you would ever reboot 2000 machines at once, but think 
> ahead to (say) quick power on following a power cut.

We had to do that last year during a couple of hurricanes and power outages. We now have
complete startup and shutdown procedures that are well tested.

> I would hazard that any DHCP/PXE type install server would struggle with 
> 2000 requests (yes- you arrange the power switching and/or reboots to 
> stagger at N second intervals).

There are a few modifications you have to make to increase the number of BOOTP requests the
server can handle before it fails.
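The staggering suggested in the quoted text can be sketched as a simple schedule: switch nodes on in fixed-size batches, N seconds apart, so the DHCP/PXE servers see a bounded number of new requests per interval. The batch size and interval below are illustrative assumptions, not figures from this thread, and the actual power-switch commands (which depend on the PDU or IPMI setup) are left out.

```python
def stagger_schedule(nodes, batch_size, interval_s):
    """Return (offset_seconds, node) pairs for a staggered power-on.

    Nodes are grouped into batches of batch_size; each batch starts
    interval_s seconds after the previous one.
    """
    if batch_size < 1 or interval_s < 0:
        raise ValueError("batch_size must be >= 1 and interval_s >= 0")
    return [((i // batch_size) * interval_s, node) for i, node in enumerate(nodes)]


if __name__ == "__main__":
    # Hypothetical example: 2000 nodes, 50 per batch, 10 seconds apart.
    nodes = [f"node{i:04d}" for i in range(2000)]
    schedule = stagger_schedule(nodes, batch_size=50, interval_s=10)
    print("last batch starts at", schedule[-1][0], "seconds")
```

With those example numbers the 2000 nodes fall into 40 batches, and the last batch starts 390 seconds after the first.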

So now I need to figure out my next step. I will need local disk space for logs and for data/temporary data files.

