diskless nodes? (was Re: Xbox clusters?)

Troy Baer troy at osc.edu
Thu Jan 10 12:10:54 PST 2002


On Thu, 10 Jan 2002, Eray Ozkural (exa) wrote:
> On Friday 07 December 2001 23:08, Troy Baer wrote:
> > That's true, *if* you're buying $300 nodes.  I'm not, though; our node
> > cost tends to be around $2500-3000, because we tend to buy server-class
> > SMP mobo's, lots of memory, Myrinet, rackmount cases, and a bunch of other
> > stuff to keep me from having to walk/drive over to the machine room (in a
> > secured building about 1.5 miles away) every time I need to reboot nodeXX.
> 
> I wonder what kind of hardware you use for being able to do that. It would be 
> very convenient for me as the system I use is 15 miles from my home.
> 
> In the setting I use, there is no video/keyboard/mouse for any nodes. I use a 
> serial cable in need of hard debugging. Everything else we do on eth. There 
> is only one thing I can't do: reboot or shutdown a node from the net.
> 
> Could you please write a list of the extra gear you have in your system for 
> remote administration?

Each of our cluster systems has a console server with some number of
Cyclades multiport serial cards.  The compute nodes are all configured to
send their consoles to a serial port.  On some of our newer nodes with
mobos that support IPMI, we have a second serial port wired to the console
server for remote BIOS configuration and power control.  For the rest we
have networked power controllers.  We also have some locally developed
scripts to abstract away the differences, so an admin can just run a command
like "power off node05" and not worry whether it has IPMI or is on a
power controller.  BTW, we've got two different types of power controllers,
the widely available APC 8-port 15A controllers and another brand whose
name I don't recall.  The coworker of mine who developed our power control
scripts was of the opinion that the APCs are much easier to program for.
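
A minimal sketch of how a wrapper like that could be structured, assuming a
per-node lookup that picks either an IPMI-over-serial backend or a networked
outlet backend.  This is not the actual scripts; every class name, host name,
and port/outlet number here is hypothetical, and the backends only return a
description of the action instead of talking to real hardware.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SerialIpmiBackend:
    """Hypothetical backend: IPMI reached through a console-server serial port."""
    console_server: str
    port_for_node: Dict[str, int]

    def nodes(self) -> List[str]:
        return list(self.port_for_node)

    def describe(self, node: str, state: str) -> str:
        port = self.port_for_node[node]
        return f"IPMI via {self.console_server} serial port {port}: power {state}"


@dataclass
class OutletBackend:
    """Hypothetical backend: node plugged into a networked power controller."""
    controller: str
    outlet_for_node: Dict[str, int]

    def nodes(self) -> List[str]:
        return list(self.outlet_for_node)

    def describe(self, node: str, state: str) -> str:
        outlet = self.outlet_for_node[node]
        return f"{self.controller} outlet {outlet}: power {state}"


def build_node_map(backends) -> dict:
    """Map each node name to the one backend that controls its power."""
    node_map = {}
    for backend in backends:
        for node in backend.nodes():
            if node in node_map:
                raise ValueError(f"{node} is claimed by two backends")
            node_map[node] = backend
    return node_map


def power(state: str, node: str, node_map: dict) -> str:
    """Dispatch 'power <state> <node>' without the caller knowing the backend."""
    if state not in ("on", "off"):
        raise ValueError(f"unsupported power state: {state}")
    if node not in node_map:
        raise LookupError(f"unknown node: {node}")
    return node_map[node].describe(node, state)
```

With a map built from one backend of each kind, power("off", "node05", node_map)
looks the same to the admin whichever backend ends up handling node05.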

We do have a crash cart at each of our machine rooms with a VGA monitor
and keyboard for catastrophic cases where direct attention is required,
but that seems to be used mainly when we're initially configuring a system. 

	--Troy
-- 
Troy Baer                       email:  troy at osc.edu
Science & Technology Support    phone:  614-292-9701
Ohio Supercomputer Center       web:  http://oscinfo.osc.edu



