[Beowulf] Green Cluster?

Lombard, David N dnlombar at ichips.intel.com
Wed Jul 23 14:43:55 PDT 2008

On Sat, Jul 19, 2008 at 01:40:59PM -0700, fkruggel at uci.edu wrote:
> Thanks for your suggestions. Let me be more specific.
> I would like to have nodes automatically wake up when
> needed and go to sleep when idle for some time. My
> ganglia logs tell me that there is considerable idle
> time on our cluster. The issue is that I would like to
> have the cluster adapt *automatically* to the load,
> without interaction of an administrator.

Sounds like a plan...

> Here is how far I got:
> I can set a node to sleep (suspend-to-ram) using ACPI.
> But for powering on, I have to press the power button.
> No automatic solution.
> Is it possible to wake up a node over lan (without reboot)?

It depends.  (Did you actually expect a different answer?)

Setting the wakeup events *may* help.  What does /proc/acpi/wakeup
show?  Here's an example from a D975PBZ running F7's 2.6.23:

 Device  S-state   Status   Sysfs node
 TANA      S4     disabled  pci:0000:02:01.0
 P0P3      S4     disabled  pci:0000:00:1e.0
 AC97      S4     disabled
 USB0      S3     disabled  pci:0000:00:1d.0
 USB1      S3     disabled  pci:0000:00:1d.1
 USB2      S3     disabled  pci:0000:00:1d.2
 USB3      S3     disabled  pci:0000:00:1d.3
 USB7      S3     disabled  pci:0000:00:1d.7
 UAR1      S4     disabled  pnp:00:07
 SLPB      S4    *enabled

Note, only SLPB (sleep button) is enabled by default on this system.
- the "TANA" device on *this* system is the NIC
- setting WOL via ethtool doesn't change the table above.
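Since wake-on-LAN came up, here is a minimal sketch of the sending side. The magic packet format is standard: six 0xFF bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The function names, the broadcast address, and port 9 are my choices for illustration. Whether a node in suspend-to-RAM actually wakes on this packet depends on the NIC, the BIOS, and the wakeup settings discussed above.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str,
                      broadcast: str = "255.255.255.255",
                      port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Calling send_magic_packet("00:11:22:33:44:55") from the head node (with the real MAC of the sleeping node's NIC substituted for the placeholder) is then all the sender has to do.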

And here's an old Dell Inspiron running a kernel.org kernel:

 # cat /proc/acpi/wakeup
 Device  S-state   Status   Sysfs node
 LID       S3    *enabled
 PBTN      S4    *enabled
 PCI0      S3     disabled  no-bus:pci0000:00
 UAR1      S3     disabled  pnp:00:0d
 MPCI      S3     disabled

Where both the lid (LID) and power (PBTN) buttons are enabled by default.
Also note the S-state column: it gives the deepest ACPI sleep state from
which that device can wake the system.
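If you want to check this on every node from a script, the table is easy to parse. The following is only a sketch that assumes the four-column layout shown in the two listings above; other kernel versions may format the file differently. The names WakeupEntry and parse_wakeup are made up for this example.

```python
from typing import List, NamedTuple, Optional


class WakeupEntry(NamedTuple):
    device: str
    s_state: str               # deepest sleep state the device can wake from
    enabled: bool
    sysfs_node: Optional[str]  # missing for some devices


def parse_wakeup(text: str) -> List[WakeupEntry]:
    """Parse the contents of /proc/acpi/wakeup (layout as shown above)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Device  S-state  Status  Sysfs node" header.
        if len(fields) < 3 or fields[0] == "Device":
            continue
        device, s_state, status = fields[:3]
        entries.append(WakeupEntry(
            device=device,
            s_state=s_state,
            enabled=status.lstrip("*") == "enabled",
            sysfs_node=fields[3] if len(fields) > 3 else None,
        ))
    return entries


# The Dell listing from above, used as sample input.
SAMPLE = """\
Device  S-state   Status   Sysfs node
LID       S3    *enabled
PBTN      S4    *enabled
PCI0      S3     disabled  no-bus:pci0000:00
UAR1      S3     disabled  pnp:00:0d
MPCI      S3     disabled
"""

enabled_devices = [e.device for e in parse_wakeup(SAMPLE) if e.enabled]
# enabled_devices == ['LID', 'PBTN']
```

On a live node you would pass in open("/proc/acpi/wakeup").read() instead of SAMPLE.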

If you need to enable a device, use

 # echo _device_ enable > /proc/acpi/wakeup

where _device_ is the name listed in the first column of /proc/acpi/wakeup.
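One caveat if you script this: as far as I know, on kernels of this vintage a write to /proc/acpi/wakeup flips the device's current state rather than setting it, so writing twice would disable it again. Please verify that on your own kernel; I am treating it as an assumption here. Under that assumption, a cautious sketch reads the status first and writes only when the device is disabled. The function names are hypothetical.

```python
from typing import Optional


def wakeup_enabled(text: str, device: str) -> Optional[bool]:
    """Return True/False for the device's status, or None if it is not listed."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == device:
            return fields[2].lstrip("*") == "enabled"
    return None


def ensure_wakeup_enabled(device: str, path: str = "/proc/acpi/wakeup") -> bool:
    """Enable wakeup for `device` if it is currently disabled.

    Returns True if a write was issued, False if it was already enabled.
    Assumes a write toggles the state, hence the check before writing.
    """
    with open(path) as f:
        state = wakeup_enabled(f.read(), device)
    if state is None:
        raise KeyError(f"{device} not listed in {path}")
    if state:
        return False
    with open(path, "w") as f:
        f.write(device)
    return True
```

Run as root, ensure_wakeup_enabled("TANA") would be the equivalent for the NIC on the D975PBZ above; re-read the file afterwards to confirm the status changed.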

Here's the Dell responding to a lid close in a very very minimal system
(kernel, busybox, uClibc):

 # Stopping tasks ... done.
 Suspending console(s)

Opening the lid produces this after about 6 seconds:

 pnp: Device 00:0d disabled.
 ACPI: PCI Interrupt 0000:00:03.0[A] -> Link [LNKD] -> GSI 11 (level, low) -> IR1
 ACPI: PCI Interrupt 0000:00:03.1[A] -> Link [LNKD] -> GSI 11 (level, low) -> IR1
 pnp: Device 00:0d activated.
 Restarting tasks ... done.

> How can I detect that a node was idle for some specific time?

This all really needs to be run from the RM (resource manager).  The RM
can know when a job ends on a node and that a node will or will not be
free in the future.  The RM can also manage the scheduler to avoid bringing
sleeping nodes up until they're actually needed--a SMOP left as an exercise
to the reader ;)
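To make the exercise a little more concrete, here is a rough sketch of the decision such a policy has to make on each scheduling pass. Everything in it is hypothetical (the Node record, the plan_power_actions name, the 15-minute default timeout); a real resource manager would supply the node state and queue demand through its own interfaces, and this is not how any particular RM does it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str
    asleep: bool
    busy: bool
    idle_since: float  # epoch seconds when the node last became idle


def plan_power_actions(nodes: List[Node],
                       nodes_needed: int,
                       now: float,
                       idle_timeout: float = 900.0) -> Tuple[List[str], List[str]]:
    """Return (to_wake, to_suspend) as lists of node names.

    nodes_needed is the number of free nodes that queued jobs are waiting for.
    """
    awake_idle = [n for n in nodes if not n.asleep and not n.busy]
    asleep = [n for n in nodes if n.asleep]

    # Wake sleeping nodes only when the awake, idle ones cannot cover the demand.
    shortfall = max(0, nodes_needed - len(awake_idle))
    to_wake = [n.name for n in asleep[:shortfall]]

    # Suspend only idle nodes that the queue does not need, longest idle first,
    # and only once they have been idle for at least idle_timeout seconds.
    surplus = max(0, len(awake_idle) - nodes_needed)
    longest_idle_first = sorted(awake_idle, key=lambda n: n.idle_since)
    to_suspend = [n.name for n in longest_idle_first[:surplus]
                  if now - n.idle_since >= idle_timeout]
    return to_wake, to_suspend
```

The RM would then suspend the nodes in the second list and wake those in the first by whatever mechanism works on the hardware. The point of keeping the decision in the RM is that it sees both the queue and the node state, which a per-node idle timer cannot.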

I *think* Moab may do some of this stuff already.

David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.
