another radical concept...Re: [Beowulf] Cooling vs HW replacement

Bogdan Costescu Bogdan.Costescu at iwr.uni-heidelberg.de
Wed Jan 19 05:35:55 PST 2005


On Tue, 18 Jan 2005, Jim Lux wrote:

> If the temperature goes up, maybe you can slow down the computations
> (reducing the heat load per processor) or just turn off some
> processors (reducing the total heat load of the cluster).

I had the second part (turning off nodes) working about 4 years
ago, back when APM was a reliable way of turning off the power and
ACPI was not yet supported in the kernel. At that time the network
drivers were also not yet polluted with hooks for power management, so
using ether-wake was easy to set up (provided the BIOS was any good,
but then I used to pick the mainboard carefully).
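
For reference, what ether-wake sends is the Wake-on-LAN "magic packet":
six 0xFF bytes followed by the target MAC address repeated 16 times. A
minimal Python sketch follows. The function names are made up for
illustration, and it sends the packet as a UDP broadcast to port 9 (a
common convention), not as the raw Ethernet frame that ether-wake uses.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    if len(mac_bytes) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    # 6 bytes of 0xFF, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (assumed transport, see the note above)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Calling send_magic_packet("00:11:22:33:44:55") from a machine on the same
broadcast domain should wake a node whose BIOS and NIC have Wake-on-LAN
enabled.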

The reason for turning off the nodes was also overheating of the
computer room. While those nodes gave us fewer problems than the
dual-Athlons that followed shortly afterwards, I acted on the same
principle that was mentioned in this thread: it's better to have
things running 5 degrees (Celsius) cooler. At that time we did not
have any scheduling system, so the "power management" could not be
done very tightly. I set up the nodes to shut down after 24 hours of
being idle; I did not want too many down/up cycles, as these are just
as (or maybe even more) disturbing to some components. Something
obvious, but maybe worth mentioning: the nodes would log somewhere
that they had shut themselves down after being idle for too long. I
wanted to know when that happened and, even more, to be able to
distinguish nodes that simply crashed/were unplugged/etc. from those
that did a graceful shutdown and should be able to wake up in good
shape.
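
The policy above can be sketched in a few lines of Python. This is not
the original setup, just an illustration: the function names and the log
line format are hypothetical, and measuring how long a node has been idle
is left out.

```python
from typing import Optional

IDLE_LIMIT_SECONDS = 24 * 3600  # the 24-hour idle limit described above


def should_shut_down(idle_seconds: float, limit: float = IDLE_LIMIT_SECONDS) -> bool:
    """True once the node has been idle for at least the limit."""
    return idle_seconds >= limit


def shutdown_record(node: str, idle_seconds: float, now: float) -> str:
    """Log line written just before a graceful idle shutdown (hypothetical format)."""
    return f"{int(now)} {node} idle-shutdown idle_seconds={int(idle_seconds)}"


def classify_down_node(last_log_line: Optional[str]) -> str:
    """Tell a graceful idle shutdown apart from a crash or a pulled plug.

    A node that is down and whose last log line is an idle-shutdown record
    went down on purpose; anything else is treated as unexpected.
    """
    if last_log_line:
        fields = last_log_line.split()
        if len(fields) >= 3 and fields[2] == "idle-shutdown":
            return "graceful"
    return "unexpected"
```

Only nodes classified as "graceful" would then be candidates for an
automatic wake-up; the others need a human to look at them first.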

I did not get the chance to use this much, as we had a sudden
increase in computational requirements that lasted several months;
then the dual-Athlons came without APM, I wasn't able to reliably
control the shutdown, and so the whole setup became useless...

Things are in better shape today, as IPMI has become more widespread
and it can reliably take care of both the shutdown and the wake-up.  
Too bad that it is still present only on $erver-grade mainboards and
even then is most often only an option.
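
As an illustration of the IPMI route, here is a minimal Python sketch
that builds ipmitool command lines for the chassis power actions. The
helper names, host name and user name are placeholders. It assumes the
ipmitool CLI is installed and that the BMC speaks IPMI v2.0 ("lanplus");
older v1.5 boards would need "-I lan" instead. The password is taken
from the IPMI_PASSWORD environment variable via -E so that it does not
show up in the process list.

```python
import subprocess
from typing import List

# Subset of "ipmitool chassis power" subcommands used here:
# "soft" requests a clean OS shutdown, "on" is the wake-up.
POWER_ACTIONS = ("status", "on", "off", "soft")


def ipmi_power_command(bmc_host: str, user: str, action: str) -> List[str]:
    """Build the ipmitool argv for one chassis power action."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", action]


def run_ipmi_power(bmc_host: str, user: str, action: str) -> str:
    """Run the command and return ipmitool's output (needs IPMI_PASSWORD set)."""
    result = subprocess.run(ipmi_power_command(bmc_host, user, action),
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, run_ipmi_power("node07-bmc", "admin", "soft") would ask the
node to shut down cleanly, and the same call with "on" would bring it
back.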

You might have noticed that the original message said "just turn off
some processors" while I started with "turning off nodes". I would
indeed like to be able to shut down individual CPUs in an SMP node,
but this was impossible several years ago; I don't know what the
status of hot-swap CPU support in the kernel is now. <dream on> I only
hope that the hardware manufacturers will allow future multi-core CPUs
to have some cores in standby/low power modes and be able to wake up
without disturbing the running cores; and all these with a nice
control interface - doing it "automatically" by the CPU depending on
load is not so useful IMHO. </dream on>
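
For what it's worth, Linux kernels built with CPU hotplug support
(CONFIG_HOTPLUG_CPU) expose a per-CPU "online" file under
/sys/devices/system/cpu/; writing 0 or 1 to it takes a logical CPU
offline or brings it back. Whether this saves any power depends on the
hardware. A minimal Python sketch, with hypothetical helper names; it
needs root, and on many systems cpu0 has no such file:

```python
from pathlib import Path


def cpu_online_path(cpu: int, sysfs_root: str = "/sys") -> Path:
    """Path of the sysfs control file for one logical CPU."""
    return Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}" / "online"


def set_cpu_online(cpu: int, online: bool, sysfs_root: str = "/sys") -> None:
    """Write 1 (online) or 0 (offline) to the CPU's control file."""
    path = cpu_online_path(cpu, sysfs_root)
    if not path.exists():
        # No hotplug support in this kernel, or this CPU cannot be offlined.
        raise FileNotFoundError(f"no hotplug control file at {path}")
    path.write_text("1" if online else "0")


def is_cpu_online(cpu: int, sysfs_root: str = "/sys") -> bool:
    """Read back the current state from the same file."""
    return cpu_online_path(cpu, sysfs_root).read_text().strip() == "1"
```

The sysfs_root parameter only exists so that the helpers can be pointed
at a scratch directory for testing.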

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De




