[Beowulf] Power Cycling Question

Douglas Eadline deadline at eadline.org
Fri Jul 16 19:35:11 UTC 2021

Hi everyone:

Reducing power use has become an important topic. One
question I have always wondered about is why more
clusters do not turn off unused nodes. Slurm has hooks
to turn nodes off when they are not in use and turn
them back on when resources are needed.
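
For reference, the hooks I have in mind are the power saving
settings in slurm.conf. A minimal sketch follows, assuming a
site provides its own power-off and power-on scripts; the
script paths, node names, and all values are placeholders I
made up for illustration, not recommendations:

```
# slurm.conf power saving sketch; paths, node names, and values are placeholders
SuspendTime=1800                                  # seconds a node must be idle before power-down
SuspendProgram=/usr/local/sbin/node_poweroff.sh   # hypothetical site script (e.g. wrapping IPMI)
ResumeProgram=/usr/local/sbin/node_poweron.sh     # hypothetical site script
SuspendTimeout=120                                # seconds allowed for a node to power down
ResumeTimeout=600                                 # seconds allowed for a node to boot and respond
SuspendRate=10                                    # max nodes powered down per minute
ResumeRate=10                                     # max nodes powered up per minute
SuspendExcNodes=login[1-2]                        # hypothetical nodes that are never powered down
```

ResumeTimeout and ResumeRate are the settings that bear on
the boot delay and power surge concerns below.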

My understanding is that power cycling creates
temperature cycling, which then leads to premature node
failure. That makes sense, but has anyone ever studied
or tested this?

The only other reasons I can think of are that the delay
in server boot time makes job starts slow, or that
powering nodes back on causes power surge issues.

I'm curious about other ideas or experiences.




