[Beowulf] [External] Re: Power Cycling Question
pbisbal at pppl.gov
Mon Jul 19 15:57:25 UTC 2021
On 7/17/21 9:20 AM, Gerald Henriksen wrote:
> On Fri, 16 Jul 2021 15:35:11 -0400, you wrote:
>> Reducing power use has become an important topic. One
>> of the questions I always wondered about is
>> why more clusters do not turn off unused nodes. Slurm
>> has hooks to turn nodes off when not in use and
>> turn them on when resources are needed.
> Given the expense of a cluster (purchase, running, space allocation,
> etc) perhaps that is the wrong question.
> If you have enough spare capacity that turning off nodes would create
> a noticeable power saving, then maybe you should be looking at why you
> have that capacity?
> The goal really should be more about keeping all the nodes busy, so is
> there something else that can be done to make sure that the nodes have
> work to be done?
This is a valid argument, but there are times when cluster usage
fluctuates wildly. If you design your cluster to be full when it's
least used, then you have users angry that their jobs are sitting in
the queue for too long, or who just don't have enough cores/RAM to do
certain jobs. As with everything in HPC, "it depends". In this case,
it depends on your user environment.
For example, I work in the physics world now, and there's always a mad
rush to complete or re-do calculations in the weeks prior to the annual
APS (American Physical Society) meeting, which almost all of my
physicists attend. Then, during the week of the conference, usage is
much lighter. Since I work in academia, the summer tends to be a bit
lighter, too, especially August.
During these swings, it might make sense for us to power down some of
the nodes, but during the rest of the year they are needed. Different
environments might not have as much fluctuation. For example, when I
worked in private industry, we didn't have massive percentages of the
staff going to the same conference, or going on vacation at the same time.
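For reference, the Slurm hooks mentioned at the top of the thread are
configured through the power-saving parameters in slurm.conf. The
fragment below is only a minimal sketch: the parameter names are
Slurm's, but the script paths, node names, and timings are hypothetical
placeholders that would need to be adapted to the site.

```
# slurm.conf power-saving fragment (illustrative values only)

# Power a node down after it has been idle this many seconds
SuspendTime=1800

# Site-provided scripts that power nodes off and on; slurmctld passes
# them a hostlist of nodes. These paths are hypothetical.
SuspendProgram=/usr/local/sbin/node_poweroff.sh
ResumeProgram=/usr/local/sbin/node_poweron.sh

# Seconds allowed for a node to finish shutting down, and for a
# resumed node to boot and become available again
SuspendTimeout=120
ResumeTimeout=600

# Nodes that should never be powered down (hypothetical names)
SuspendExcNodes=login[01-02]
```

With a setup along these lines, the idle threshold could be loosened
during the busy weeks and tightened during the light ones, so nodes are
only powered down when the queue really is empty.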