[Beowulf] [EXTERNAL] Power Cycling Question
Prentice Bisbal
pbisbal at pppl.gov
Mon Jul 19 15:46:33 UTC 2021
> On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can take surprisingly long, so there's another tradeoff there.
Not really, especially not with NVMe drives. I have NVMe drives in
both my laptop and my desktop, and it's startling how fast they boot and
resume from suspend.
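A back-of-the-envelope sketch of why both views can hold: the time to spool memory to disk scales roughly with RAM size divided by sustained write bandwidth. The RAM sizes and the 2 GiB/s write rate below are hypothetical illustrative assumptions, not measurements, and the estimate ignores compression, skipping of free pages, and filesystem overhead.

```python
# Rough estimate of how long it takes to write a node's RAM image to local storage.
# All numbers used below are hypothetical, illustrative assumptions, not measurements.

def spool_seconds(ram_gib: float, write_gib_per_s: float) -> float:
    """Rough time to dump ram_gib of memory at a sustained write rate (GiB/s)."""
    if write_gib_per_s <= 0:
        raise ValueError("write bandwidth must be positive")
    return ram_gib / write_gib_per_s

# Hypothetical cases: a laptop versus a large-memory compute node, same drive speed.
for label, ram, bw in [("laptop", 16, 2.0), ("compute node", 512, 2.0)]:
    print(f"{label}: {spool_seconds(ram, bw):.0f} s")
```

With the same assumed drive speed, the 16 GiB laptop takes about 8 seconds while the 512 GiB node takes about 256 seconds, so fast resume on a desktop does not necessarily carry over to large-memory servers.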
I think the bigger issue with this approach is whether enterprise servers
support it. I believe it requires some level of hardware
support, which I doubt servers designed for always-on use
have. Someone please jump in and correct me if I'm wrong here.
Prentice
On 7/16/21 8:38 PM, Lux, Jim (US 7140) via Beowulf wrote:
> An interesting question.
> The power-cycling reliability issue is probably not a big deal. Temperatures already change a lot between light load and heavy load, and I'd be surprised if a "server class" PC couldn't take one power cycle per day when even the grungiest consumer unit can. It's not as if you're cycling between -40C and 70C every hour, as in an automotive application.
>
> Managing the chillers, though - That might be a bigger problem.
>
> And as Jörg points out, there's a fair amount of sophistication needed in setting your turn on and turn off thresholds.
>
> On the "spool RAM to disk" idea - That's sort of like checkpointing, and it can take surprisingly long, so there's another tradeoff there.
>
>
> On 7/16/21, 12:35 PM, "Beowulf on behalf of Douglas Eadline" <beowulf-bounces at beowulf.org on behalf of deadline at eadline.org> wrote:
>
>
> Hi everyone:
>
> Reducing power use has become an important topic. One
> of the questions I have always wondered about is
> why more clusters do not turn off unused nodes. Slurm
> has hooks to turn nodes off when not in use and
> turn them on when resources are needed.
>
> My understanding is that power cycling creates
> temperature cycling, which then leads to premature node
> failure. That makes sense, but has anyone ever
> studied or tested this?
>
> The only other reasons I can think of are slow job
> starts caused by server boot time, and power surge
> issues.
>
> I'm curious about other ideas or experiences.
>
> Thanks
>
> --
> Doug
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf