[Beowulf] [upgrade strategy] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

John Hearns hearnsj at googlemail.com
Fri Jan 5 05:40:38 PST 2018


This seems very relevant

https://security.googleblog.com/2018/01/more-details-about-mitigations-for-cpu_4.html?m=1

On 4 Jan 2018 11:49 pm, "Jörg Saßmannshausen" <sassy-work at sassy.formativ.net>
wrote:

> Dear all,
>
> that was the question I was pondering all day today, and I tried to read
> and digest any information I could get.
>
> In the end, I contacted my friend at CERT and proposed the following:
> - upgrade the headnode/login node (name it how you like), as that one is
>   exposed to the outside world via ssh
> - do not upgrade the compute nodes for now, until we have more information
>   about the impact of the patch(es).
>
> It would not be the first time a patch opened up another can of worms.
> What I am hoping for is a middle way between security and performance.
> IF the patch(es) are safe to apply, I can still roll them out to the
> compute nodes without losing too much uptime. IF there is a performance
> problem, it only affects the headnode, which I can ignore on that cluster.
>
> As always, your mileage will vary, especially as different clusters have
> different purposes.
>
> What I would like to know is: how about compensation? For me that is the
> same as the VW scandal last year. We, the users, have been deceived.
> Especially if the 30% performance losses which have been mooted are not
> special corner cases but are seen often in HPC. Some of the chemistry
> codes I am supporting rely on disc I/O, others on InfiniBand, and others
> again run entirely in memory.
>
> These are my 2 cents. If somebody has a better idea, please let me know.
>
> All the best from a rainy and windy London
>
> Jörg
>
>
> On Wednesday, 3 January 2018, 13:56:50 GMT, Remy Dernat wrote:
> > Hi,
> > I renamed that thread because IMHO there is another issue related to
> > that threat. Should we upgrade our systems and lose a significant amount
> > of XFlops... ? What should be considered:
> >   - the risk
> >   - your user population (size / type / average "knowledge" of hacking
> >     techs...)
> >   - the isolation level from the outside (internet)
> >
> > So here is my question: if this is not confidential, what will you do?
> > I would not patch our little local cluster, contrary to all of our other
> > servers. Indeed, there is another "little" risk. If our strategy is to
> > always upgrade/patch, in this particular case you can lose many users
> > who will complain about perfs... So another question: what is your
> > global strategy about upgrades on your clusters? Do you upgrade as often
> > as you can? One upgrade every X months (due to the downtime issue)... ?
> >
> > Thanks,
> > Best regards,
> > Rémy.
> >
> > -------- Original message --------
> > From: John Hearns via Beowulf <beowulf at beowulf.org>
> > Date: 03/01/2018 09:48 (GMT+01:00)
> > To: Beowulf Mailing List <beowulf at beowulf.org>
> > Subject: Re: [Beowulf] Intel CPU design bug & security flaw - kernel fix
> > imposes performance penalty
> >
> > Thanks Chris.
> > In the past there have been Intel CPU 'bugs' trumpeted, but generally
> > these are fixed with a microcode update. This looks different, as it is
> > a fundamental part of the chip's architecture. However the Register
> > article says: "It allows normal user programs – to discern to some extent
> > the layout or contents of protected kernel memory areas". I guess the
> > phrase "to some extent" is the vital one here. Are there any security
> > exploits which use this information? I guess it is inevitable that one
> > will be engineered now that this is known about. The question I am really
> > asking is: should we worry about this for real-world systems? And I guess
> > the answer is that if the kernel developers are worried enough, then yes,
> > we should be too.
> > Comments please.
> >
> >
> >
> > On 3 January 2018 at 06:56, Greg Lindahl <lindahl at pbm.com> wrote:
> >
> > On Wed, Jan 03, 2018 at 02:46:07PM +1100, Christopher Samuel wrote:
> > > There appears to be no microcode fix possible and the kernel fix will
> > > incur a significant performance penalty, people are talking about in
> > > the range of 5%-30% depending on the generation of the CPU. :-(
> >
> > The performance hit (at least for the current patches) is related to
> > system calls, which HPC programs using networking gear like OmniPath
> > or Infiniband don't do much of.
> >
> > -- greg
> >
> > _______________________________________________
> >
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
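Greg's point in the quoted thread, that the penalty is tied to system calls, can be checked roughly on any node before and after patching. Below is a minimal Python sketch, not a proper benchmark: it times a cheap system call (os.getppid) against a pure user-space no-op, then estimates what fraction of wall-clock time a given extra per-syscall cost would add. The helper names time_calls and estimated_slowdown are made up for this illustration, and the syscall rate and extra cost at the bottom are placeholder inputs, not measurements from any cluster.

```python
import os
import time


def time_calls(fn, n):
    """Return the average wall-clock seconds per call of fn over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n


def noop():
    """Pure user-space baseline: never enters the kernel."""
    return None


def estimated_slowdown(syscalls_per_sec, extra_cost_per_syscall):
    """Fraction of each second lost to the added per-syscall cost.

    Both arguments are assumptions supplied by the caller: a syscall rate
    (e.g. counted with a tracing tool) and the extra seconds per syscall
    seen when comparing a patched kernel with an unpatched one.
    """
    return syscalls_per_sec * extra_cost_per_syscall


if __name__ == "__main__":
    n = 200_000
    baseline = time_calls(noop, n)
    syscall = time_calls(os.getppid, n)  # getppid enters the kernel each call
    print(f"no-op call:   {baseline * 1e9:8.1f} ns")
    print(f"getppid call: {syscall * 1e9:8.1f} ns")

    # Hypothetical inputs: 100,000 syscalls/s and 1 microsecond extra each.
    print(f"estimated slowdown: {estimated_slowdown(100_000, 1e-6):.1%}")
```

Running the script on a patched and an unpatched kernel and comparing the getppid figures gives a rough per-syscall cost to plug into estimated_slowdown. A code that makes few system calls per second, as Greg suggests for OmniPath or InfiniBand jobs, would see a correspondingly small estimate; I/O-heavy codes may not.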