[Beowulf] [upgrade strategy] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Thu Jan 4 15:48:20 PST 2018


Dear all,

that was the question I was pondering all day today, and I tried to read 
and digest whatever information I could get. 

In the end, I contacted my friend at CERT and proposed the following:
- upgrade the headnode/login node (name it how you like), as that one is exposed 
to the outside world via ssh (a rough way to check whether the patch is actually 
active on it is sketched after this list)
- do not upgrade the compute nodes for now, until we have more information about 
the impact of the patch(es). 
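For what it is worth, here is a small sketch (Python) of how one could check 
from a node whether the page table isolation fix looks active. The sysfs path, 
the kernel command line flags and the dmesg message are my assumptions; the 
sysfs file in particular only exists on kernels that already carry the fix and 
may well be absent on what we are running today, so treat this as a starting 
point rather than a definitive test:

#!/usr/bin/env python
# Rough sketch, not a definitive check: report whether the KPTI /
# page-table-isolation fix appears to be active on this node.
# Paths, flags and messages are assumptions; distro kernels may differ.

def read(path):
    try:
        with open(path) as fh:
            return fh.read().strip()
    except IOError:
        return None

# Kernels that carry the fix are expected to expose a summary file here.
status = read('/sys/devices/system/cpu/vulnerabilities/meltdown')
if status is not None:
    print('meltdown mitigation: %s' % status)
else:
    # Fall back to the boot command line: 'nopti' / 'pti=off' disable KPTI.
    cmdline = read('/proc/cmdline') or ''
    if 'nopti' in cmdline or 'pti=off' in cmdline:
        print('KPTI explicitly disabled on the kernel command line')
    else:
        print('no vulnerabilities file found; check dmesg for '
              '"Kernel/User page tables isolation"')

Running something like this on the headnode before and after the kernel 
upgrade should make it obvious whether the patch is in place there while 
the compute nodes are still on the old kernel.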

It would not be the first time a patch has opened up another can of worms. What 
I am hoping for is to find a middle way between security and performance. IF 
the patch(es) are safe to apply, I can still roll them out to the compute 
nodes without losing too much uptime. IF there is a problem regarding 
performance, it only affects the headnode, which I can ignore on that cluster.

As always, your mileage will vary, especially as different clusters have 
different purposes.

What I would like to know is: how about compensation? For me this is the same 
as the VW scandal last year. We, the users, have been deceived. Especially if 
the 30% performance loss which has been mooted is not a special corner case 
but is seen often in HPC. Some of the chemistry codes I am supporting rely 
on disc I/O, others on InfiniBand, and others again run entirely in memory, 
so the impact will depend on how much time each code spends in system calls. 
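As a very crude illustration of why the I/O-heavy codes are the ones to watch, 
here is a small sketch (Python; the loop count and the choice of read() on 
/dev/zero are mine and purely illustrative, not a statement about any real 
chemistry code). The page table isolation patch mainly adds cost on each 
kernel entry and exit, so only the first number should move much once the 
patch is active:

#!/usr/bin/env python
# Crude sketch: compare a syscall-bound loop with a purely in-memory loop.
# KPTI adds overhead on every kernel entry/exit, so the syscall-bound loop
# is the one expected to slow down after patching. Loop count is arbitrary.
import os
import time

N = 200000

fd = os.open('/dev/zero', os.O_RDONLY)
t0 = time.time()
for _ in range(N):
    os.read(fd, 1)        # one read() system call per iteration
t_syscall = time.time() - t0
os.close(fd)

t0 = time.time()
x = 0
for i in range(N):
    x += i * i            # pure user-space arithmetic, no kernel involvement
t_user = time.time() - t0

print('syscall-bound loop: %.3f s' % t_syscall)
print('user-space loop   : %.3f s' % t_user)

Codes that behave like the second loop (running in memory, or talking to the 
InfiniBand hardware directly from user space) should see far less of the 
mooted 30% than codes that hammer the filesystem.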

These are my 2 cents. If somebody has a better idea, please let me know.

All the best from a rainy and windy London

Jörg


On Wednesday, 3 January 2018, 13:56:50 GMT, Remy Dernat wrote:
> Hi,
> I renamed that thread because IMHO there is another issue related to that
> threat. Should we upgrade our systems and lose a significant amount of
> XFlops...? What should be considered:
>   - the risk
>   - your user population (size / type / average "knowledge" of hacking techs...)
>   - the isolation level from the outside (internet)
> 
> So here is my question: if this is not confidential, what will you do?
> I would not patch our little local cluster, contrary to all of our other
> servers. Indeed, there is another "little" risk. If our strategy is to
> always upgrade/patch, in this particular case you can lose many users who
> will complain about performance... So another question: what is your global
> strategy for upgrades on your clusters? Do you upgrade them as often as
> you can? One upgrade every X months (due to the downtime issue)...?
> 
> Thanks,
> Best regards,
> Rémy.
> 
> -------- Original message --------
> From: John Hearns via Beowulf <beowulf at beowulf.org>
> Date: 03/01/2018 09:48 (GMT+01:00)
> To: Beowulf Mailing List <beowulf at beowulf.org>
> Subject: Re: [Beowulf] Intel CPU design bug & security flaw - kernel fix
> imposes performance penalty
> 
> Thanks Chris.
> In the past there have been Intel CPU 'bugs' trumpeted, but generally these
> are fixed with a microcode update. This looks different, as it is a
> fundamental part of the chip's architecture. However the Register article
> says: "It allows normal user programs – to discern to some extent the
> layout or contents of protected kernel memory areas". I guess the phrase
> "to some extent" is the vital one here. Are there any security exploits
> which use this information? I guess it is inevitable that one will be
> engineered now that this is known about. The question I am really asking
> is: should we worry about this for real world systems? And I guess the
> answer is that if the kernel developers are worried enough then yes we
> should be too.
> Comments please.
> 
> 
> 
> On 3 January 2018 at 06:56, Greg Lindahl <lindahl at pbm.com> wrote:
> 
> On Wed, Jan 03, 2018 at 02:46:07PM +1100, Christopher Samuel wrote:
> > There appears to be no microcode fix possible and the kernel fix will
> > incur a significant performance penalty, people are talking about in the
> > range of 5%-30% depending on the generation of the CPU. :-(
> 
> The performance hit (at least for the current patches) is related to
> system calls, which HPC programs using networking gear like OmniPath
> or Infiniband don't do much of.
> 
> -- greg
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf


