[Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

Lux, Jim (337K) james.p.lux at jpl.nasa.gov
Mon Aug 20 10:27:59 PDT 2018


All complex systems have flaws. It's more a matter of deciding which flaws are acceptable and which aren't, which is driven by economic factors for the most part - the cost of fixing the flaw (and potentially introducing a new one) vs the cost of damage from the flaw.

I'd find it hard to believe that Intel's CPU designers sat around implementing deliberate flaws (the Bosch engine controller for VW model).

I'd not find it hard to believe that someone, somewhere raised a speculation about a potential flaw, among many others.  That one just didn't happen to get resources applied to it; others did.  Picking which ones to attack and spend resources on is a difficult question, and often gets answered based on totally irrelevant factors.

That's not negligence - that's just "it is impossible to discover and fix all possible bugs".

This is not unusual even in MUCH simpler chips. I have some 8 bit wide level shifters (from 2.5 to 3.3V logic) that have an obscure behavior, related to the rate at which the two power supplies come up, that causes them not to pass data (preventing the system in which they are installed from booting). It happens about 1 out of 500 times. The mfr's response is "yeah, we think we can duplicate that, but we've moved on to a newer version of that chip, why don't you replace the chips with the new ones".  This isn't necessarily an issue of the chip not performing to the datasheet specs (essentially, the data sheet is silent on this).
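To put the 1-in-500 figure in perspective, here is a rough sketch of the arithmetic, assuming each power-up is an independent trial (an assumption; real power-supply ramp behavior may well be correlated from boot to boot). The function name and the boot counts are made up for illustration.

```python
# Illustrative arithmetic only: the 1-in-500 rate comes from the text above;
# the boot counts below are arbitrary examples, and independence between
# power-ups is an assumption.

def prob_at_least_one_failure(per_boot_failure_rate: float, boots: int) -> float:
    """Probability that at least one of `boots` independent power-ups fails."""
    return 1.0 - (1.0 - per_boot_failure_rate) ** boots


if __name__ == "__main__":
    rate = 1 / 500
    for boots in (1, 100, 500, 1000):
        p = prob_at_least_one_failure(rate, boots)
        print(f"{boots:5d} boots: P(at least one failed boot) = {p:.3f}")
```

Even a rare per-boot failure becomes likely over enough power cycles, or across enough units, which is why a flaw like this matters even though any single test would probably miss it.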

The Errata and Notes lists for complex parts (like CPUs and large FPGAs) run to hundreds of pages, and continuously grow as people find more odd behaviors.


Therefore - you should assume your system has unknown flaws and design your software and operational procedures accordingly.
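As one small example of such an operational procedure, a site could periodically record what the running kernel says about known CPU vulnerabilities. The sketch below assumes a Linux kernel recent enough to expose status files under /sys/devices/system/cpu/vulnerabilities (for example a file named l1tf); the function name is hypothetical, and this only reports what the kernel knows about, not unknown flaws.

```python
from pathlib import Path
from typing import Dict

# Directory where recent Linux kernels expose one status file per known CPU
# vulnerability (e.g. "l1tf"). Older kernels may not provide it at all.
SYSFS_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerability_status(vuln_dir: Path = SYSFS_VULN_DIR) -> Dict[str, str]:
    """Return {vulnerability name: kernel-reported status string}.

    Returns an empty dict if the directory does not exist.
    """
    if not vuln_dir.is_dir():
        return {}
    status = {}
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            try:
                status[entry.name] = entry.read_text().strip()
            except OSError:
                # Report the file rather than silently skipping it.
                status[entry.name] = "unreadable"
    return status


if __name__ == "__main__":
    report = read_vulnerability_status()
    if not report:
        print("This kernel does not expose a vulnerabilities directory.")
    for name, state in report.items():
        print(f"{name}: {state}")
```

Run on each node (for instance from a cron job or a node health check), this gives a cheap way to notice nodes whose reported state differs after a kernel update.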


James Lux
Project Manager, SunRISE - Sun Radio Interferometer Space Experiment
Task Manager, DARPA High Frequency Research (DHFR) Space Testbed
Jet Propulsion Laboratory  (Mail Stop 161-213)
4800 Oak Grove Drive
Pasadena CA 91109
(818)354-2075 (office)
(818)395-2714 (cell)
-----Original Message-----
From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jörg Saßmannshausen
Sent: Sunday, August 19, 2018 2:00 PM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] RHEL7 kernel update for L1TF vulnerability breaks RDMA

Dear all,

While I accept that no system is 100% secure and bug-free, I am beginning to wonder whether the current problems we are having are actually design flaws and whether, and that is the more important bit, Intel and other vendors knew about them. I am thinking of the famous 'diesel-engine' scandal and, continuing this line of thought, of dragging the vendors into the limelight and getting them to pay for this.
I mean, we have to sort out the mess the company made in the first place, and we have to weigh applying a patch which might decrease the performance of our systems (I am doing HPC, hence my InfiniBand question) against security.
Where will it stop?

Given that the current and previous 'bugs' are clearly design flaws IMHO, what are the chances of a lawsuit? Any compensation here should go to Open Source projects, in my opinion, which are making software more secure.

Any comments here?

All the best

Jörg

On Sunday, 19 August 2018, 06:11:16 BST, John Hearns via Beowulf wrote:
> Rather more seriously, this is a topic which is well worth discussing.
> What are best practices on patching HPC systems?
> Perhaps we need a separate thread here.
> 
> I will throw in one thought, which I honestly do not want to see happening.
> I recently took a trip to Bletchley Park in the UK. On display there 
> was an IBM punch card machine and sample punch cards. Back in the day
> one prepared a 'job deck' which was collected by an operator in a 
> metal hopper then wheeled off to the mainframe. You did not ever touch 
> the mainframe. So effectively an air gapped system. A system like that 
> would in these days kill productivity.
> However, should there be 'virus checking' of executables before they
> are run on compute nodes?
> One of the advantages lauded for Linux systems is of course that 
> anti-virus programs are not needed.
> 
> Also I should ask - in the jargon of anti-virus is there a 'signature' 
> for any of these exploit codes? One would guess that bad actors copy 
> the example codes already published and use these almost in a cut and 
> paste fashion. So the signature would be tight loops repeatedly 
> reading or writing to the same memory locations. Can that be 
> distinguished from innocent code?
> 
> On Sun, 19 Aug 2018 at 05:59, John Hearns <hearnsj at googlemail.com> wrote:
> > *To patch, or not to patch, that is the question:* Whether 'tis 
> > nobler in the mind to suffer The loops and branches of speculative 
> > execution, Or to take arms against a sea of exploits And by opposing 
> > end them. To die—to sleep, No more; and by a sleep to say we end The 
> > heart-ache and the thousand natural shocks That HPC is heir to: 'tis 
> > a consummation Devoutly to be wish'd. To die, to sleep
> > 
> > On Sun, 19 Aug 2018 at 02:31, Chris Samuel <chris at csamuel.org> wrote:
> >> On Sunday, 19 August 2018 5:19:07 AM AEST Jeff Johnson wrote:
> >> > With the spate of security flaws over the past year and the
> >> > impacts their fixes have on performance and functionality it
> >> > might be worthwhile to just run airgapped.
> >> 
> >> For me none of the HPC systems I've been involved with here in 
> >> Australia would have had that option.  Virtually all have external 
> >> users and/or reliance on external data for some of the work they 
> >> are used for (and the sysadmins don't usually have control over the 
> >> projects & people who get to use them).
> >> 
> >> All the best,
> >> Chris
> >> --
> >> 
> >>  Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
> >> 
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin 
> >> Computing To change your subscription (digest mode or unsubscribe) 
> >> visit http://www.beowulf.org/mailman/listinfo/beowulf
