[Beowulf] SIMD exception kernel panic on Skylake-EP triggered by OpenFOAM?
Chris Samuel
chris at csamuel.org
Wed Sep 12 00:40:13 PDT 2018
On Monday, 10 September 2018 2:23:18 PM AEST Jonathan Engwall wrote:
> If it is helpful there are a few similar bugs, generally considered
> unreproducible. One thread calls it bogus xcomp_bv...the kernel clobbers
> itself writing zeroes when that is not the state. And spectre came up. One
> suggestion is to disable IBRS; according to other sources IBRS is dangerous
> to disable and should protect against Spectre. Maybe the OpenFOAM is to
> blame.
Yeah, I suspect what we're seeing is different to that; it looks like
something manages to generate a SIMD exception whilst the kernel is dealing
with an APIC timer interrupt. A colleague has backported the patch below, which
I found, to our CentOS kernel in case it helps.
https://lore.kernel.org/patchwork/patch/953364/
For now we've constrained this user's workload onto a handful of nodes, as they
are trying to get some project work done.
All the best!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC