[Beowulf] [upgrade strategy] Intel CPU design bug & security flaw - kernel fix imposes performance penalty

Jörg Saßmannshausen sassy-work at sassy.formativ.net
Sun Jan 7 14:44:33 PST 2018


Dear all,

Chris is right here. It depends on what is running on your HPC cluster. You 
might not see any performance degradation at all, or you might see one of 30% 
(just to stick with that number).
Also, if you have a cluster which is used solely by one research group, the 
chances that its users are hacking each other are slim, I would argue. That 
still leaves the argument about a compromised user account.
If you are running a large, multi-user, multi-institutional cluster you might 
want to put security over performance. This might be especially true if you 
are handling confidential data such as patient data.
So, you will need to set up your own risk matrix and hope you made the right 
decision. 
For me: we have decided to upgrade the headnode but for now leave the compute 
nodes untouched. We can then decide at a later stage whether or not we want to 
upgrade the compute nodes, maybe after we have done some testing of typical 
programs. It is not an ideal scenario, but we are living in a real world and 
not an ideal one, I guess.
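
As a rough sketch of what such testing could look like: the cost of the kernel fix is paid on transitions into the kernel, so a syscall-heavy job and a compute-bound job should behave very differently. All function names and loop sizes below are hypothetical and only for illustration; real testing should use your own typical programs, run once on an unpatched node and once on a patched one.

```python
import os
import statistics
import time


def time_once(workload, repeats=5):
    """Return the median wall-clock time in seconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def syscall_heavy(n=20000):
    # os.getppid() enters the kernel on every call, which is the kind of
    # transition the kernel fix is expected to make more expensive.
    for _ in range(n):
        os.getppid()


def compute_heavy(n=20000):
    # Pure user-space arithmetic, which should hardly notice the fix.
    total = 0
    for i in range(n):
        total += i * i
    return total


def slowdown_percent(before, after):
    """Relative slowdown of `after` compared to `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Run this on both kinds of node and compare the numbers by hand,
    # e.g. slowdown_percent(unpatched_seconds, patched_seconds).
    print("syscall-heavy:", time_once(syscall_heavy))
    print("compute-heavy:", time_once(compute_heavy))
```

Feeding the two medians for the same workload into slowdown_percent() gives a number you can put into your own risk matrix instead of the generic 30%.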

All the best from London

Jörg


On Monday, 8 January 2018, 09:24:12 GMT, Christopher Samuel wrote:
> On 08/01/18 09:18, Richard Walsh wrote:
> > Mmm ... maybe I am missing something, but for an HPC cluster-specific
> > solution ... how about skipping the fixes, and simply requiring all
> > compute node jobs to run in exclusive mode and then zero-ing out user
> > memory between jobs ... ??
> 
> If you are running other daemons with important content (say the munge
> service that Slurm uses for authentication) then you risk the user being
> able to steal the secret key from the daemon.
> 
> But it all depends on your risk analysis of course.
> 
> All the best!
> Chris


