[Beowulf] confidential data on public HPC cluster

Mark Hahn hahn at mcmaster.ca
Mon Mar 1 09:35:07 PST 2010

> requirements for keeping data confidential.    We expect these to be the

it's critically important to pin down exactly what they mean by that.
for instance, anything involving human subjects, not limited to clinical
data, needs to be blinded.  that's a standard requirement from any 
research-ethics board.

it's also worth going over the basics of permissions, since researchers 
often don't understand what rwxr-x--- means ;)
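
as a rough illustration of those basics, here is a minimal sketch in Python
(the language is my choice, and the helper name describe is made up for this
example) that spells out what a mode such as 0750, i.e. rwxr-x---, grants to
the owner, the group, and everyone else.  it only decodes the nine permission
bits; it assumes nothing about ACLs or other mechanisms a site might layer on
top.

```python
import stat

# Permission bits for each class of user, in the order r, w, x.
CLASSES = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}

def describe(mode):
    """Return (symbolic string, {class: [granted rights]}) for a mode.

    'describe' is a hypothetical helper written for this example.
    """
    symbolic = ""
    summary = {}
    for who, bits in CLASSES.items():
        # Build the three-character group, e.g. "r-x".
        symbolic += "".join(
            ch if mode & bit else "-" for ch, bit in zip("rwx", bits)
        )
        # Record the rights in words for the plain-English summary.
        summary[who] = [
            name
            for name, bit in zip(("read", "write", "execute"), bits)
            if mode & bit
        ]
    return symbolic, summary

symbolic, summary = describe(0o750)
print(symbolic)
for who, granted in summary.items():
    print(who + ":", ", ".join(granted) if granted else "nothing")
```

on a directory, "execute" means permission to traverse it, so 0750 lets group
members list and enter the directory but gives other users no access at all.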

> other requirements.   If even having small fractions of the data unencrypted 
> in memory on a node that someone else could login to (even if only as root) 
> is not allowed, then I imagine it's going to be hard for them to use any 
> machine they don't physically control.   But presumably many other users will 
> have less strict conditions on what is and isn't allowed.

researchers also don't think like a security person: there's no way
someone can expect confidentiality from root unless the machine is 
completely under their control (bare machine + install media, etc).
we have a facility on campus that has data from StatsCan, and does 
indeed go to these sorts of lengths.  but that's completely incompatible
with any sort of shared facility.

it's easy to imagine "security theater" which might make people feel better
though.  for instance, one might offer them VM hosting, instead of the 
traditional just-another-unix-user approach.  or even a deal to wipe the 
machine and install from scratch at the beginning of the job - reboot when
you're done!  but these are simply making it harder to compromise, and 
IMO would just lead to a tar pit of obfuscation, not real security.
(for instance, compromising a running VM is probably not hard, but tweaking
the image before it runs would be easier.  does the occupant then try to 
validate the integrity of the VM?  how hard is it to intercept that check?
can they then detect the interception?  this applies to installing a node
from media, as well.)
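
to make the "validate the integrity" step concrete, here is a minimal sketch,
assuming the occupant has recorded a SHA-256 digest of the image somewhere
they trust.  the function names are made up for this example, and Python is
simply my choice of language.  note that this does nothing about the
objection above: if the check runs on a host someone else administers, the
check itself can be intercepted.

```python
import hashlib
import hmac
import sys

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def image_matches(path, expected_hex):
    """Compare the file's digest with a previously recorded one.

    Both names here are hypothetical helpers for this example.
    """
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python check_image.py IMAGE EXPECTED_SHA256
    ok = image_matches(sys.argv[1], sys.argv[2])
    print("image matches" if ok else "image DIFFERS from recorded digest")
    sys.exit(0 if ok else 1)
```

the digest only helps if it is computed and stored on a machine the group
controls, and even then it says nothing about what happens to the image or
the running VM after the check passes.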

ultimately, someone somewhere needs admin access, so it's not really a
question of whether disclosure is possible, but rather who you trust.
as a sysadmin, I wouldn't be upset about being asked to go through 
a background check, and my employer could obtain bonding for me.

an audit-trail is "post-coital", but may still make sensitive clients more
comfortable (though it's likely to be security theater as well...)
consider, for instance, if a group's storage is on a separate server,
whose access is limited to specific admins, and whose mountd logs 
are available for the group's perusal.  even setting up jobs to use sshfs
back to the group's own server may make them feel better because they'll
be able to look at the logs (again, not impregnable, just harder to get.)

regards, mark hahn
PS: my apology to anyone allergic to innuendo!
