[Beowulf] confidential data on public HPC cluster

Guy Coates gmpc at sanger.ac.uk
Tue Mar 2 02:25:56 PST 2010

Ashley Pittman wrote:
> On 1 Mar 2010, at 19:08, Jonathan Dursi wrote:
>> These are all good things to keep in mind.
>> There must be people out there with users who do biomed work with its attendant confidentiality issues,
>> or users who work on commercial confidential data sets -- engineering or otherwise.

Hi all,

The usual answer you will get from lawyers and compliance officers is

"You should take reasonable care to ensure that data is kept appropriately."

However, most (all?) biomedical projects should have some sort of
data-access agreement (DAA). That document states what patients have
given consent for, who should have access to the data and under what
conditions. That should give you a good starting point for working out
what your security policy should be. (If you are going to be doing
systems stuff for the group, you should also have signed the agreement.)

Generally speaking, the greater the chance of being able to trace data back
to a specific individual, the more paranoid you have to be about
the data. It is up to the primary investigators, lawyers, compliance
officers and sys-admins to turn that into a security policy.

At Sanger, we run through the whole range of security policies. We have
projects that deal routinely with full medical histories. They run on a
set of machines physically separated from the rest of our datacentre
infrastructure, with data held in encrypted databases with two-factor
logins. Data is not allowed to be removed from that setting.

We have other projects that are using anonymised datasets, and that data
can be held on our main cluster with the appropriate Unix access controls.

In the future we will probably have projects whose security requirements
would be somewhere in the middle of those two extremes. The key to
dealing with those projects is the words "reasonable care".

Would we worry about data being kept un-encrypted in memory? Probably
not. Would we put in place an automated audit process to ensure data
kept on filesystems has appropriate ACLs set? Probably yes.
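
As a rough illustration of what such an automated audit could look like,
here is a minimal Python sketch. It is not what we actually run; the
function name audit_tree, the allowed_gid parameter and the choice to
flag any "other" permission bit are all assumptions made for the
example. It only inspects the classic Unix mode bits and group owner;
extended POSIX ACLs are not visible through the standard library and
would need something like getfacl.

```python
import os
import stat


def _check(path, allowed_gid, findings):
    """Append (path, reason) tuples for anything that looks too permissive."""
    try:
        st = os.lstat(path)  # lstat: do not follow symlinks
    except OSError as exc:
        findings.append((path, f"cannot stat: {exc}"))
        return
    if stat.S_ISLNK(st.st_mode):
        return  # symlink mode bits are not meaningful; skip them
    # Assumption for this sketch: any read/write/execute bit for "other" is a finding.
    if st.st_mode & stat.S_IRWXO:
        findings.append((path, "accessible by others"))
    # Optional check (hypothetical policy): data must belong to one project group.
    if allowed_gid is not None and st.st_gid != allowed_gid:
        findings.append((path, f"unexpected group {st.st_gid}"))


def audit_tree(root, allowed_gid=None):
    """Walk root and return a list of (path, reason) findings."""
    findings = []
    _check(root, allowed_gid, findings)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            _check(os.path.join(dirpath, name), allowed_gid, findings)
    return findings
```

Run from cron against a project directory, the returned list could be
mailed to the sys-admins whenever it is non-empty. Whether a finding is
a real problem is still a policy decision, which is where the DAA comes
back in.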

And remember, if someone goes out of their way to get access to data
that they should not, then that is a contravention of the AUP and/or
local computer crime laws.

(You do make your users sign an AUP, right...?)

There are some example DAAs below.

Dr. Guy Coates,  Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802

 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
