[Beowulf] Dealing with masquerade attacks (Was: CLuster - Mpich - tstmachines - Heeelp !!!!!!!!)

Sat Jul 29 08:18:20 PDT 2006

> Events the last two years [1] have shown that if you have a cluster
> that is somehow reachable from the Internet there is a non-negligible
> risk that an intruder at some point will log in on it using stolen
> credentials. I know for a fact that a large fraction of Swedish

but that's obvious.  my point is that erecting barriers within the cluster
either doesn't add security or else falls on the far side of the
security/usability continuum.

> I don't see the world doing away with password or private-key-on-disk
> authentication any time soon, so this problem is here to stay, I'm
> afraid. We have to learn to live with it.

again, that's not the point.  once an attacker has a user's key (password,
private key, whatever), what barrier is there to running on other nodes 
in the cluster?  the cluster exists precisely for the purpose of running 
a user's jobs, so I claim rsh is just fine as part of the job-starting
mechanism.  by itself, it does not let a rogue user escalate to ownership
of other users, machines, or root.

detection and hardening measures are orthogonal to this, since a cluster
_exists_ to run user processes, passwordlessly, on all nodes.

> machines in the cluster? It's very hard to totally stop the intrusion
> from spreading across the cluster, but you *can* make life harder for
> the intruder, which might just buy you enough time to detect the
> intrusion in its early stages.

that's not the point.  sure, I love RO filesystems exported from servers
which do not trust the user domain of the cluster.  I love syslogs from
cluster members directed to similar, more secure, less-trusting machines.
the batch/queueing/scheduling system I wrote uses ssh to do all its spawning.

but I just don't see why banning rsh is of any value unless you really 
go so far as to say that root on each machine is untrusted.  that's really
quite difficult if you think about it - for instance, it means that you
have to IPMI reset the machine and hard-boot or reinstall after every job,
since that's the only way to trust root enough to mount a filesystem.

the VM approach I mentioned could deal with this, since the user never 
actually runs on bare metal, but only within his own VM.  so you don't 
actually have to wipe the hardware for each job, just the virtual hardware.

> So, for example, do you really need unlimited passwordless access
> across the entire cluster, or can you limit it in useful ways? Perhaps

but can you dispatch a job without a password?  if so, you have given
the equivalent of passwordless access to users.

> you can hook PAM up to PBS, so users only can access nodes they are
> scheduled on?

OK, here's my framework:

 	1. there may be security-like policies which exist to avoid bad user
 	behavior, protect privacy, etc.  whether a user can get a shell on a
 	compute node is a common example (and on any node, or just those
 	running his job?)

 	2. there are secure-i-ness </colbert> features which can make the
 	system somewhat harder (or noisier) to exploit, even though they
 	don't actually create a new level in the formal-security sense.
 	ssh vs rsh within the cluster might qualify, but I don't see why.

 	3. there is some security design, in the absolute sense, which should
 	be known and examined.  for instance, letting users login to an NFS
 	server is risky.  having an NFS server which does not permit logins
 	from the cluster is a fundamentally more secure design.
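
to make category 1 concrete, here is a minimal sketch of the "shell only on
nodes running his job" policy, which is roughly what hooking PAM up to PBS
would have to decide.  everything in it is an assumption for illustration:
the job-list layout and the names nodes_for_user and may_login are
hypothetical, and are not part of PBS, PAM, or any real scheduler.

```python
# hypothetical sketch: none of these names come from PBS or PAM.

def nodes_for_user(user, running_jobs):
    """Return the set of nodes on which `user` currently has a running job."""
    allowed = set()
    for job in running_jobs:
        if job["owner"] == user:
            allowed.update(job["nodes"])
    return allowed


def may_login(user, node, running_jobs, admins=frozenset({"root"})):
    """Category-1 policy: a shell only on nodes running the user's own job."""
    if user in admins:
        # assumption: admins are exempt from the per-job restriction
        return True
    return node in nodes_for_user(user, running_jobs)


# assumed shape of what the scheduler would report for running jobs
running_jobs = [
    {"owner": "alice", "nodes": ["n01", "n02"]},
    {"owner": "bob", "nodes": ["n03"]},
]

print(may_login("alice", "n02", running_jobs))  # True: alice has a job on n02
print(may_login("alice", "n03", running_jobs))  # False: only bob runs on n03
```

note that this is policy, not a new security level: anyone holding alice's
stolen key can still submit a job and land on whichever nodes the scheduler
hands out.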