[Beowulf] Dealing with masquerade attacks (Was: CLuster - Mpich - tstmachines - Heeelp !!!!!!!!)

Sat Jul 29 04:37:02 PDT 2006

Mark Hahn <hahn at physics.mcmaster.ca> writes:

> this is wandering pretty far afield.  a cluster, to my way of thinking,
> is intended to act as a single resource, and as such is a single trust
> domain.

I used to think that, as well. However, expensively bought experience
has taught me otherwise.

Events of the last two years [1] have shown that if you have a cluster
that is somehow reachable from the Internet there is a non-negligible
risk that an intruder at some point will log in on it using stolen
credentials. I know for a fact that a large fraction of Swedish
academic clusters have had such visits.

You cannot trust your users, because that user over there might
actually be a pimply-faced kid holding a freshly stolen password in
his sweaty palms.

I don't see the world doing away with password or private-key-on-disk
authentication any time soon, so this problem is here to stay, I'm
afraid. We have to learn to live with it.

In general, we have to acknowledge that our security will always be
slightly broken. This means that you can't put all your effort and
trust in a perimeter-style defense, because it will never be perfect
and one day somebody will penetrate it. You need defense-in-depth.
(That old, worn phrase)

Which in this case boils down to: Yes, you do need internal security
barriers in your cluster.

As you note, hardening a cluster to untrusted external users "would
take quite a bit of effort", but even when it would be unrealistic to
go full-out virtualized and compartmentalized, you should still keep
these issues in mind when designing a cluster.

Ask yourself, what happens if an intruder gets access to one of the
machines in the cluster? It's very hard to totally stop the intrusion
from spreading across the cluster, but you *can* make life harder for
the intruder, which might just buy you enough time to detect the
intrusion in its early stages.

So, for example, do you really need unlimited passwordless access
across the entire cluster, or can you limit it in useful ways? Perhaps
you can hook PAM up to PBS, so users can only access nodes they are
scheduled on? Pay special attention to how root is allowed to access
other machines. Export NFS filesystems read-only and mount them
nosuid, unless you really need rw/suid. And, of course, never leave a
security hole unpatched because it's "just a local vulnerability". And
so on.
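
As a rough illustration of the NFS point, here is a minimal sketch of an
audit you could run on each node. It assumes a Linux node where mount
information is available in the /proc/mounts format (device, mount point,
filesystem type, comma-separated options). The function name
audit_nfs_mounts is made up for this example, and the check is
deliberately simplistic: it only flags NFS mounts that are not "ro" or
that lack "nosuid".

```python
def audit_nfs_mounts(mounts_text):
    """Flag NFS mounts that are read-write or that allow suid binaries.

    mounts_text is text in /proc/mounts format. Returns a list of
    (mount_point, problems) tuples, one per suspicious NFS mount.
    """
    findings = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS client mounts are of interest here
        opts = set(options.split(","))
        problems = []
        if "ro" not in opts:
            problems.append("mounted read-write")
        if "nosuid" not in opts:
            problems.append("suid allowed")
        if problems:
            findings.append((mount_point, problems))
    return findings


def audit_node(path="/proc/mounts"):
    """Run the audit against this node's mount table (Linux only)."""
    with open(path) as f:
        return audit_nfs_mounts(f.read())
```

Running audit_node() from cron on every node and mailing any non-empty
result to the admins would be one way to notice when a mount quietly
drifts back to rw/suid. Mounts that really do need rw/suid would have to
be whitelisted by hand.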

And lay traps. Think about ways to detect abnormal user activities. I
won't go into details on this on a public list, and anyway it depends
on your user population, but it's really, really satisfying when an
intruder falls into a trap and you can descend upon them, verily like
the Wrath of God, while they are busily trying to get root on your
login node. 8^)

[1] The Stakkato intrusions a.k.a. The Teragrid Incident a.k.a. FBI
    Case 216, as well as a couple of large-scale keysniffer attacks.

-- 
Leif Nixon                       -            Systems expert
------------------------------------------------------------
National Supercomputer Centre    -      Linkoping University
------------------------------------------------------------