[Beowulf] Which distro for the cluster?

Mon Jan 8 08:06:04 PST 2007

----- Original Message ----- 
From: "Robert G. Brown" <rgb at phy.duke.edu>
To: "Leif Nixon" <nixon at nsc.liu.se>
Cc: <beowulf at beowulf.org>
Sent: Wednesday, January 03, 2007 3:51 PM
Subject: Re: [Beowulf] Which distro for the cluster?

> On Wed, 3 Jan 2007, Leif Nixon wrote:
>
>> "Robert G. Brown" <rgb at phy.duke.edu> writes:
>>
>>> Also, plenty of folks on this list have done just fine running "frozen"
>>> linux distros "as is" for years on cluster nodes.  If they aren't broke,
>>> and live behind a firewall so security fixes aren't terribly important,
>>> why fix them?
>>
>> Because your users will get their passwords stolen.
>>
>> If your cluster is accessible remotely, that firewall doesn't really
>> help you very much. The attacker can simply login as a legitimate user
>> and proceed to walk through your wide-open local security holes.
>
> So:
>
>   a) Our cluster wasn't remotely accessible.  In fact, it was on a
> 192.168 network and in order to even touch it, one had to login to an up
> to date, carefully defended desktop workstation login server in the
> department.

Of course, this is a layman talking:

Anything connected to a network is going to get hacked if there is
interesting enough information to find on it.

The real question is: "what type of software/data do you store?"

Depending on the answer, hackers will either get in or not even try.

Guided missile data? Oh boy... (and still oh boy when it is a thoroughly
average university project that is going to draw 0 interesting conclusions,
with the usual fraud of modifying the data to make the results look better
rather than fixing the software)

Most hackers seem to just automatically collect everything in order to keep 
busy.

There are weird holes in all kinds of distributions.
A few weeks ago I installed a Debian server here, and for some weird reason
some kind of mail software opened a port by default (port 25).
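
A quick way to see whether a freshly installed box is listening on a port
such as 25 is simply to try connecting to it. A minimal sketch in Python; the
helper name port_is_open and the one-second timeout are my own choices for
illustration, not anything Debian ships:

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    # Port 25 is SMTP; a mail daemon bound to it will show up here.
    print("port 25 open:", port_is_open("127.0.0.1", 25))
```

This only tells you that something accepts connections on that port, not
which program it is or whether it is reachable from outside the machine.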

>   b) If an attacker has compromised a user account on one of these
> workstations, IMO the security battle is already largely lost.  They

If it is possible to log in somehow and store files on some kind of
hard drive, then you're already rooted in no time.
It is too easy in UNIX/IRIX/LINUX to READ data.

Not necessarily to modify data, but READING other people's data is far easier
than modifying it.

Think of caches that store data and so on.

In the long run, the really big problem is not keeping people from outside
out of your machine.

The problem is all the different types of software that users run in their
accounts. My Debian router/firewall, for instance, is a joke of a firewall,
because the Windows clients behind it can access the internet nearly freely.

It takes just 1 bad program carrying some sort of spyware that connects to
the outside, and hackers can use that same spyware channel to get in.

With respect to remotely accessing clusters over the internet, you can of
course call those semi-secure at best, because the way you access them is
not secure. Some SSH-type connection is enough to get rid of a few
unorganized criminals, who usually cannot tap the entire conversation stream.

In the case of PGP: when the first bits arrive, someone in the middle just
flips a protocol bit, after which the entire response goes out unencrypted,
and just before passing it on to the receiving client, he encrypts it for
that client, who won't notice.

However, that is all very paranoid thinking.
When using default security, things are pretty safe, because no one can tap
the entire conversation.

> have a choice of things to attack or further evil they can try to wreak.
> Attacking the cluster is one of them, and as discussed if the cluster is
> doing real parallel code it is likely to be quite vulnerable regardless
> of whether or not its software is up to date because network security is
> more or less orthogonal to fine-grained code network performance.

> Still, a cluster is paradoxically one of the best monitored parts of a
> network.  Although it would make a gangbusters DoS platform, network

You don't hack a cluster in order to start a DoS attack.

If a cluster gets hacked it'll be for the software that runs on it and the 
output data.

> traffic on the cluster, cpu consumption on the cluster, user access to
> the cluster are all relatively carefully monitored.  The cluster
> installation is likely to be different enough and "odd" enough to make
> standard rootkit encapsulations fail for anyone but the legendary
> Ubercracker (who can always do whatever they want anyway, right?;-) In
> an organization that tightly monitors everything all the time on general
> security principles (first line of defense, really, as one can NEVER be
> sure all exploitable holes are closed even with a yum-updated, stable,
> currently supported distro and human eyes are better at picking up
> anomalies in system operation than any automated tool) I think it is
> pretty likely that any attempt to take over a cluster and use it for
> diabolical ends would be almost instantly detected.

I feel the real problem is not so much misuse of your hardware by the
Uebercracker as the fear some companies have that their data can be read by
others: years of original development of your idea and hardware, stolen by
one simple hack attempt.

Most importantly, it gives your competitor new ideas on how to progress,
which matters even more than their being able to "reproduce" your original
idea.

But basically paranoia is only warranted for software/hardware that falls
under Category 5 of the Wassenaar Arrangement ( www.wassenaar.org ). In about
every other case the average person in ICT is far too paranoid, while
nothing is wrong and no one is stealing his data.

Btw, that doesn't apply to colleagues of mine, as I detected that they
install spyware together with their software (as Shredder Classic does: a
weird program called wuw.exe, or "windows update wizard", that McAfee didn't
detect). Of course, as usual, that is Windows software.

> BTW, the cluster's servers were not (and I would not advise that servers
> ever be) running the old distro -- we use a castle keep security model
> where servers have extremely limited access, are the most tightly
> monitored, and are kept aggressively up to date on a fully supported
> distro like Centos.  The idea is to give humans more time to detect
> intruders that have successfully compromised an account at the
> workstation LAN level and squash them like the nasty little dung beetles
> that they are.

"Castle keep security model".

You mean it has been airgapped?

> FWIW, our department is entirely linux at the server level, and almost
> entirely linux at the workstation level.  A very few experimental groups
> and individuals run either Windows boxes (usually to be able to use some
> particular software package) or Macs (because they are, umm, "that kind
> of user":-).  I'm guessing that the ratio is something like 4:1 linux to
> Win at the workstation level (Macs down there in the noise) and maybe
> 10:1 linux to win if you include cluster nodes, whatever OS they might
> be running.

What if someone installs, for example, Shredder Classic on a Windoze box,
whose spyware communicates with the outside under the great security of
32-bit RSA (it has to run fast on a 32-bit machine, of course)? If some
clever student who just got a job manages to crack that, he can take over
the communication, root your network and get all the data he wants from it.
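
To put a number on how weak a 32-bit RSA key is: the modulus can be factored
by plain trial division in a fraction of a second, after which the private
exponent follows directly. A minimal sketch, where the primes, the exponent
e = 17 and the function names are all made up for illustration (I have no
data on the actual key any particular program uses), and which needs Python
3.8 or newer for isqrt and the modular inverse via pow:

```python
from math import isqrt

def factor_semiprime(n):
    """Factor n = p * q by trial division; feasible only for tiny moduli."""
    if n % 2 == 0:
        return 2, n // 2
    for candidate in range(3, isqrt(n) + 1, 2):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("n has no nontrivial factor")

def recover_private_exponent(n, e):
    """Derive the RSA private exponent d from the public key (n, e)."""
    p, q = factor_semiprime(n)
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse of e modulo phi

# Hypothetical 32-bit public key, chosen only for this example.
P, Q, E = 65521, 65537, 17
N = P * Q

D = recover_private_exponent(N, E)

# Round trip: encrypt with the public key, decrypt with the recovered one.
message = 123456789
ciphertext = pow(message, E, N)
recovered = pow(ciphertext, D, N)
```

The loop runs at most about 32,000 times for any 32-bit modulus, so anyone
who sees the public key effectively holds the private key as well.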

Of course this is just a paranoid thought experiment, as your uni of course
doesn't have anything of interest to anybody, let alone anything you can
make money with.

> Since Seth introduced yup on top of RH (maybe 7-8 years ago?  How time
> flies...), and then proceeded to write yum to replace yup for RPM
> distros in general, we haven't had a single successful promotion to root
> in the department.  Nothing done locally can prevent some grad student's
> password from being trapped as they login from some compromised
> win-based system in their hometown over fall break, but the very few of
> these that have occurred have been quickly detected and quickly squashed
> without further compromise.

> In that same interval, we had a WinXX system compromised and turned into
> a pile of festering warez rot something like twice a year.  Pretty
> amazing given that they are kept up to date as best as possible and they
> make up only 10-20% of our total system count.

"How bad is it, Humphrey?"

"Yes Minister, only 10% of our organisation has been infected,
so we do not need to start some commie hunt within our organisation at all,
as 90% of it is clean, so not a SINGLE file could possibly have been taken
away, as those 90% would have noticed it; besides, a file needs 12 stamps
from 6 different departments before it can get out."

>> But you know this already.
>
> Oh yeah;-)
>
> And we didn't do this "willingly" and aren't that likely to repeat it
> ourselves.  We had some pretty specific reasons to freeze the node
> distro -- the cluster nodes in question were the damnable Tyan dual
> Athlon systems that were an incredible PITA to stabilize in the first
> place (they had multiple firmware bugs and load-based stability issues
> under the best of circumstances).  Once we FINALLY got them set up with
> a functional kernel and library set so that they wouldn't crash, we were
> extremely loathe to mess with it.  So we basically froze it and locked
> down the nodes so they weren't easily accessible except from inside the
> department, and then monitored them with xmlsysd and wulfstat in
> addition to the usual syslog-ng and friends admin tools.
>
> Odd usage patterns (that is, almost any sort of running binary that
> wasn't a well-known numerical task associated with one of the groups,
> logins by anyone who wasn't a known user) would have been noticed by any
> of a half-dozen people, one of whom was me, almost immediately.  The
> kernel was "barely stable" as it was and couldn't easily have been
> replaced with a hacker kernel (to e.g. erase /proc trace) without a VERY
> high probability that the hacker kernel would crash the system and
> reveal the hacker on the first try. xmlsysd reads all sorts of stuff
> from all over /proc and was custom code that I was working on and
> periodically updating, even while Seth was working on yum and updating
> THAT.  Somebody would have had to literally custom craft some very
> advanced C code to stay hidden on the cluster and even then would have
> been revealed by e.g. an update of xmlsysd unless they were a bit beyond
> even Ubercracker status.
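
The "odd usage pattern" check described above can be approximated in a few
lines. A rough sketch only: this is not how xmlsysd or wulfstat work, the
function name and the allowlist contents are invented for illustration, and
it assumes a Linux-style /proc in which every numeric directory holds a
"comm" file with the process name (true on current kernels):

```python
import os

def unexpected_processes(allowed, proc_root="/proc"):
    """Return sorted (pid, name) pairs for processes whose name is not allowed."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only the numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            continue  # process exited, or the entry has no comm file
        if name not in allowed:
            found.append((int(entry), name))
    return sorted(found)

if __name__ == "__main__" and os.path.isdir("/proc"):
    # Hypothetical allowlist for a compute node; adjust to the real workload.
    allowed = {"sshd", "syslog-ng", "xmlsysd", "my_simulation"}
    for pid, name in unexpected_processes(allowed):
        print(pid, name)
```

A check like this reads the same /proc an intruder's kernel could falsify,
so it complements the human eyes mentioned above rather than replacing them.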
>
> In general, though, it is very good advice to stay with an updated OS.
> My real point was that WITH yum and a bit of prototyping once every
> 12-24 months, it is really pretty easy to ride the FC wave on MANY
> clusters, where the tradeoff is better support for new hardware and more
> advanced/newer libraries against any library issues that one may or may
> not encounter depending on just what the cluster is doing.  Freezing FC
> (or anything else) long past its support boundary is obviously less
> desireable.  However, it is also often unnecessary.
>
> On clusters that add new hardware, usually bleeding edge, every four to
> six months as research groups hit grant year boundaries and buy their
> next bolus of nodes, FC really does make sense as Centos probably won't
> "work" on those nodes in some important way and you'll be stuck
> backporting kernels or worse on top of your key libraries e.g. the GSL.
> Just upgrade FC regularly across the cluster, probably on an "every
> other release" schedule like the one we use.
>
> On clusters (or sub-clusters) with a 3 year replacement cycle, Centos or
> other stable equivalent is a no-brainer -- as long as it installs on
> your nodes in the first place (recall my previous comment about the
> "stars needing to be right" to install RHEL/Centos -- the latest release
> has to support the hardware you're buying) you're good to go
> indefinitely, with the warm fuzzy knowledge that your nodes will update
> from a "supported" repo most of their 3+ year lifetime, although for the
> bulk of that time the distro will de-facto be frozen except for whatever
> YOU choose to backport and maintain.
>
> And really, there isn't much stopping folks from adopting a range of
> "mixed" strategies -- running FC-whatever on new nodes for a year or
> whatever as needed in order to support their hardware or use new
> libraries, then reinstalling them with Centos/RHEL (which is basically
> FC-even-current-at-release-time frozen and supported or so it seems
> recently anyway) as Centos support catches up with the hardware by
> syncing with an FC-current on a new release.
>
> Nowadays, with PXE/Kickstart/Yum (or Debian equivalents, or the OS of
> your choice with warewulf, or...) reinstalling OR upgrading a cluster
> node is such a non-event in terms of sysadmin time and effort that it
> can pretty much be done at will.  Except for pathological cases (like
> the Tyans) we're talking at most a few days of sysadmin time to set up a
> prototyping node or four, flash over to the new distro via a discrete
> node reboot (unattended automated reinstall or a new node diskless
> image), and let selected users whack on it for a week or two.  If it
> proves invisibly stable and satisfactory -- the rule rather than the
> exception -- crank it on up across the cluster.  Even if it "fails" on
> some untested pathway after you do this, it costs you at most a reboot
> (again to a reinstall/replacement of a node image) to put things back as
> they were while you fix things.
>
> The worst thing that such a strategy might require is a rebuild of user
> applications for both distros, but with shared libraries to my own
> admittedly anecdotal experience this "usually" isn't needed going from
> older to newer (that is, an older Centos built binary will "probably"
> still work on a current FC node, although this obviously depends on the
> precise libraries it uses and how rapidly they are changing).  It's a
> bit harder to take binaries from newer to older, especially in packaged
> form.  There you almost certainly need an rpmbuild --rebuild and a bit
> of luck.
>
> Truthfully, cluster installation and administration has never been
> simpler.
>
>    rgb
>
> -- 
> Robert G. Brown                        http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu