[Beowulf] hpl size problems

Wed Sep 28 10:42:23 PDT 2005

On Wed, 28 Sep 2005, Robert G. Brown wrote:
> laytonjb at charter.net writes:
> >> > At most you waste a few seconds and a few
> >> > hundred megabytes of disk by leaving them in, 
> >> 
> >> Latter yes (more like GB), former no.  Trimming the fat from the SuSE 
> >> cluster install got it from over an hour down to about 8 minutes with 
> >> everything, per node.

This is more typical: a distribution that comes on many CD-Rs isn't going 
to be easily stripped down to something that can be loaded in a few seconds.
A stripped-down install will take on the order of 5 minutes.

> >> I think I hit the point of diminishing returns.  I don't mind waiting up 
> >> to about 10 minutes for a reload, beyond that, I mind.

This really isn't scalable: even 5 minutes per machine has a big impact on 
how you consider operating a cluster of dozens or hundreds of machines.
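
To put rough numbers on that, here is a small Python sketch of the 
arithmetic.  The node counts and the "10 at a time" concurrency limit are 
assumptions chosen for illustration, not measurements from any real 
cluster; only the 5 minutes per node comes from the discussion above.

```python
import math

def reload_minutes(nodes, minutes_per_node, concurrent=1):
    """Wall-clock minutes to reload `nodes` machines when at most
    `concurrent` of them can be installed at the same time."""
    batches = math.ceil(nodes / concurrent)
    return batches * minutes_per_node

# Hypothetical cluster sizes; 5 minutes per node as quoted above.
for nodes in (24, 100, 500):
    serial = reload_minutes(nodes, 5)
    batched = reload_minutes(nodes, 5, concurrent=10)
    print(f"{nodes:4d} nodes: {serial:5d} min serial, "
          f"{batched:4d} min at 10 at a time")
```

Even with the assumed tenfold concurrency, a full reload of a few hundred 
nodes is measured in hours rather than minutes.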

> > Another thing to consider is that having all of that extra
> > stuff on the nodes leads to a huge security tracking
> > headache.
...
> > put back into production. By not having all of the cruft on
> > the nodes, our security headache could have been reduced.

Consider taking that idea to the logical conclusion: by eliminating 
everything but the user applications on the nodes you can eliminate not 
just the appearance of a security problem, you can eliminate the 
opportunity.

There is a good reason for updating vulnerable daemons and services even 
if they are not currently enabled.  What if they are turned on later -- "gee, 
I'll just turn on the web server so that this new admin tool works 
through the firewall"?

> I agree, guys, I agree.  My point wasn't that trimming cluster
> configurations relative to workstation or server configurations is a bad
> thing -- it is not, and indeed one would wish that eventually e.g. FC,
> RHEL, Centos, Caosity etc will all have a canned "cluster configuration"
> in their installers to join server and workstation, or that somebody
> will put up a website with a generic "cluster node" kickstart fragment
> containing a "reasonable" set of included groups and packages for people
> to use as a baseline that leaves most of the crap out.

We went down this path years ago.  It doesn't take long to find the 
problem with stripping down full installations to make minimal compute node 
installs: your guess at the minimal package set isn't correct.  You might 
not think that you need the X Window system on compute nodes.  But your 
MPI implementation likely requires the X libraries, and perhaps a few 
interpreters, and the related libraries, and some extra configuration 
tools for those, and...
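
The way a "minimal" set grows can be sketched as a transitive closure over 
package dependencies.  The following Python is only an illustration: the 
package names and the dependency edges are hypothetical, invented to mirror 
the chain described above, and do not describe any real distribution.

```python
from collections import deque

# Hypothetical dependency graph (names and edges are made up).
DEPENDS = {
    "mpi": ["x-libs", "interpreter"],
    "x-libs": ["x-config-tools"],
    "interpreter": ["interpreter-libs"],
    "interpreter-libs": [],
    "x-config-tools": [],
}

def install_closure(wanted, depends):
    """Return every package pulled in by `wanted`, following
    dependencies transitively (breadth-first)."""
    seen = set(wanted)
    queue = deque(wanted)
    while queue:
        pkg = queue.popleft()
        for dep in depends.get(pkg, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Asking for one package ends up installing everything it reaches.
closure = install_closure(["mpi"], DEPENDS)
print(sorted(closure))
```

Asking for the single "mpi" package in this toy graph pulls in all five 
entries, which is the effect that makes a hand-picked minimal list wrong.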

Yes, there are a number of labor-intensive ways to rebuild and repackage 
to break these dependencies.  But now you have a unique installation that 
is a pain to update.  There is no synergy here -- the people who build 
workstation-oriented packages don't have the same motivations that compute 
cluster or server people have.

>   a) In most cases the crap doesn't/won't affect performance of
> CPU/memory/disk bound HPC tasks.

Except for additional cruft automatically installed and 
started.  Your compute nodes might not ever need 'xfs' (the X font server, 
not the file system), but it will be started anyway.

> either a cluster operating system (e.g. scyld) or a really sparse and
> tuned install (e.g. sparse and tuned warewulf or a similarly sparse and
> tuned kickstart or...)

>   b) There are a lot of things that can be USEFUL on general purpose
> cluster nodes.  I always put editors on them, for example, and
> programming tools and compilers, because every now and then I'm logged
> into one and want to work on code.

You are starting out with the idea that you will be logging into every 
node.  Once you make that assumption, you need the whole set of support 
tools.  Even something as simple as an editor that matches your 
primary environment implies a whole set of additional support.  ("Of 
course I expect indent support for Prolog and syntax validation for APL!")  

(Admittedly it's easy to mitigate this: put all of your admin/user 
interaction tools on a network file system that only needs to be mounted 
when logged in.  But there are better solutions.)

> anyway). Similarly I want to be able to read man pages (while I'm coding
> for certain) so I put them on.  They drag TeX along which I don't mind
> because I use it anyway for a lot of things and maybe will be doing a
> build in an application directory that has a tex-based manual or paper
> in it and it bugs me when a build fails because of missing resources.
> So do I put geyes on?  Of course not.

Uhmmm, but my start-up depends on 3D-Xeyes and the Klingon fonts!
 (..thanks for making my point.)

> Network daemons are an OBVIOUS EXCEPTION to this -- network services
> should ALWAYS be carefully considered, even on plain old LAN
> workstations, because of both security and performance.

A cluster is a single machine.  It should run a single set of network 
services.  Pure compute nodes of the cluster need not duplicate services.  
So you should start with no daemons (and no configuration files) rather 
than stripping down and turning off (and writing bunches of ad hoc 
configuration file generation scripts).

> Things like
> ipchains or ipfilters tend to be "expensive" overhead on all TCP/UDP
> connections, and overhead in parallel computations is anathema.

A different topic.  It's one we can't win.  The structure for ipchains and 
ipfilters imposes a cost no matter what you do.  Disabling them doesn't 
revert the code to the simple, fast case.  It just makes it impossible to 
use the feature.  It's much the same as passing all output through
  "| grep -v $emptyvar | ..."
You can make it semantically do nothing by not actually having matching 
rules, but you still have the overhead.
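
The same point as a toy Python analogy.  This is not kernel code and says 
nothing about how the real packet filter is implemented; the FilterHook 
class and its call counter are hypothetical names introduced only to show 
that an empty rule set still costs one hook call per packet.

```python
class FilterHook:
    """A filter stage that is always in the path, rules or not."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.calls = 0  # how many times the hook was consulted

    def accept(self, packet):
        self.calls += 1
        # With no rules, any() is False and every packet is accepted.
        return not any(rule(packet) for rule in self.rules)

def deliver(packets, hook):
    """Pass every packet through the hook, keeping the accepted ones."""
    return [p for p in packets if hook.accept(p)]

packets = list(range(1000))
hook = FilterHook(rules=[])       # semantically a no-op
delivered = deliver(packets, hook)
print(delivered == packets, hook.calls)
```

The output is unchanged by the filter, yet the hook was still consulted once 
for every packet.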

> So I agree, I agree -- thin is good, thin is good.  Just avoid needless
> anorexia in the name of being thin -- thin to the point where it saps
> your nodes' strength.

You've got the wrong perspective: you don't build a thin compute node from 
a fat body.  You develop a system that dynamically provisions only the 
needed elements for the applications actually run.  That takes more than a 
single mechanism to do correctly, but you end up with a design that has 
many advantages.  (Sub-second provisioning, automatic update consistency, 
no version skew, high security, simplicity...)

-- 
Donald Becker				becker at scyld.com
Scyld Software	 			Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220		www.scyld.com
Annapolis MD 21403			410-990-9993