[Beowulf] hpl size problems
becker at scyld.com
Wed Sep 28 10:42:23 PDT 2005
On Wed, 28 Sep 2005, Robert G. Brown wrote:
> laytonjb at charter.net writes:
> >> > At most you waste a few seconds and a few
> >> > hundred megabytes of disk by leaving them in,
> >> Latter yes (more like GB), former no. Trimming the fat from the SuSE
> >> cluster install got it from over an hour down to about 8 minutes with
> >> everything, per node.
This is more typical: a distribution that comes on many CD-Rs isn't going
to be easily stripped down to something that can be loaded in a few seconds.
A stripped-down install will take on the order of 5 minutes.
> >> I think I hit the point of diminishing returns. I don't mind waiting up
> >> to about 10 minutes for a reload, beyond that, I mind.
This really isn't scalable: even 5 minutes per machine has a big impact on
how you consider operating a cluster of dozens or hundreds of machines.
> > Another thing to consider is that having all of that extra
> > stuff on the nodes leads to a huge security tracking
> > headache.
> > put back into production. By not having all of the cruft on
> > the nodes, our security headache could have been reduced.
Consider taking that idea to the logical conclusion: by eliminating
everything but the user applications on the nodes you can eliminate not
just the appearance of a security problem, you can eliminate the
There is a good reason for updating vulnerable daemons and services even
if they are not currently enabled. What if they are later turned on? ("Gee,
I'll just turn on the web server so that this new admin tool works
through the firewall.")
> I agree, guys, I agree. My point wasn't that trimming cluster
> configurations relative to workstation or server configurations is a bad
> thing -- it is not, and indeed one would wish that eventually e.g. FC,
> RHEL, Centos, Caosity etc will all have a canned "cluster configuration"
> in their installers to join server and workstation, or that somebody
> will put up a website with a generic "cluster node" kickstart fragment
> containing a "reasonable" set of included groups and packages for people
> to use as a baseline that leaves most of the crap out.
We went down this path years ago. It doesn't take long to find the
problem with stripping down full installations to make minimal compute node
installs: your guess at the minimal package set isn't correct. You might
not think that you need the X Window system on compute nodes. But your
MPI implementation likely requires the X libraries, and perhaps a few
interpreters, and the related libraries, and some extra configuration
tools for those, and...
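To make the dependency point concrete, here is a minimal Python sketch of how a hand-picked "minimal" package set grows once dependencies are followed transitively. The package names and dependency edges are hypothetical and chosen only for illustration. They are not taken from any real distribution's metadata, and a real package manager also deals with versions, conflicts and virtual provides.

```python
# Hypothetical dependency graph: names and edges are illustrative only,
# not real RPM/DEB metadata.
DEPENDS = {
    "myapp": ["mpi"],
    "mpi": ["libX11", "perl"],           # MPI pulling in X libs and an interpreter
    "libX11": ["libXau", "x-config-tool"],
    "libXau": [],
    "perl": ["perl-libs"],
    "perl-libs": [],
    "x-config-tool": ["python"],         # a config tool dragging in another interpreter
    "python": [],
}


def closure(wanted, depends):
    """Return every package reachable from `wanted` by following dependency edges."""
    needed = set()
    stack = list(wanted)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(depends.get(pkg, []))
    return needed


if __name__ == "__main__":
    guess = {"myapp", "mpi"}           # the "minimal" set you had in mind
    actual = closure(guess, DEPENDS)   # what the dependencies actually require
    print(sorted(actual - guess))
    # prints ['libX11', 'libXau', 'perl', 'perl-libs', 'python', 'x-config-tool']
```

Each extra package found this way is one more thing to track, update and possibly have started at boot.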
Yes, there are a number of labor-intensive ways to rebuild and repackage
to break these dependencies. But now you have a unique installation that
is a pain to update. There is no synergy here -- workstation-oriented
packages don't have the same motivations that compute cluster or server
> a) In most cases the crap doesn't/won't affect performance of
> CPU/memory/disk bound HPC tasks.
Except for the additional cruft that is automatically installed and
started. Your compute nodes might not ever need 'xfs' (the X font server,
not the file system), but it will be started anyway.
> either a cluster operating system (e.g. scyld) or a really sparse and
> tuned install (e.g. sparse and tuned warewulf or a similarly sparse and
> tuned kickstart or...)
> b) There are a lot of things that can be USEFUL on general purpose
> cluster nodes. I always put editors on them, for example, and
> programming tools and compilers, because every now and then I'm logged
> into one and want to work on code.
You are starting out with the idea that you will be logging into every
node. Once you make that assumption, you need the whole set of support
tools. Even something as simple as an editor that matches your
primary environment implies a whole set of additional support. ("Of
course I expect indent support for Prolog and syntax validation for APL!")
(Admittedly it's easy to mitigate this: put all of your admin/user
interaction tools on a network file system that only needs to be mounted
when logged in. But there are better solutions.)
> anyway). Similarly I want to be able to read man pages (while I'm coding
> for certain) so I put them on. They drag TeX along which I don't mind
> because I use it anyway for a lot of things and maybe will be doing a
> build in an application directory that has a tex-based manual or paper
> in it and it bugs me when a build fails because of missing resources.
> So do I put geyes on? Of course not.
Uhmmm, but my start-up depends on 3D-Xeyes and the Klingon fonts!
(..thanks for making my point.)
> Network daemons are an OBVIOUS EXCEPTION to this -- network services
> should ALWAYS be carefully considered, even on plain old LAN
> workstations, because of both security and performance.
A cluster is a single machine. It should run a single set of network
services. Pure compute nodes of the cluster need not duplicate services.
So you should start with no daemons (and no configuration files) rather
than stripping down and turning off (and writing bunches of ad hoc
configuration file generation scripts).
> Things like
> ipchains or ipfilters tend to be "expensive" overhead on all TCP/UDP
> connections, and overhead in parallel computations is anathema.
A different topic. It's one we can't win. The structure for ipchains and
ipfilters costs no matter what you do. Disabling them doesn't
revert the code to the simple, fast case. It just makes it impossible to
use the feature. It's much the same as passing all output through
"| grep -v $emptyvar | ..."
You can make it semantically do nothing by not actually having matching
rules, but you still have the overhead.
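The following is a toy model of the ipchains/ipfilter point, written as a hedged Python sketch. The classes, rules and counters are hypothetical and do not reflect how ipchains, ipfilter or netfilter are actually implemented. A chain whose rules never match delivers exactly the same packets as no filter at all, yet every packet still pays for the hook call and for each rule evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    src: str
    dport: int


# A rule is a (match predicate, verdict) pair; purely illustrative.
Rule = Tuple[Callable[[Packet], bool], str]


class FilterChain:
    """Hypothetical first-match filter chain with a default ACCEPT policy."""

    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.hook_calls = 0   # times the per-packet hook ran
        self.rule_checks = 0  # individual rule evaluations performed

    def verdict(self, pkt: Packet) -> str:
        self.hook_calls += 1
        for matches, action in self.rules:
            self.rule_checks += 1
            if matches(pkt):
                return action
        return "ACCEPT"


def deliver(packets: List[Packet], chain: Optional[FilterChain] = None) -> List[Packet]:
    """chain=None models the simple, fast path with no filter hook at all."""
    return [p for p in packets if chain is None or chain.verdict(p) == "ACCEPT"]


if __name__ == "__main__":
    packets = [Packet(f"10.0.0.{i}", 5000 + i) for i in range(100)]
    # Three rules that can never match this traffic (hypothetical ssh drops).
    never = [(lambda p: p.dport == 22 and p.src.startswith("192.168."), "DROP")] * 3
    chain = FilterChain(never)
    assert deliver(packets) == deliver(packets, chain)  # same packets delivered...
    print(chain.hook_calls, chain.rule_checks)          # ...but prints "100 300"
```

With an empty rule list, rule_checks stays at zero but hook_calls still counts every packet. That is the model's analogue of the overhead that merely disabling the feature does not remove.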
> So I agree, I agree -- thin is good, thin is good. Just avoid needless
> anorexia in the name of being thin -- thin to the point where it saps
> your nodes' strength.
You've got the wrong perspective: you don't build a thin compute node from
a fat body. You develop a system that dynamically provisions only the
needed elements for the applications actually run. That takes more than a
single mechanism to do correctly, but you end up with a design that has
many advantages. (Sub-second provisioning, automatic update consistency,
no version skew, high security, simplicity...)
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993