[Beowulf] hpl size problems

Tue Sep 27 15:32:25 PDT 2005

Joe Landman writes:

> <slight aside>
> 
> As for TeX, if you are installing it to your compute nodes, then I hope 
> that one of your main tasks will be to crunch lots of documentation.  I 
> know those post script placements can be somewhat challenging.  It sure 
> as heck doesn't make sense to install it (and openoffice components for 
> that matter) to cluster nodes.  I see this all the time, and one of the 
> more popular cluster "distributions" does this.
> 
> This has been a pet peeve of mine for a while.  I like the 
> install-minimum and add-needed-bits philosophy more than I like the 
> everything-including-the-kitchen-sink.  Lots of services seem to get 
> activated when you install the-kitchen-sink.

What gets activated when you install tex?  Absolutely nothing.  It is a
userspace non-daemonic application.  So this particular example is
mostly irrelevant -- it might annoy you personally to have things
installed that are never used, but it should have zero impact on node
performance.  Open Office ditto -- it doesn't jump out and run itself
AFAIK when nobody is logged into a console interface (or if the system
HAS no console interface).  At most you waste a few seconds and a few
hundred megabytes of disk by leaving them in.  The install time is
generally parallelized and nearly irrelevant over an installation
lifetime of weeks or months, and the smallest disks one can buy nowadays
are huge, huge, huge compared to the fattest possible linux install.
You could install ALL the major linux distros and their entire repos on
the 160 GB disk that comes in $600 systems that are Best Buy specials,
and probably have room left for your entire ogg collection, a few
movies, and your application's scratch space.  So if you have a disk at
all, what you put on it is probably irrelevant.

Again, if you do an absolutely cold/standard FC or RH "workstation"
install, flesh it out with e.g. the GSL or any other associated
numerical libraries you might need, and measure system activity in idle
mode WITH X RUNNING (although X is not necessary on a node) you SHOULD
see load averages within a whisker of "zero" (including the load imposed
by e.g. xmlsysd, which is what lets you measure), a few tens of context
switches per second, and ballpark 1000 interrupts per second.
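
If you want to check those numbers on your own node without installing
anything, something like the following rough sketch should do.  It is
not xmlsysd or vmstat, just a toy that reads the cumulative "ctxt" and
"intr" counters from /proc/stat twice and divides by the interval.  The
function names (parse_proc_stat, rates, sample) and the 5 second default
are my own inventions for illustration.

```python
import time

def parse_proc_stat(text):
    """Return cumulative (context_switches, interrupts) from /proc/stat text."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("ctxt", "intr"):
            # The first number after "intr" is the total over all sources.
            counters[fields[0]] = int(fields[1])
    return counters["ctxt"], counters["intr"]

def rates(before, after, seconds):
    """Per-second (context switch, interrupt) rates between two snapshots."""
    return tuple((b - a) / seconds for a, b in zip(before, after))

def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return rates(first, second, interval)

# Usage on an idle node:
#   cs, intr = sample()
#   print(f"context switches/s: {cs:.0f}   interrupts/s: {intr:.0f}")
```

Run it on an idle node and again under load and compare the two pairs of
rates; the sampler itself adds a little activity, so treat small
differences as noise.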

Cluster nodes, fat or not, do about the same.  If you visit here:

  http://www.phy.duke.edu/~rgb/wulfweb/vmstat.html

you can see stats (and for that matter look at the actual running
applications) on almost all of our dual opterons at a glance.  Even
nodes carrying load averages of 14 sustained (crazed grad student, don't
ask:-) still see a measly 30 or so context switches per sec, ballpark
of 1000 interrupts per second, and manage to run without swapping or
paging.  Services that are offered but not being used just don't take
that much by way of resources.

So I personally don't have a rigid philosophy either way about what gets
installed on a node because, by my direct MEASUREMENTS of performance and
activity, it just doesn't generally matter much -- 1-2% effects way down
in the noise, with the exception of a very few applications or daemons
that e.g. poll or are themselves running applications (screensavers).
YMMV, see below for a generic class of exceptions already discussed in
this thread.

If a node/system has a disk and EVER might need a given application
(even tex) I wouldn't hesitate to install it -- disk is cheap, services
can be turned off, and installing things (and turning them off if need
be in %post) can be most easily done via a kickstart file and then
forgotten.  Much better than having to remember what was added to a node
afterwards, MUCH better than having to add things to a node (or all
nodes) afterwards by hand.  

OTOH, sure, if you know something isn't necessary (X11, for example, on
a system without a video card or console) by all means leave it out, and
leave out strange applications that DO suck up resources for sure
(recognizing that there aren't that many of them, really; even a fat old
standard workstation install plus this and that probably doesn't have
any).  Kickstart is ideal for this as you can tinker with your node
configuration until your package list and %post are just right, and then
just do a full (re)install.  Most of the stuff that DOES need some sort
of handiwork or that DOES run and consume real-time resources is
associated with e.g. video, audio, multimedia, and most cluster nodes
don't need it.

Diskless nodes, of course, HAVE to be thin -- they have only a ramdisk
to install in/on, and memory used for the basic installation isn't
available to the application.  So sure, install the minimum --
basically, work up from the other (init 1) end instead of down from the
workstation (init 5) end.  However, even a fat NFS diskless node (one
that e.g.  mounts a /usr and so on containing a full workstation
install) is likely to perform about the same as a thin node if the node
has enough memory, as linux is pretty good at caching libraries and
application pages.  If you work very hard (as Don apparently has with
Scyld) you can beat linux's default dynamic organization and performance but
again, I generally don't see a dramatic difference in actual
microbenchmark or full program performance between init 1 and init 5 on
any system no matter how fat at init 5.  I'm sure there are applications
where this DOES make a difference, but for CPU bound, cache local or
streaming arithmetic, the CPU/memory subsystem is "doing its best" all
the time pretty much whatever you have running.

The major exception to this is tightly coupled code, as the discussion
has already pointed out.  There things like interrupts and random state
noise can have a cumulative (nonlinear) additive effect in delaying the
tightly coupled program so that even though ANY thread is only delayed a
tiny amount, ALL threads have to wait for the slowest thread to reach a
barrier, and ALL communications and other activities are subject to
random delays due to resource competition on the endpoint nodes.  There
one really wants all the nodes to be doing e.g. their timer interrupts
and bookkeeping context switches and all (if any) in sync, which
ordinary linux won't guarantee.  There is a clear reason there to
minimize the number of asynchronous tasks running on the nodes for any
reason.
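
To put a rough number on the barrier argument, here is a toy model (my
assumption, not a measurement of any real cluster): in each compute step
every node is independently hit, with probability p_delay, by a delay
worth delay_fraction of the step time.  A node running alone loses
p_delay * delay_fraction on average, but a step that ends at a barrier
is stretched whenever ANY node is hit.  The names and the example
numbers below are made up for illustration.

```python
def single_node_overhead(p_delay, delay_fraction):
    """Average fractional slowdown seen by one node running alone."""
    return p_delay * delay_fraction

def barrier_overhead(n_nodes, p_delay, delay_fraction):
    """Expected fractional slowdown per step when all nodes meet at a barrier.

    Toy model: the step ends when the slowest node arrives, so it is
    stretched by delay_fraction if at least one of n_nodes nodes is hit.
    """
    p_any = 1.0 - (1.0 - p_delay) ** n_nodes
    return p_any * delay_fraction

# Example: a 1% chance per step of a delay worth 10% of the step,
# i.e. 0.1% average overhead on any single node.
for n in (1, 16, 128, 1024):
    print(n, round(barrier_overhead(n, 0.01, 0.10), 4))
```

With those made-up numbers the per-node overhead is 0.1%, yet the
expected stretch per step climbs toward the full 10% as the node count
grows, because it becomes nearly certain that somebody is late at every
barrier.  Real noise is neither independent nor fixed-size, so this only
shows the shape of the effect.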

If I recall correctly, this was one of the primary motivations for "the
beowulf" as opposed to a generic NOW/COW cluster, and hence is a prime
reason for the existence of this list.  So it is worthwhile noting that
all of the observations above (about overhead being negligible) apply to
clusters running memory or cpu bound coarse grained or embarrassingly
parallel applications, but become LESS true the more fine grained and
tightly coupled your computation is.  There warewulf may well be able to
bump performance by e.g. 30% by lowering noisy overhead, although I'd
still very much want to see some fairly convincing evidence that this is
the case if it were my application, as sensitivity to random node delays
at the 0.1% level or less seems like it is an important design
consideration in both code and cluster....;-)

    rgb