[Beowulf] hpl size problems
Robert G. Brown
rgb at phy.duke.edu
Wed Sep 28 06:36:19 PDT 2005
laytonjb at charter.net writes:
>> I keep forgetting that RGB is actually a cluster of text generation
>> 'bots able to crank out more characters per second than most folks can
>> speak ... ;)
Harummph. I'm just working on trying to fool a committee into thinking
I've got a cluster that can pass the Turing test... isn't there a prize
of some sort for that?
>> > At most you waste a few seconds and a few
>> > hundred megabytes of disk by leaving them in,
>> Latter yes (more like GB), former no. Trimming the fat from the SuSE
>> cluster install got it from over an hour down to about 8 minutes with
>> everything, per node. It can go even further if I want to push it, but
>> I think I hit the point of diminishing returns. I don't mind waiting up
>> to about 10 minutes for a reload, beyond that, I mind.
> Another thing to consider is that having all of that extra
> stuff on the nodes leads to a huge security tracking
> headache. At my former employer, we had to track every
> single package on every single node. When a security patch was
> released by our central IT group, then it had to be updated.
> In some cases, the updates were for trivial packages that
> we didn't use, but because they were on the cluster node
> the work had to be stopped, the cluster brought down,
> updated, and the documents updated before the cluster was
> put back into production. By not having all of the cruft on
> the nodes, our security headache could have been reduced.
I agree, guys, I agree. My point wasn't that trimming cluster
configurations relative to workstation or server configurations is a bad
thing -- it is not, and indeed one would wish that eventually e.g. FC,
RHEL, Centos, Caosity etc will all have a canned "cluster configuration"
in their installers to join server and workstation, or that somebody
will put up a website with a generic "cluster node" kickstart fragment
containing a "reasonable" set of included groups and packages for people
to use as a baseline that leaves most of the crap out.
It was just that:
a) In most cases the crap doesn't/won't affect performance of
CPU/memory/disk bound HPC tasks. It might affect performance on tightly
coupled synchronous parallel code, the latter being an engineering
condition that suggests the use of a "true beowulf" cluster design with
either a cluster operating system (e.g. scyld) or a really sparse and
tuned install (e.g. sparse and tuned warewulf or a similarly sparse and
tuned kickstart or...)
b) There are a lot of things that can be USEFUL on general purpose
cluster nodes. I always put editors on them, for example, and
programming tools and compilers, because every now and then I'm logged
into one and want to work on code. In our environment, the only
x86_64's we have (until today, at any rate, when I put one on my desk:-)
are in clusters, so if I want to do a build it has to be on a cluster
node. I COULD keep just one node suitably equipped for a build, but
then if that node goes down I'm screwed (and node heterogeneity is Evil
anyway). Similarly I want to be able to read man pages (while I'm coding
for certain) so I put them on. They drag TeX along which I don't mind
because I use it anyway for a lot of things and maybe will be doing a
build in an application directory that has a tex-based manual or paper
in it and it bugs me when a build fails because of missing resources.
So do I put geyes on? Of course not. How about e.g. gnuplot, octave,
maple (we have a site license)? I personally don't immediately need
them, but I can certainly envision SOMEBODY needing them. How about the
GSL? How about graphics development libraries (e.g. gtk)? How about
XML libraries? I can not only envision these being needed -- I need
them myself for certain builds.
The conclusion from a) and b) is that there is no point in being NEEDLESSLY
spare in a general purpose node not intended to be used primarily for
tightly coupled applications, where you cannot easily tell exactly what
is going to be needed and by whom. In terms of HUMAN time, it is far
more efficient to load the nodes with anything that you can REMOTELY
envision being useful to somebody doing HPC or code development with
them in an automated install that you're going to initiate and walk away
from anyway rather than retrofit packages afterwards on demand.
Network daemons are an OBVIOUS EXCEPTION to this -- network services
should ALWAYS be carefully considered, even on plain old LAN
workstations, because of both security and performance. Things like
ipchains or ipfilters tend to be "expensive" overhead on all TCP/UDP
connections, and overhead in parallel computations is anathema. So one
absolutely wants nodes to offer (in all probability) only incoming ssh
in the privileged port zone in an open network so no "firewall" is
needed, and one wants to nmap the nodes in their final configuration to
be sure that one succeeds because sure, lots of crap DOES end up being
turned on and port-accessible in a straight workstation install.
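For anyone who wants to script that check, here is a minimal sketch in Python. It is only a plain TCP connect test over whatever ports you hand it, so it is no substitute for a real nmap run (no UDP, no service detection). The function names, the 0.5 second timeout, and the "only port 22 is allowed" default are my assumptions for illustration, not anything standard.

```python
import socket

def open_tcp_ports(host, ports, timeout=0.5):
    """Return the sorted list of ports on `host` that accept a TCP connection.

    Plain connect test only; an unresolvable host name still raises.
    """
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success and an errno value otherwise
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return sorted(found)

def unexpected_ports(found, allowed=(22,)):
    """Open ports that are not in the allowed set (assumed here: ssh only)."""
    allowed = set(allowed)
    return [p for p in found if p not in allowed]
```

Something like unexpected_ports(open_tcp_ports("node01", range(1, 1024))) should come back empty on a properly trimmed node, where "node01" is a made-up host name. Anything it does return is a service to track down and turn off.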
So I agree, I agree -- thin is good, thin is good. Just avoid needless
anorexia in the name of being thin -- thin to the point where it saps
your nodes' strength.