[Beowulf] hpl size problems

Joe Landman landman at scalableinformatics.com
Thu Sep 29 04:40:21 PDT 2005



Robert G. Brown wrote:
> laytonjb at charter.net writes:
> 
>>
>>
>>> I keep forgetting that RGB is actually a cluster of text generation 
>>> 'bots able to crank out more characters per second than most folks 
>>> can speak ... ;)
>>
>> Priceless...
> 
> Harummph.  I'm just working on trying to fool a committee into thinking
> I've got a cluster that can pass the Turing test... isn't there a prize
> of some sort for that?

More grants <ducking />

[...]

>  b) There are a lot of things that can be USEFUL on general purpose
> cluster nodes.  I always put editors on them, for example, and
> programming tools and compilers, because every now and then I'm logged
> into one and want to work on code.

The view that I tend to espouse is

	[core stuff        ] --> installed  (should be really small)
	[application stuff ] --> mounted    (make it any size you require)

I came to this conclusion rather quickly after being asked to supply 
bioperl and a few other packages that are not easily crafted into RPMs 
across a cluster.  Part of what Mark Hahn does (and please correct me 
if I am wrong) is manage large clusters via diskless systems; it 
reduces his admin time.  We are using an intermediate approach, as we 
don't want to rebuild/maintain kernels and initrds: a local 
installation of a small-footprint OS (this avoids long load times and 
lets the user run the OS of their choice; heck, we can run autoyast, 
kickstart, apt, yum, whatever... from our installation environment). 
We need to do this due to the requirements of the closed-source 
application vendors mentioned previously in this thread, who have 
qualified on precisely one version of the OS, and sadly are in the 
"tell us if it works" mode for either diskless or ramdisk-based OSes.

By installing the applications to a common tree, and mounting the common 
tree across the nodes (read only at that), we can solve many problems 
quickly.  If the cluster grows to a large size, we can even set up 
multiple servers and load balance the mounts.  I think the Onesis 
cluster system does something like this.  Nicely, this setup also works 
very well with Warewulf, and with a little bit of effort, with Rocks and 
Oscar.
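
As a rough sketch of what the server side can look like (the hostname, 
export options, and paths below are placeholders, not our actual 
configuration), exporting the tree read-only is a one-liner, and the 
load-balanced multi-server variant can be handled with an autofs 
replicated map:

	# /etc/exports on the application file server
	/apps    *.cluster.local(ro,no_subtree_check)

	# autofs map entry when two or more servers carry copies of /apps
	apps    -ro,hard,intr    server1,server2:/apps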

Note:  I can and will say nice things about Rocks to people who ask, 
BTW.  It just doesn't work for all the clusters we build for our 
customers, either because of application requirements on the OS, or 
because of hardware/software requirements that Red Hat is quite slow 
to support (SATA, IB, and xfs, as a subset of examples); we are 
installing a cluster right now with an application that requires SuSE. 
No one system is the be-all-and-end-all cluster system.  You need to 
use what you can manage and run with.

> In our environment, the only
> x86_64's we have (until today, at any rate, when I put one on my desk:-)
> are in clusters, so if I want to do a build it has to be on a cluster
> node.  I COULD keep just one node suitably equipped for a build, but
> then if that node goes down I'm screwed (and node heterogeneity is Evil
> anyway). Similarly I want to be able to read man pages (while I'm coding
> for certain) so I put them on.  They drag TeX along which I don't mind
> because I use it anyway for a lot of things and maybe will be doing a
> build in an application directory that has a tex-based manual or paper
> in it and it bugs me when a build fails because of missing resources.
> So do I put geyes on?  Of course not.  How about e.g. gnuplot, octave,
> maple (we have a site license)?  I personally don't immediately need
> them, but I can certainly envision SOMEBODY needing them.  How about the
> GSL?  How about graphics development libraries (e.g. gtk)?  How about
> XML libraries?  I can not only envision these being needed -- I need
> them myself for certain builds.

This is why putting applications and their dependencies into a mount 
makes life a little easier.  It is not hard in a number of cases to 
relocate RPMs at install time into the /apps top-level tree rather 
than the /usr top-level tree, and it is really not hard to append 
/apps/lib and /apps/lib64 to the /etc/ld.so.conf file and re-run 
ldconfig.  All that's left after that is setting the path (or using an 
explicit path), and we need to leave something as an exercise for the 
reader.
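
The relocation dance looks roughly like this (the package name is made 
up, and --prefix only works for packages built as relocatable, which 
is why this covers "a number of cases" rather than all of them):

	# install into /apps rather than the package's default /usr prefix
	rpm -ivh --prefix=/apps some-package-1.0-1.x86_64.rpm

	# make the relocated libraries visible to the dynamic linker
	echo "/apps/lib"   >> /etc/ld.so.conf
	echo "/apps/lib64" >> /etc/ld.so.conf
	ldconfig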

> The conclusion from a) and b) there is no point in being NEEDLESSLY
> spare in a general purpose node not intended to be used primarily for
> tightly coupled applications, where you cannot easily tell exactly what
> is going to be needed and by whom.  In terms of HUMAN time, it is far
> more efficient to load the nodes with anything that you can REMOTELY
> envision being useful to somebody doing HPC or code development with
> them in an automated install that you're going to initiate and walk away
> from anyway rather than retrofit packages afterwards on demand.

No one is being needlessly spare.  We just want the smallest OS 
footprint on the node that we can get away with, and mount all the 
applications from a server.  This way, if you really want to run Maple 
across your cluster, the installation is really easy.  Any application 
you want can be placed in /apps and mounted.  Or, if absolutely 
needed, mount /usr from the network; but if you are doing that, you 
should just finish the transition and go diskless/ramdisk-based.
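
On the node side the sketch is tiny (again, "apps-server" is a 
placeholder): one fstab entry plus a small profile fragment so users 
pick up the mounted tree, which also takes care of the path-setting 
exercise above:

	# /etc/fstab on each compute node
	apps-server:/apps    /apps    nfs    ro,hard,intr    0 0

	# /etc/profile.d/apps.sh
	export PATH=/apps/bin:$PATH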


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615


