[Beowulf] hpl size problems

Wed Sep 28 16:47:38 PDT 2005

On Wed, Sep 28, 2005 at 10:42:23AM -0700, Donald Becker wrote:
> > >> Latter yes (more like GB), former no.  Trimming the fat from the SuSE 
> > >> cluster install got it from over an hour down to about 8 minutes with 
> > >> everything, per node.
> 
> This is more typical: a distribution that comes on many CD-Rs isn't going 
> to be easily stripped down to something that can be loaded in a few seconds.
> A stripped-down install will take on the order of 5 minutes.

It depends on how you define "install". We image nodes at boot with
stripped-down file systems built from standard distros in about 5-8
seconds. The stripped-down file systems themselves range from 27MB to
hundreds of megabytes, and with a tmpfs/NFS hybrid we provide gigabytes
of usable space in under 100MB of RAM (which translates to about 30MB
transferred over the wire at boot).
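For a sense of scale, the 30MB figure can be turned into raw wire time.
This is a rough sketch that ignores protocol overhead and assumes Fast
Ethernet (100 Mbit/s) or Gigabit Ethernet links; those link speeds are
illustrative, not measured, and the xfer_seconds helper is hypothetical:

```shell
#!/bin/sh
# Back-of-the-envelope wire time for a boot image, ignoring protocol
# overhead. The link speeds used below are assumed examples.
# Usage: xfer_seconds SIZE_MB LINK_MBIT
xfer_seconds() {
    # megabytes * 8 = megabits; divide by link rate in megabits/second
    awk -v size="$1" -v link="$2" 'BEGIN { printf "%.2f\n", size * 8 / link }'
}

xfer_seconds 30 100    # Fast Ethernet:    prints 2.40
xfer_seconds 30 1000   # Gigabit Ethernet: prints 0.24
```

At either speed the raw transfer is only a fraction of the 5-8 second
total.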

[...]

> > I agree, guys, I agree.  My point wasn't that trimming cluster
> > configurations relative to workstation or server configurations is a bad
> > thing -- it is not, and indeed one would wish that eventually e.g. FC,
> > RHEL, Centos, Caosity etc will all have a canned "cluster configuration"
> > in their installers to join server and workstation, or that somebody
> > will put up a website with a generic "cluster node" kickstart fragment
> > containing a "reasonable" set of included groups and packages for people
> > to use as a baseline that leaves most of the crap out.
> 
> We went down this path years ago.  It doesn't take long to find the 
> problem with stripping down full installations to make minimal compute node 
> installs: your guess at the minimal package set isn't correct.  You might 
> not think that you need the X Window system on compute nodes.  But your 
> MPI implementation likely requires the X libraries, and perhaps a few 
> interpreters, and the related libraries, and some extra configuration 
> tools for those, and...

Can you elaborate on why an MPI implementation would require Xlibs?

Your point is accurate, though. For non-binary programs (interpreted
scripts) and for programs that link against anything beyond the standard
set of libraries, there is a chance that the supporting infrastructure
will not be in place when the job runs. Luckily, that problem is easy
enough to fix (see below). ;)
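What a given binary actually links against is easy to check empirically,
which also answers the X libraries question for any particular MPI build.
A minimal sketch using ldd; the function names are hypothetical, and ldd
only covers dynamically linked ELF binaries, not interpreter dependencies
or libraries loaded later with dlopen():

```shell
#!/bin/sh
# Minimal sketch: inspect the shared libraries a dynamically linked
# binary pulls in. missing_libs and needs_lib are hypothetical names.
# ldd does not see interpreter dependencies or dlopen()ed libraries.

# Print each library the dynamic linker cannot resolve on this system.
missing_libs() {
    # ldd reports unresolved entries as "libfoo.so.1 => not found"
    ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}

# Succeed if the binary links against a library whose name matches $2.
needs_lib() {
    ldd "$1" 2>/dev/null | grep -q -- "$2"
}

# Example usage (./my_mpi_app is a placeholder path):
#   missing_libs ./my_mpi_app
#   needs_lib ./my_mpi_app libX11 && echo "my_mpi_app links against X"
```

Running the same check through chroot against the node image (assuming
ldd is present in the image) shows what the nodes, rather than the host,
can resolve.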

> Yes, there are a number of labor-intensive ways to rebuild and repackage 
> to break these dependencies.  But now you have a unique installation that 
> is a pain to update.  There is no synergy here -- workstation-oriented
> packages don't have the same motivations that compute cluster or server 
> people have.

There is no need to reinvent technologies that already exist for this
purpose: use the package manager to handle dependencies for a chroot
file system. Even higher-level wrappers like yum work in chroots very
easily, which makes building, updating, and even modifying a chroot a
simple task:

   # yum --installroot /vnfs/default update
   # yum --installroot /vnfs/default install moooo
   # yum --installroot /vnfs/default remove moooo

Works perfectly and robustly.
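If you do this often, it is convenient to wrap it. A minimal sketch,
assuming a yum-based host and the /vnfs/default image path from the
example above; the vnfs_yum function and the VNFS_ROOT and DRY_RUN
variables are hypothetical:

```shell
#!/bin/sh
# Minimal sketch of a wrapper for managing a chroot image with yum.
# VNFS_ROOT, DRY_RUN and vnfs_yum are hypothetical names for this example.
VNFS_ROOT="${VNFS_ROOT:-/vnfs/default}"

vnfs_yum() {
    # --installroot points yum at the chroot, so packages are installed
    # into (and dependencies resolved against) the image, not the host.
    # -y answers "yes" to prompts so the wrapper can run unattended.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it.
        echo yum -y --installroot="$VNFS_ROOT" "$@"
    else
        yum -y --installroot="$VNFS_ROOT" "$@"
    fi
}

# Example usage (real runs need root and a yum-based host):
#   vnfs_yum update
#   vnfs_yum install moooo
#   DRY_RUN=1 vnfs_yum remove moooo
```

Keeping the image path in one variable makes it easy to maintain several
images side by side.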

You are right on WRT the differing motivations and goals of a
general-purpose Linux distro and a cluster environment. I have my own
opinions here, but I would like to hear other people's ideas about what
is missing from, or needs changing in, the general-purpose distributions
to make them more HPC- and/or cluster-friendly.

I am a maintainer of cAos Linux, and this feedback will be used to
evaluate what else we can do on our end to make a better
cluster-supporting Linux distribution. Maybe this should be a new
thread. ;)

[...]

> Uhmmm, but my start-up depends on 3D-Xeyes and the Klingon fonts!
>  (..thanks for making my point.)

Haha...

How about support for Var'aq (http://www.geocities.com/connorbd/varaq/)?

> > So I agree, I agree -- thin is good, thin is good.  Just avoid needless
> > anorexia in the name of being thin -- thin to the point where it saps
> > your nodes' strength.
> 
> You've got the wrong perspective: you don't build a thin compute node from 
> a fat body.  You develop a system that dynamically provisions only the 
> needed elements for the applications actually run.  That takes more than a 
> single mechanism to do correctly, but you end up with a design that has 
> many advantages.  (Sub-second provisioning, automatic update consistency, 
> no version skew, high security, simplicity...)

While the architecture you describe has some major benefits, it doesn't
fit all cluster needs; in many cases the nodes require either a fully
burdened file system or something in between.

For example, each of our (nearly 20) clusters is configured somewhat
differently to meet the needs of the scientific applications that run
on them (including binary-only programs and various schedulers). There
can also be multiple head/frontend nodes, headed X-based visualization
nodes, file system I/O nodes, and even web-serving nodes ;). Being able
to network boot and manage all of this from one (redundant) point in a
scalable fashion is a big help.

-- 
Greg Kurtzer
Berkeley Lab, Linux guy