[Beowulf] best architecture / tradeoffs

Joe Landman landman at scalableinformatics.com
Sun Aug 28 09:00:04 PDT 2005

Hi Robert:

Robert G. Brown wrote:
> Joe Landman writes:
>>> you need to seriously rethink such jobs, since actually *using* swap 
>>> is pretty much a non-fatal error condition these days.
>> I disagree with this classification (specifically the label you 
>> applied).  Using swap means IMO that you need to buy more ram for your 
>> machines.  There is no excuse to skimp on ram, as it is generally 
>> inexpensive (up to a point, just try to buy reasonably priced 4GB 
>> sticks of DDR single/dual ranked memory).
> And there is also the "dancing bear" problem.  In some very large
> problems, the amazing thing is not that it runs particularly well or
> fast, but that you can run it at all.  Some jobs are parallelized IN
> ORDER TO run something too big to fit into physical memory, and some
> task partitionings put the job itself on a single node and use the rest
> to provide some sort of extended memory to that node. 

Back when I was at SGI working on benchmarks for customers, one of the 
things we would do was try to get the job spread as wide as possible 
across the CPUs.  If you have 8 MB caches and you can spread the thing 
wide enough, it gets run entirely out of cache...
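As a rough illustration (the 8 MB figure matches the caches above; the working-set size is a made-up example), you can estimate how wide a job has to go before each CPU's slice fits in cache:

```python
CACHE_BYTES = 8 * 2**20  # assume 8 MB of cache per CPU, as above


def cpus_to_fit_in_cache(working_set_bytes, cache_bytes=CACHE_BYTES):
    """Minimum CPU count so an evenly partitioned slice fits in cache."""
    return -(-working_set_bytes // cache_bytes)  # ceiling division


# A hypothetical 1 GB working set spread across 8 MB caches:
print(cpus_to_fit_in_cache(2**30))  # 128 CPUs
```

Past that width, a nominally memory-bound kernel starts running at cache speed, which is exactly the dancing-bear effect on big SMP boxes.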

I seem to remember running a STREAM benchmark like this once...


> So it isn't ALWAYS a non-fatal error condition, but it should always be
> done deliberately, because if an ordinary task swaps you start getting
> that nasty old several-orders-of-magnitude slowdown...;-)

Heh... a big red warning modal window could pop up and ask "ARE YOU 
SURE?"

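That several-orders-of-magnitude figure is easy to sanity-check with ballpark numbers (both latencies below are rough assumptions, not measurements):

```python
ram_access_s = 100e-9  # ~100 ns for a DRAM access (ballpark figure)
disk_seek_s = 8e-3     # ~8 ms average seek to swap a page in from disk

# Ratio of a page fault serviced from disk to an ordinary RAM access.
slowdown = disk_seek_s / ram_access_s
print(f"a swapped access costs roughly {slowdown:,.0f}x a RAM access")
```

Somewhere around 10^4 to 10^5, i.e. one badly placed working set and the "fast" cluster node is suddenly a disk benchmark.
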
> I don't know if warewulf is quite there, but it is a damned good try.
> You install any distro you like (if it is far from any beaten path
> expect to do a bit of work).

I am starting to play with the SuSE bits and Warewulf now.  I have been 
playing with it on and off for the last year or so.  I like it.  They do 
many things right.  Specifically:

> It divorces the support of the
> minimal "cluster" core from the choice of OS, 

This is goodness, and IMO the right way to do things.  With some 
distributions, getting them to support/ship stuff that they should is 
worse than pulling teeth.  This has a cascading (negative) impact on 
cluster distributions which depend critically upon the upstream Linux 
distribution.
> from its natural
> update/upgrade process, and so on, and maximally leverages the particular
> tools (e.g.  yum) that make managing/selecting packages easy.  

I like the modular linux approach, where you have groups of packages 
where all the bits are properly dependency resolved relative to each 
other, and they are not so locked into a particular set of underlying 
bits (e.g. if you look at it from a graph perspective, each module has 
very few connections to the core).  The problem is that each 
distribution does things ever so slightly differently, with the net 
effect that .src.rpms (or .debs) don't always rebuild cleanly, so the 
binary builds don't work either.  Even with yum/apt masking this (which, 
at a coarse level, is what they attempt to do), it is still annoying. 
More importantly for us, most of the distros build various important 
packages either very poorly, incorrectly, or simply with default 
configurations that have little value to people who need to use the 
tools.  A great example of this is how most RPM distros build Perl 
module RPMs.  Many of the really good options on these things are not 
set by default, and it is *really* hard to go back and fix the RPMs, 
as they are generated automagically by CPAN2RPM-style utilities.
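The graph view above can be made concrete with a toy model (the package names here are invented for illustration): in a clean modular layout, what you want to keep small is each module's count of direct edges into the core.

```python
# Hypothetical dependency graph: module -> direct dependencies.
deps = {
    "core":       [],
    "mpi":        ["core"],
    "scheduler":  ["core"],
    "perl-tools": ["core", "mpi"],
}


def edges_to_core(module):
    """Direct dependency edges from a module into the core."""
    return sum(1 for dep in deps[module] if dep == "core")


for module in deps:
    print(module, edges_to_core(module))
```

When every module hangs off the core by one or two edges like this, swapping the underlying distro out from under the stack is tractable; when the edges proliferate, you get the rebuild breakage described above.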

We have found that in many cases the only way to solve this problem is 
to ignore the distro supplied packages and build our own.  In the end we 
could package the module as a huge RPM.  That would be fine.  But the 
individual packages that make it up need to be built right to begin 
with, and yum/apt doesn't solve that issue.

> The
> clusters you end up with are close to what you'd get if you rolled your
> own on top of your own distro (diskless, yet:-) but a whole lot easier
> than roll-your-own-from-scratch.  

Roll your own got old for me in 1999.

> Agnostic is good.

I think distribution agnosticism is *required* for good cluster support 
going forward.  Sure, some folks may be able to do some good things with 
a distro-fixed cluster system, and make installation* easy (* easy on 
supported hardware, that is; just try to include something not in that 
distro's supported list, like SATA or Firewire).

>  Automagic agnostic
> is better (though harder -- requires a broad developer/participant
> base).  Tools written/maintained by folks who eat their own dog food are
> best.  warewulf looks like it is on the generally correct track.

My only complaint in the past with warewulf has been the lack of a 
local-boot VNFS install (e.g. copy the VNFS and minimal bits to the 
local disk, so the
system can boot without loading things into RAMDISK).  Some customers 
have issues with using ram for anything other than local fast memory.  I 
think this has been/is being addressed.

I agree, though, that warewulf looks very much like the right way.  I 
am looking a little at oneSIS, but I am not sure how well supported that 
is/will be.

>   rgb

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
