clustering both linux and unix..

Robert G. Brown rgb at phy.duke.edu
Tue Oct 16 11:17:25 PDT 2001


On Tue, 16 Oct 2001, Luc Vereecken wrote:

> Most Unixes are "immensely stable", not just Linux. In my experience, there
> are no added instability problems in a heterogeneous cluster compared to a
> homogeneous cluster, once you have managed to get your programs working on
> all OSs involved.

This is sort of like saying that there are no additional problems once
you've solved the additional problems. The point is that overall admin
and applications development effort scales at least linearly with the
number of OS's involved -- different packagings, different maintenance
at the OS level, different include files and different libraries at the
application programming level (although e.g. POSIX compliance has to
some extent ameliorated the latter).  Also, most of the alternative
Unices (with the exception of FreeBSD) are not open source, which adds
its own layers of difficulty and instability that have been discussed at
length on the list.  Open source doesn't mean working and functional,
but it at least gives you a fighting chance at fixing some of the stuff
that doesn't work (as work by e.g. Josip Loncaric and others has clearly
demonstrated in this venue).

It is also undeniable that one's risk of job-killing errors is some
combinatorial factor higher when running on several OS's rather than
just one.  In fact, this is just a restatement of the previous
observation: if you spend less than N times the effort to support N
OS's, then on average you run greater risks on at least one of them.
Again, the list has seen reports over the years of many problems that
affect one particular kernel subsystem (such as the TCP stack) in one
particular kernel flavor, sometimes in just one parallel library.  Those
problems can sometimes be very time consuming to solve (and may require
access to the kernel sources to even identify).

Your points are well taken, though -- Unix in general is quite mature as
an OS paradigm and its current surviving implementations are necessarily
pretty highly evolved.  In some environments, one "has" to run e.g.
Solaris and AIX and Irix and Digital Unix (etc) because that's the way
that it is, and one can (as I said) run cluster programs across the lot
of them.

Still, if one >>can<< reduce the number of OS's supported in any given
organization, one almost always realizes economies of scale and sees
improved scaling and stability.  One person can easily run an extremely
large Linux-only network.  If one person CAN easily run an extremely
large AIX+Linux+Irix+DU+Solaris+... network, my hat is off to them!
They are clearly in the Unix Super Genius category of human -- I've
never managed more than 3 Unixoid OS's at once, and one of those was
pretty poorly run, to be frank.  Nowadays I would not willingly handle
more than one...;-)

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






