[Beowulf] Compute Node OS on Local Disk vs. Ram Disk
Bogdan Costescu
Bogdan.Costescu at iwr.uni-heidelberg.de
Thu Oct 2 05:03:30 PDT 2008
On Wed, 1 Oct 2008, Donald Becker wrote:
> That's correct. Our model is that a "cluster" is a single system --
> and a single install.
That's the idea I also started with, almost 10 years ago ;-)
Not using Beo*/bproc, but NFS-root, which allowed a single install in
the node "image" to be used on all nodes - although you'd probably
call this 2 system installs (the master node itself and the node
"image"). But over the years I have changed my mind...
> If you are running different distributions on nodes, you discard
> many of the opportunities of running a cluster. More importantly,
> it's much more knowledge- and labor-intensive to maintain the
> cluster while guaranteeing consistency.
It does indeed require more work, but in some cases it cannot be
avoided. From my own experience: some 5 years ago a quantum chemistry
program was distributed as a binary statically compiled on RH9 or
RHEL3 (kernel 2.4 based) with MPICH included. When I wanted to switch
to a 2.6 kernel, this program could no longer run, so some of the
nodes had to be kept on an older distribution until a newer program
version could be obtained (that took about a year). The built-in MPICH
also meant that whenever there were discussions about using
higher-performance interconnects than GigE, this software's users
insisted on buying more nodes rather than a faster interconnect. This
situation caused both technical and administrative issues, and the
possibility of running different distributions solved all of them
easily.
Being able to run several distributions side by side requires some
effort in organizing the other installed software, which is normally
shared to the nodes through NFS or a parallel FS. But once you make
the jump from 1 to 2, you might as well make it from 1 to many.
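
One possible way to organize the shared software so that each
distribution finds its own builds is sketched below. This is only an
illustration under stated assumptions, not a description of any
particular site's setup: the /shared/sw root, the
<distro>-<major>-<arch> directory naming, and the helper names are all
hypothetical, and the distribution is identified from os-release-style
KEY=value text, which older distributions do not provide (they would
need /etc/redhat-release or similar parsed instead).

```python
import posixpath


def parse_os_release(text):
    """Parse os-release-style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip().strip('"').strip("'")
    return info


def software_prefix(info, arch, root="/shared/sw"):
    """Return the per-distribution directory under the shared root.

    The layout <root>/<distro>-<major>-<arch> is an assumption made
    for this sketch, e.g. /shared/sw/rhel-3-i686.
    """
    distro = info.get("ID", "unknown").lower()
    major = info.get("VERSION_ID", "0").split(".")[0]
    return posixpath.join(root, "%s-%s-%s" % (distro, major, arch))


def node_environment(info, arch, root="/shared/sw"):
    """Return the PATH and LD_LIBRARY_PATH entries for this node."""
    prefix = software_prefix(info, arch, root)
    return {
        "PATH": posixpath.join(prefix, "bin"),
        "LD_LIBRARY_PATH": posixpath.join(prefix, "lib"),
    }
```

A node's login or job prolog script could call node_environment()
with the contents of its own release file and prepend the results to
the user's environment, so adding another distribution only means
adding another directory under the shared root.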
This leads me to observe that we have different points of view: you
are the maker of a cluster-oriented distribution, trying to promote it
and its underlying ideas (which are fine ideas, no question about that
:-)), and you are sure that it works because it has been bought and
used successfully. I, on the other hand, have to find solutions to keep the
scientists productive (whatever productive means ;-)) and to keep them
as far as possible from the system details so that they can
concentrate on their work. So it's not surprising that we come to
different conclusions - at least they sustain an interesting
discussion :-)
I would be interested to hear Mark Hahn's opinion on this since, from
how he has presented himself on this list, he seems to be in a
position very similar to mine: supporting a variety of users with a
variety of needs. But others should not feel left out; please write
your opinions as well ;-)
> Most distributions (all the commercially interesting ones) are
> workstation-oriented
I don't really agree with this statement (looking at RHEL and SLES),
but anyone who installs a workstation-oriented distribution on a
cluster node gets what (s)he pays for :-) I have seen very recently
(identity hidden to protect the guilty ;-)) such a node "image" which
contained OpenOffice - to be fair, it was used via NFS-root so it
wasn't wasting node memory, only master disk space...
--
Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de