Question: Task Farm and Private Networks.

Robert G. Brown rgb at phy.duke.edu
Thu May 31 07:51:12 PDT 2001


On Wed, 30 May 2001, Hoeffel, Thomas wrote:

> Hi,
>
> I currently have a small cluster in which the slave nodes are on a private
> network. It is used primarily as a task farm and not as a true parallel
> machine. Only the master node sees our other systems  (which are on their
> own switch). This causes problems with certain remote job submissions via
> some commercial packages since they write both local temp files and scratch
> temp files.
>
> Question: What is the drawback to giving each slave its own true IP address
> and allowing them to NFS mount the same file systems as the master node?

In a real "compute farm" (where the tasks are embarrassingly parallel
and don't communicate) none that I can think of.  Indeed, it is the only
sane way to go.

There are many kinds of clusters, only a few of which are true
"beowulfs" in the narrow sense of the definition of the architecture.
For the task mix you describe (lots of embarrassingly parallel work run
as separate jobs on the various "nodes") there is very little benefit to
using a true beowulf architecture and plenty of additional costs in the
form of scripting solutions to problems that arise due to a lack of a
shared filesystem and so forth.  Yes, recent list discussion has shown
that you "can" use a scyld beowulf as a compute farm; it has also shown
that it is a bit clumsy and difficult to do so, so why bother?

It should be very easy to flatten your network -- either connect the
inner switch to the outer switch (rationalizing e.g. the IP space and
routing and all that) or arrange for the master node to act as a router
and pass the NFS mounts through it.  In most cases I think the former
makes more sense; in a few (mostly when the master is idle enough that
the overhead of its acting as a router isn't "expensive" in terms of
time to complete work) the latter might.
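
For the NFS end of it, a minimal sketch (the hostnames, address range, and
mount point below are just examples -- use your own): export the filesystem
from the master in /etc/exports and have each node mount it in /etc/fstab.

   # on the master, in /etc/exports (assuming nodes on 192.168.1.0/24):
   /home   192.168.1.0/255.255.255.0(rw)

   # then re-export (and make sure the NFS daemons are running):
   exportfs -a

   # on each node, in /etc/fstab (assuming the master is called "master"):
   master:/home   /home   nfs   rw,hard,intr   0 0

   # if you route through the master instead of flattening the network,
   # the master also needs forwarding turned on:
   echo 1 > /proc/sys/net/ipv4/ip_forward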

Pop a more or less standard linux on each node (remembering that the
nodes are now openly accessible and hence should probably be configured
with only sshd open as a means of access, to minimize security
hassles).  You can strip the node configuration a bit -- if they are
headless they probably don't need X servers, for example, and can likely
live without games, KDE and/or Gnome desktops and tools, mail, news, web
browsers, and the like.  If they have big disks, though, there isn't
much point in stripping the configuration a lot -- heterogeneity in a
network costs more money in time than extra space costs in disk.
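
On a Red Hat-ish node the trimming can be as simple as turning services off
at boot time with chkconfig (the service names below are examples only and
vary with distribution and release):

   chkconfig --list | grep ':on'   # see what currently starts at boot
   chkconfig sendmail off          # turn off whatever the node doesn't need
   chkconfig lpd off
   chkconfig sshd on               # keep sshd as the one way in
   chkconfig portmap on            # the NFS client wants the portmapper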

Users can then log in to each node and run jobs, a remote job
submission package can do it for them, or you can install MOSIX on the
nodes and let users log in to a single node to run jobs while MOSIX
migrates them around to balance load.  You may still want a tool like
procstatd to monitor load on the cluster, especially if users are
logging into nodes to run their jobs -- it can easily reveal which nodes
are idle and ready for more work.
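
Even without procstatd, a trivial shell loop will show you who is loafing
(assuming nodes named node01, node02, ... and ssh logins that work without
a password, e.g. via RSA keys):

   for n in node01 node02 node03 node04; do
       echo -n "$n: "
       ssh $n cat /proc/loadavg    # first field is the one-minute load
   done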

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
