[Beowulf] New person to building a beowulf cluster

Robert G. Brown rgb at phy.duke.edu
Fri Nov 12 12:02:23 PST 2004


On Tue, 9 Nov 2004, Andrew wrote:

> Recently took on a project doing a Beowulf cluster and I have configured the
> files necessary to run MPICH-1.2.6 to run on 3 computers using Red Hat
> 7.2(following directions). I am running into a problem where I can not run
> it on my two slave nodes(yet I can run it on my master node) it will pause
> for a while and then says p4_error: Could not get host by name for host
> node0.home.net I think it might have something to do with the way I
> configured my hosts file in which the first line is the node name, second
> line is local host, and third is my master node (node0).  Anyone have any
> other suggestions or comments as to what I should check?

No, but a possibly apropos meta-remark.   Let's see:

  RH 7.2 <- your version
  RH 7.3
  RH 8
  RH 9
  (say) FC 1
  FC 2 <- current stable
  FC 3 <- current new

You are years and years out of sync with current linux distros (with
similar divergences between 7.2 and other flavors of linux distro).

In the meantime the kernel has radically changed, glibc has radically
changed, the compiler(s) have radically changed, the support libraries
have radically changed, and the general user interface has radically
changed.  That's a lot of radical change.

To even get good help from this list you'll likely need to upgrade to
something more current, as otherwise nobody will be able to tell if your
problem is in MPICH as distributed in 7.2 (unless they can remember that
far back), MPICH as currently distributed but BUILT on 7.2 (old library
bugs or incompatibilities), the compiler, the kernel, the networking
stack...

True, your problem does look like it is at the administrative level --
getting the format of /etc/hosts right.  Each line is an IP address
followed by the name(s) for that address, something like:

# /etc/hosts for rgb's private home network
#
# This is required for loopback access to localhost
127.0.0.1	localhost	localhost.localdomain
#
# The cluster nodes: IP address, fully qualified name, short alias
192.168.1.1	node1.private.net	node1
192.168.1.2	node2.private.net	node2
192.168.1.3	node3.private.net	node3
...
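
If you want to sanity check a hosts file without waiting for MPICH to
time out, something like the following rough sketch might help.  It only
assumes the whitespace-separated "address name alias..." layout shown
above; the function name parse_hosts and the SAMPLE text are made up for
illustration, and it is not a full reimplementation of what the resolver
does.

```python
# Hypothetical helper: read hosts-file text and report which names it defines.
SAMPLE = """\
# This is required for loopback access to localhost
127.0.0.1	localhost	localhost.localdomain
192.168.1.1	node1.private.net	node1
192.168.1.2	node2.private.net	node2
192.168.1.3	node3.private.net	node3
"""


def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for raw in text.splitlines():
        # Drop anything after a '#', then split on spaces or tabs.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank, comment-only, or a line with no name on it
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first address seen for a name (an assumption here).
            table.setdefault(name, ip)
    return table


table = parse_hosts(SAMPLE)
for name in ("node1", "node1.private.net", "node0.home.net"):
    print(name, "->", table.get(name, "NOT DEFINED"))
```

A hosts file with the name, localhost and the master node each on a line
of their own (no address in front) would come back from this as an empty
table, which is roughly what "Could not get host by name" is telling you.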

When it is correctly set up, you should be able to ping hosts by name
as:

  ping node1.private.net

or

  ping node1

(via the alias defined as the third field on each line of /etc/hosts).
This hosts file should be on all nodes, not just the head node.  You may
also need to check /etc/nsswitch.conf, to make sure "files" is listed as
an entry for "hosts:".
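
The two checks above can also be scripted, so you can run the same thing
on every node.  This is a minimal sketch and not anything MPICH itself
provides: the function names unresolved and hosts_uses_files are
hypothetical, and it assumes the usual "hosts: files dns" style of line
in nsswitch.conf.

```python
import socket


def unresolved(names):
    """Return the names the local resolver cannot turn into an address."""
    failed = []
    for name in names:
        try:
            socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


def hosts_uses_files(nsswitch_text):
    """True if the 'hosts:' line of nsswitch.conf text lists 'files'."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("hosts:"):
            return "files" in line[len("hosts:"):].split()
    return False
```

On each node you would call unresolved(["node1", "node2", "node3"]) (or
whatever your nodes are actually called) and expect an empty list back,
and pass the contents of /etc/nsswitch.conf to hosts_uses_files() and
expect True.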

Hope this helps...

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




