[Beowulf] New person to building a beowulf cluster
Robert G. Brown
rgb at phy.duke.edu
Fri Nov 12 12:02:23 PST 2004
On Tue, 9 Nov 2004, Andrew wrote:
> I recently took on a project building a Beowulf cluster, and I have configured the
> files necessary to run MPICH-1.2.6 on 3 computers using Red Hat
> 7.2 (following directions). I am running into a problem where I cannot run
> it on my two slave nodes (yet I can run it on my master node): it will pause
> for a while and then say p4_error: Could not get host by name for host
> node0.home.net. I think it might have something to do with the way I
> configured my hosts file, in which the first line is the node name, the second
> line is localhost, and the third is my master node (node0). Does anyone have any
> other suggestions or comments as to what I should check?
No, but a possibly apropos meta-remark. Let's see:
RH 7.2 <- your version
RH 7.3
RH 8
RH 9
(say) FC 1
FC 2 <- current stable
FC 3 <- current new
You are years and years out of sync with current Linux distributions (with
similar divergences between 7.2 and other flavors of Linux).
In the meantime the kernel has radically changed, glibc has radically
changed, the compiler(s) have radically changed, the support libraries
have radically changed, and the general user interface has radically
changed. That's a lot of radical change.
To even get good help from this list you'll likely need to upgrade to
something more current, as otherwise nobody will be able to tell if your
problem is in MPICH as distributed in 7.2 (unless they can remember that
far back), MPICH as currently distributed but BUILT on 7.2 (old library
bugs or incompatibilities), the compiler, the kernel, the networking
stack...
True, your problem does look likely to be at the administrative level:
getting the format of /etc/hosts right. It should look something like:
# /etc/hosts for rgb's private home network
#
# This is required for loopback access to localhost
127.0.0.1 localhost localhost.localdomain
#
# The inside/server/gateway/firewall address of Eden
192.168.1.1 node1.private.net node1
192.168.1.2 node2.private.net node2
192.168.1.3 node3.private.net node3
...
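To make the format concrete, here is a minimal Python sketch of how each line is read. It is a teaching aid, not a replacement for the system resolver, and the function name parse_hosts is hypothetical. The first field on a line is an IP address, and every remaining field (the full name, then any aliases) is a name that maps to that address. That is why node1.private.net and node1 end up at the same address, while a name that appears on no line, like node0.home.net in the error above, is simply not found.

```python
# Sketch of /etc/hosts parsing (hypothetical helper, for illustration only):
# field 1 is an IP address; every following field is a name for it.

HOSTS_TEXT = """\
# This is required for loopback access to localhost
127.0.0.1 localhost localhost.localdomain
192.168.1.1 node1.private.net node1
192.168.1.2 node2.private.net node2
192.168.1.3 node3.private.net node3
"""


def parse_hosts(text):
    """Return a dict mapping each host name or alias to its IP address."""
    table = {}
    for line in text.splitlines():
        # Drop comments ('#' to end of line) and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry seen for a name; later duplicates are ignored.
            table.setdefault(name, address)
    return table


if __name__ == "__main__":
    table = parse_hosts(HOSTS_TEXT)
    for name in ("node1.private.net", "node1", "node0.home.net"):
        print(name, "->", table.get(name, "NOT FOUND"))
```

Run as-is, it should show node1.private.net and node1 both mapped to 192.168.1.1, and node0.home.net as NOT FOUND.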
When it is correctly set up, you should be able to ping hosts by name
as:
ping node1.private.net
or
ping node1
(via the alias defined as the third field on each line of /etc/hosts).
This hosts file should likely be on all nodes, not just the head node.
You may also need to check /etc/nsswitch.conf to make sure "files" is
listed as an entry for "hosts:".
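Both checks can be scripted and run on each node. What follows is a rough Python sketch, not part of MPICH or of any Red Hat tool; the function names and the script name check_names.py are hypothetical. It reports whether the hosts: line in /etc/nsswitch.conf lists files, then asks the system resolver (Python's socket.gethostbyname) for each name given on the command line. My assumption is that a name failing here on a slave node is the same kind of failure p4 is complaining about, but that correspondence is a guess, not something taken from the MPICH source.

```python
import socket
import sys


def hosts_uses_files(nsswitch_text):
    """Return True if the 'hosts:' line of nsswitch.conf text lists 'files'."""
    for line in nsswitch_text.splitlines():
        # Ignore comments, including commented-out 'hosts:' lines.
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            sources = line[len("hosts:"):].split()
            return "files" in sources
    return False


def check_names(names):
    """Resolve each name; map it to its IPv4 address, or None on failure."""
    results = {}
    for name in names:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            results[name] = None
    return results


if __name__ == "__main__":
    try:
        with open("/etc/nsswitch.conf") as f:
            print("'files' listed for hosts:", hosts_uses_files(f.read()))
    except FileNotFoundError:
        print("/etc/nsswitch.conf not found")
    # Names come from the command line, e.g. node0 node1 node2;
    # with no arguments only localhost is checked.
    names = sys.argv[1:] or ["localhost"]
    for name, addr in check_names(names).items():
        print(f"{name:20s} {addr or 'DOES NOT RESOLVE'}")
```

For example, running python3 check_names.py node0 on each slave node should print the master's address rather than DOES NOT RESOLVE once /etc/hosts and /etc/nsswitch.conf are in order.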
Hope this helps...
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu