RH7.1, portmap, yp, nfs (again)
Georgia Southern Beowulf Cluster Project
gscluster at hotmail.com
Thu Jun 21 15:01:14 PDT 2001
Hello,
I sent out a previous message asking if anyone is having portmap problems
with RH7.1. Well, my machines just broke again today and I'm quite upset
about it. All of my machines are running a kickstarted install of RH7.1
with firewalling turned off, YP for passwords and the auto.home files (and
a few others), and NFS plus AutoFS to mount user directories and other
working directories. The kernel is the default Red Hat 2.4.2 kernel; I have
not recompiled it. All machines are PII 350 or better. I have four
clusters configured in a tree pattern, where each cluster of 16 computers
is one server and fifteen nodes. Each server is tied to the other servers
on a network, which also includes a sort of master server for the YP
services; that master is not used for actual computing. Here is the
problem.
Saturday: Made a fresh kickstart install of all 64 servers and nodes using
Red Hat 7.1, with the kickstart files NFS-exported. Everything is cool. I
can immediately lamboot and run one of the example programs (the cpi
program) on each cluster individually. NIS works fine. AutoFS works fine.
Monday: Everything goes to hell. When I add users or change NIS maps and
then run "cd /var/yp; make", every map reports something about an RPC
portmap failure or says it can't export the map. ypserv and ypbind are
running fine on the master server (where these events take place). When I
run ypinit -s <server> on each of my YP slave servers, they say that they
cannot enumerate maps and to make sure it is running on <server>. It is;
I've quadruple-checked it, even putting the master ypserv process in debug
mode. Going to single-user mode and then coming back to the console
runlevel shows that ypbind now cannot bind to the domain server, which IS
running ypserv. Even when all the other servers can bind to the master
server, the nodes are not receiving the YP maps from their slave servers.
NFS also goes haywire, complaining about RPC and portmap and about not
being able to register or get slots. Solution: go home, eat dinner, nap,
come back refreshed. When I come back a few hours later, EVERYTHING WORKS.
It's as if nothing happened. I can again lamboot and run the pi program on
64 machines.
Tuesday: Everything works. New users are being added by the hour and I can
lamboot and mpirun programs on all clusters.
Wednesday: Out sick and unable to monitor the system, but I get no
complaints from users.
Thursday: Everything has gone to hell again. I first notice it when I add
a new user and "cd /var/yp; make" again complains about RPC and portmap.
Again, the slave servers actually get the maps (though the master server
complains that they do not), but the nodes bound to these servers are not
getting anything. They complain about YP_DOMAINNOTBOUND, but the domain
definitely is bound.
Now I'm at a complete loss as to what to do next. I'm trying different
combinations of options in the yp.conf and ypserv.conf files, and I'm
taking machines to single-user mode and bringing up only the network,
portmap, ypserv, and ypbind. All I get is error messages, even though the
config files look correct. My guess is that portmap is croaking every
couple of days, but how do I get real proof of this? I'm also not seeing
anything helpful in the system logs, except that a news program runs every
day. I don't think it should interfere with portmap/YP/NFS. Any ideas?
Thanks,
Wes Wells
More information about the Beowulf mailing list