Robert G. Brown
rgb at phy.duke.edu
Tue May 7 11:32:01 PDT 2002
On Tue, 7 May 2002, Timothy W. Moore wrote:
> Is it important to use the host node to act as the time server for the
> slave nodes? I do not know that this is the problem, however, I am
> having problems reading files generated by the slaves on the host node.
> I saw a reference to a similar problem in the digest but deleted it due
> to my infinite wisdom. Could it be due to files being written at times
> which the host does not like? If the case, can someone point me to a
> HOWTO...I have one, but it seems that the documents which I have been
> using are a little out of date. Also, would the person who responded
> about investigating nfs files systems respond again?
I personally think that networked systems, nodes or not, should have
their clocks synchronized over the network if at all possible. Onboard PC clocks plus
the kernel's timekeeping are notoriously inaccurate (for no good reason
that I can think of) -- we've seen systems that would drift minutes per
day. There are programs (make being an obvious and well-known example)
that get "unhappy" acting on files with modification dates in the
future. A poor clock also means that you won't be able to rely on dates
or times in your system logs, which can seriously hamper an attempt to
figure out e.g. a security problem or a time-sensitive systems problem
(Now let's see, DID that crash occur when Joe started his big program
last night? Wish I could tell...:-). I consider it important enough to
devote part of a monitor panel to system time in wulfstat/xmlsysd -- a
system that shows up with clock drift indicates that at least ntpd
failed to start correctly or crashed, which often indicates that there
may be other problems.
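To check whether clock skew is what is biting you, one option is to look for files whose modification time lies in the future as seen from the host. The following is only a minimal sketch: the function name files_from_the_future and the one-second tolerance are my own illustrative choices, not part of any existing tool.

```python
import os
import time


def files_from_the_future(root, now=None, tolerance=1.0):
    """Yield (path, seconds_ahead) for files under root whose mtime is later than now."""
    if now is None:
        now = time.time()  # local clock of the machine running the check
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ahead = mtime - now
            if ahead > tolerance:
                yield path, ahead
```

Run on the host against the directory the slaves write into, any hits suggest that the writing node's clock is ahead of the host's.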
We use ntpd to keep everything sync'd. The campus has its own
ntp timeserver, and it is fairly easy to set up a timeserver of your own
that slaves to a timeserver closer to the reference clock (i.e. one with
a numerically lower stratum). There are other tools
that one can use as well -- see man rdate, for example -- provided you
set up the appropriate timeserver. If you are using Scyld (not clear
from your note) you may have to use what they provide for this purpose
or the Scyld kernel may be SUPPOSED to take care of this for you, I'm
not sure.
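If you want a quick look at how far a node's clock is from a timeserver without touching ntpd itself, a bare-bones SNTP-style query is enough. This is a rough sketch, not a replacement for ntpd: the function names are hypothetical, the host you pass in is whatever timeserver you actually have, and the offset estimate ignores most of what the real protocol does to correct for network delay.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Return a 48-byte client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp as Unix time (float seconds)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # The transmit timestamp occupies bytes 40..47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(host, port=123, timeout=2.0):
    """Crude estimate of (server time - local time) in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (host, port))
        packet, _addr = sock.recvfrom(512)
        received = time.time()
    # Compare the server's timestamp with the midpoint of our send/receive times.
    return parse_transmit_time(packet) - (sent + received) / 2
```

Calling clock_offset() with the name of your own timeserver from each node and comparing the results gives a rough picture of which nodes have drifted.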
Regarding NFS filesystems, I'm not sure which thread you mean. There
were four or five people who responded to the very recent one involving
NFS-based file corruption with lots of client programs "directly"
writing to a single database file. Some suggested increasing the number
of nfsd's; I suggested using some form of file locking. Look in the
list archives, sorted by thread, for the last week (or more) and you
should be able to find it pretty easily.
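For what "some form of file locking" might look like in practice, here is a minimal sketch using POSIX advisory locks from Python. The helper name append_record is made up for illustration, and it assumes every writer cooperates by taking the same lock and that lock support is actually working on your NFS mount, which you should verify for your own setup.

```python
import fcntl
import os


def append_record(path, record):
    """Append one line to path while holding an exclusive advisory lock."""
    with open(path, "a") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until no other writer holds the lock
        try:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the data out before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Each client would call append_record() instead of writing to the shared file directly, so that only one write is in flight at a time.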
Hope some of this helps.
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu