David Bussenschutt d.bussenschutt at
Sun Oct 7 21:45:13 PDT 2001

Hi all again. You all seem to be interested in making sure that:
1) if a node is down, it gets updated when it comes back up;
2) if all nodes come up at once, the network/master is not flooded with
requests.

Possible solutions and issues:
1) make clients "pull" files on boot (suggested on list)
        * must add random delays to the pull so the server is not overloaded.
        * must have localised scripts on client machines that perform the pull
          (what do you do when you want to update these scripts because the
          pull isn't working properly? You can't have them "pull" the new
          version!)
2) have the server push when the client is up again.
        * server is never overloaded because it initiates all actions (and my
          perl script is currently not multi-threaded)
        * all management changes are in one place.
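
For what it's worth, the random delay in option 1 could be as small as the sketch below. It is not part of syncfiles: the helper name jittered_delay and the 300-second window are assumptions picked purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of syncfiles): pick a random whole number of
# seconds in [0, $max_delay) to wait before pulling, so that nodes booting
# together do not all hit the master at the same moment.
sub jittered_delay {
    my ($max_delay) = @_;
    return int(rand($max_delay));
}

# Assumed window of 300 seconds; tune it to the size of the cluster.
my $wait = jittered_delay(300);
print "would sleep $wait seconds before pulling\n";
# sleep $wait;   # uncomment on a real client, then run the pull
```

The wider the window, the lower the peak load on the master, at the cost of nodes running with stale files for longer after boot.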

Also, someone asked me the following questions:
How do I force an update on all nodes?
        * touch the file on the server that I want pushed to the nodes, and
          let my perl daemon push it out.
How do I force an update on one node?
        * I don't, I just do all nodes. I'm sure the script could be
          improved if this was required.
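
The touch trick works because touch updates the file's ctime, and the daemon compares that against the ctime it recorded at the last successful push. A minimal sketch of that comparison follows, assuming a hypothetical helper named needs_push (the script itself does the check inline) and a throwaway temp file in place of a real /etc file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if $file's ctime is newer than the ctime
# recorded at the last successful push.
sub needs_push {
    my ($file, $last_ctime) = @_;
    my $ctime = (stat($file))[10];
    die "error getting ctime of $file\n" unless defined $ctime;
    return $ctime > $last_ctime ? 1 : 0;
}

# Demonstration on a temp file rather than a real /etc file.
my ($fh, $tmp) = tempfile(UNLINK => 1);
close $fh;
my $ctime = (stat($tmp))[10];
print "never pushed:   ", needs_push($tmp, 0), "\n";
print "already pushed: ", needs_push($tmp, $ctime), "\n";
```

Since stat reports ctime in whole seconds here, a change made within the same second as the last push would not be noticed until the file is touched again.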

I have updated my perl program (below) so it handles the case where a
node/client goes down: it polls for the client periodically (waiting for
it to come back up), at a frequency that can be set at start time
(or defaults to a reasonable value).

At start time, I now just run:
syncfiles node1
syncfiles node2
syncfiles node3 


Here is the 'syncfiles' program:

#!/usr/bin/perl -wT
# syncfiles
# usage: in a rc.sysinit or inittab on the master host, run:
#      syncfiles clienthost [checkdelay] [retrydelay]
# Designed to automatically check every 'checkdelay' seconds whether certain
# files have been modified and, if modifications have been made, push them to
# the requested client host over an ssh connection. This script runs as a
# daemon by default.
# Requires:
#       1) to run as root for /etc files, so it can access the shadow file
#       2) rsync and ssh must be available, and at the paths defined in
#          this script
#       3) root ssh access without a password to the client (ie
#          .ssh/authorized_keys2 on remote hosts)
# written by David Bussenschutt Oct 5 2001 - free for everyone - no
# responsibility accepted.
# It could be made bigger and better, but I like the KISS principle.
# October 8 2001 improvement:
# The script will now retry the send if the client is not available...
# If the client host is uncontactable (or there are other ssh connection
# problems), then the delay is changed to the (hopefully longer) 'retrydelay'
# until the host is available again, so that the network isn't flooded with
# retrying/failing requests the whole time.

use strict; # for syntax checking
use POSIX; # for 'setsid'

# files to update, and the initial ctime to use
my %files = ('/etc/passwd' => '0', '/etc/shadow' => '0', '/etc/group' => '0');

# where are the rsync and ssh commands?
# rsync should also be in the same place on the remote hosts as on the
# primary host
# the path to these is hardcoded because this script runs as root
my $rsync = '/usr/bin/rsync';
my $ssh = '/usr/bin/ssh';

# client name should be provided on command line, if it's not, the script
# won't run.
my $client = $ARGV[0];

# how often do we check for changes to the ctime of the files?
# should be provided as second command line option, or default to 10
my $checkdelay = $ARGV[1]||10;
# if we couldn't contact the host or had errors, how long till we retry?
# from cmd line or default to 60.
my $retrydelay = $ARGV[2]||60;

# dissociate from terminal (1=yes)?
my $DAEMON=1;

my $DEBUG=1; #(1=more output)
print "Running in DEBUG mode - modify script to turn off DEBUG and silence this output.\n" if $DEBUG;

#------------ERROR CHECKING BEGIN---------------
die "rsync not found\n" unless defined $rsync;
die "remote host not defined on command line\n" unless defined $client; 
chomp $rsync;
# untaint the client name
if ($client =~ /^([-\w.]+)$/) {      # alphanumerics,hyphens and dots 
   $client = $1;                     # now untainted
} else {
   die "Really bad data in client hostname. Only alphanumerics, hyphens and dots allowed.\n";
}
# untaint the path
$ENV{'PATH'} = '';
#------------ERROR CHECKING END---------------

#--------DAEMON CODE BEGIN-------------------------
# are we going to run as a daemon?
my $pid;
if ($DAEMON == 1){
  # only INT, TERM and HUP are REALLY needed.
  $SIG{INT} = $SIG{TERM} = $SIG{HUP} = \&signal_handler;
  # ignore $SIG{PIPE} as it's dangerous
  $SIG{PIPE} = 'IGNORE';
  # turn stdio output off
  print "Dissociating from terminal and running as a daemon\n" if $DEBUG;
  # virtualise / to be in a 'safe' location
  #chroot("/var") or die " Couldn't chroot to /var: $!";
  # fork a child and let the parent exit.
  $pid = fork;
  die "Couldn't fork new process: $!" unless defined ($pid);
  exit if $pid;
  # dissociate the process from the terminal and don't be part of
  # my old process 'group'
  POSIX::setsid() or die "Can't start a new session: $!";
}

sub signal_handler {
  die "syncfiles: dying on signal\n";
}
#---------DAEMON CODE END------------------------

#----------------MAIN LOOP BEGIN-----------------------
my $delay = $checkdelay;
while (1) {
  sleep $delay;
  foreach my $file (keys %files){
    # if the ctime of the file is newer than the ctime stored in %files then
    #   do the push,
    #   store the new ctime in %files
    my $newctime;
    if (defined ( $newctime = (stat($file))[10] ) and $newctime > $files{$file}) {
      my $cmd = "$rsync -ae '$ssh -x' --rsync-path='$rsync' $file root\@$client:$file";
      my $retval = system ($cmd);
      print $cmd if $DEBUG;
      if ($retval==0) {
        # if the system call returned ok, record a new 'last-updated' ctime.
        $files{$file} = $newctime;
        $delay = $checkdelay;
        print "-- done ok\n" if $DEBUG;
      } else {
        # if there was an error, wait for a longer period before trying again.
        $delay = $retrydelay;
        print "-- error occurred\n" if $DEBUG;
      }
    } elsif (!defined $newctime or $newctime<=1) {
      # file that should exist does not exist or other error
      die "error getting ctime of $file\n";
    } # else no update of this file needed.
  }
}
#----------------MAIN LOOP END-----------------------

David Bussenschutt          Email: D.Bussenschutt at
Senior Computing Support Officer & Systems Administrator/Programmer
Location: Griffith University. Information Technology Services
           Brisbane Qld. Aust.  (TEN bldg. rm 1.33) Ph: (07)38757079

Steven Timm <timm at>
10/05/01 11:28 PM

        To:     David Bussenschutt <d.bussenschutt at>
        cc:     beolist <beowulf at>
        Subject:        Re: NIS?

The rsync script is a good idea and something we are thinking
of implementing. The only problem: how do you handle the
situation when a node happens to be down during a push?


Steven C. Timm (630) 840-8525  timm at
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

On Fri, 5 Oct 2001, David Bussenschutt wrote:

> Slight side-bar here, but I think it relates:
> My chain of thought:
> 1) everyone agrees NIS works (even if it is arguable about the speed,
> reliability, security etc)
> 2) everyone agrees that it can have/cause problems in some situations -
> especially beowulf speed related ones.
> 3) the speed has to do with the synchronisation delays inherent in a
> bidirectional on-the-fly network daemon approach like NIS
> 4) many people prefer the files approach for speed/simplicity (ie to avoid
> the problems in 3).
> 5) In a beowulf cluster, passwords shouldn't be changed on nodes, so a
> server push password system is all that's required -hence the files
> approach in 4).
> 6) why not have the best of both worlds?   What we need is a little daemon
> on the server that pushes the passwd/shadow/group/etc files to the clients
> over an ssh link whenever the respective file is modified on the server.
> 7) How I suggest implementing this:
> The naive/simple approach:
> set up the clients so that root can ssh to them without a password (I
> suggest a ~/.ssh/authorized_keys2 file and a ~/.ssh/known_hosts2 file)
> root crontab entries that run the following commands periodically (as
> often as you require - depending on how much password latency you can live
> with)
> # first client
> /usr/bin/rsync -ae 'ssh -x' --rsync-path='/usr/bin/rsync' /etc/passwd root@client1:/etc/passwd
> /usr/bin/rsync -ae 'ssh -x' --rsync-path='/usr/bin/rsync' /etc/shadow root@client1:/etc/shadow
> /usr/bin/rsync -ae 'ssh -x' --rsync-path='/usr/bin/rsync' /etc/group root@client1:/etc/group
> # second client
> /usr/bin/rsync -ae 'ssh -x' --rsync-path='/usr/bin/rsync' /etc/passwd root@client2:/etc/passwd
> /usr/bin/rsync -ae 'ssh -x' --rsync-path='/usr/bin/rsync' /etc/shadow root@client2:/etc/shadow
> /usr/bin/rsync -ae 'ssh -x' --rsync-path='/usr/bin/rsync' /etc/group root@client2:/etc/group
> # etc
> The improved approach (a perl program I just wrote - tell me what you
> think):
> --------------------------------------------------------------------
> David Bussenschutt          Email: D.Bussenschutt at
> Senior Computing Support Officer & Systems Administrator/Programmer
> Location: Griffith University. Information Technology Services
>            Brisbane Qld. Aust.  (TEN bldg. rm 1.33) Ph: (07)38757079
> --------------------------------------------------------------------
> Donald Becker <becker at>
> Sent by: beowulf-admin at
> 10/05/01 10:32 AM
>         To:     Tim Carlson <tim.carlson at>
>         cc:     Greg Lindahl <lindahl at>, beolist
> <beowulf at>
>         Subject:        Re: NIS?
> On Thu, 4 Oct 2001, Tim Carlson wrote:
> > On Thu, 4 Oct 2001, Greg Lindahl wrote:
> >
> > > BTW, by slaves, do you mean "slave servers" or "clients"? There's a
> > > big difference. Having lots of slave servers means a push takes a
> > > while, but queries are uniformly fast.
> >
> > I meant clients.
> > 1 master, 50 clients.
> > The environment on the Sun side wasn't a cluster. 50 desktops.
> Completely different cases.
>  Workstation clients send a few requests to the NIS server at random
> times.
>  Cluster nodes will send a bunch of queries simultaneously.
> > Never had complaints about authentication delays. I just haven't seen
> > these huge NIS problems that everybody complains about.
> The problems are not failures, just dropped and delayed responses.  A
> user might not notice an occasional ten second delay.  When even trivial
> cluster jobs take ten seconds, you'll notice.
> > If you were running
> > 1000 small jobs in a couple of minutes I could imagine having problems
> > authenticating against any non-local mechanism.
> Hmmm, a reasonable goal is running a small cluster-wide job every
> second.  I suspect the NIS delays alone take longer than one second with
> just a few nodes.
> > Our current cluster builds use for clustering
> > software. This system uses NIS.  I know it is odd to hear of any other
> > system than Scyld on this list,  but we have had good luck with NPACI
> > Rocks.
> We don't discourage discussions about other _Beowulf_ systems on this
> list.  We have thought extensively about the technical challenges of
> building and running clusters, and are more than willing to share our
> experiences and solutions.
> Donald Becker becker at
> Scyld Computing Corporation                    
> 410 Severn Ave. Suite 210                                Second 
> Beowulf Clusters
> Annapolis MD 21403 410-990-9993