[Beowulf] Announcing nettee 0.1.4

David Mathog mathog at mendel.bio.caltech.edu
Tue May 3 15:49:30 PDT 2005


> Can you elaborate on the features that you have added to Dolly?

Hmm, well, I sent some changes for dolly to Felix and then
the most recent version of that was mutated into nettee.

More or less the changes dolly -> nettee were:


Added:
  1.  Ability to pass short text messages rapidly (allows a
      "poor man's" parallel command executor without having to
      install MPI or PVM.  Also allows command line control
      of single remote nodes without rsh or ssh.  See the
      pdist_shell.sh example.  Generally these commands are
      run blind, no stdout or stderr comes back.)
  2.  Command line control (with the goal of having complete
      control via DHCP, see the README.TXT section about
      systemimager. With nettee it should be possible to
      write scripts that rewrite /etc/dhcpd.conf to
      automagically configure the data distribution path(s)
      that will be seen when nodes reboot.)
  3.  Verbosity levels
  4.  Run a command that processes input or output via a socket
      (in some cases faster than through a pipe, YMMV)
  5.  Large file support (that may have been put into dolly now
      too).
  6.  -w (keep trying to connect if the next node isn't booted
      yet.)
  7.  -p (set the port, I think maybe dolly could do this in
      the config file).
  8.  piped input/output (first added to dolly, carried over
      to nettee.)

Removed:
  1.  Top down control, other than shutting down the whole chain
      on an error (there is no config file as in Dolly).
  2.  Backflow (except for the final byte count, which is used
      to verify that data sent = data received.)
  3.  Segmented file support (not needed with large file support).
  4.  A few other things I never used in dolly but can't
      remember now.

There is one known bug (shared with dolly): the gethostbyname()
call fails when a binary prepared on a modern Linux (2.6.x) is
run in a tiny Linux (2.2.x).  My workaround is to specify
the -next parameter as an IP address, obtained by parsing the
host table (for instance) or passed in by DHCP.
Felix's workaround was to use a statically linked version of dolly.
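
To illustrate the host-table workaround, here is a minimal Python sketch that
looks up a node name in hosts-format text and returns its address, so the
address rather than the name can be given to -next. It is not part of nettee;
the function name lookup_in_hosts and the sample addresses are made up for
illustration.

```python
def lookup_in_hosts(name, hosts_text):
    """Return the first address listed for `name` in hosts-format text, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if name in names:
            return address
    return None


# Hypothetical host table contents; real ones would come from /etc/hosts.
SAMPLE_HOSTS = """\
127.0.0.1     localhost
# compute nodes (made-up addresses)
192.168.1.11  node1.cluster node1
192.168.1.12  node2.cluster node2
"""

if __name__ == "__main__":
    print(lookup_in_hosts("node2", SAMPLE_HOSTS))
```

A wrapper script could call something like this on the contents of /etc/hosts
and hand the resulting address to -next, which avoids the gethostbyname()
call entirely.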


> 
> I worked on a project recently to handle system provisioning of +1000
> cluster nodes with 10GB file systems in under an hour. We used dolly in
> conjunction with the Warewulf cluster manager utilizing 16-32 node
> segments for the dolly ring and each ring having a "group lead"'s which
> pulls direct from the Warewulf's VNFS (virtual node file system) daemon
> and then dolly's to the other members of the group.


nettee can also send the datastream to multiple subnets,
if your network topology makes that worth doing, like this:

  nettee -next node1,node2,node3,node4

where the nodes listed are on different ethernet interfaces.
Since nettee runs on the same 2 ports on each node you
shouldn't be able to wire it into loops.  If that did happen
the data would go merrily around forever or until some disk
filled up and the local write failed.
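
The fan-out idea behind a multi-node -next can be sketched in a few lines of
Python: read the incoming stream in chunks and write each chunk to every
output (the local destination plus each next node). This is only an
illustration of the general technique, not nettee's code; tee_stream and its
parameters are hypothetical names, and in-memory buffers stand in for the
sockets and the local file.

```python
import io


def tee_stream(source, sinks, bufsize=65536):
    """Copy `source` to every sink chunk by chunk; return the bytes read."""
    total = 0
    while True:
        chunk = source.read(bufsize)
        if not chunk:  # empty read means end of stream
            break
        for sink in sinks:
            sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"abc" * 1000)
    local_copy, next_node = io.BytesIO(), io.BytesIO()
    copied = tee_stream(src, [local_copy, next_node], bufsize=256)
    print(copied, local_copy.getvalue() == next_node.getvalue())
```

Every sink sees every byte, which is why a wiring loop would keep the data
circulating until a local write fails.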

Both nettee and dolly put a really heavy load on the switch
and are of course completely the wrong solution for a shared
network (if anybody still has those).

> 
> As this was just a proposal, we didn't have the resources
> to scale to 1000 nodes, but based on our tests it is
> theoretically very favorable that we can
> do it in substantially under 1 hour. Dolly rocks (Felix++)! :)

I agree, dolly rocks.  nettee started as dolly with a different
command interface and grew (and shrank) from there. dolly
occasionally hiccoughed when I ran it though, apparently eating
a few bytes off the last buffer.  At first nettee did that too
until the transmit() loop was simplified and the backflow part
of the select() eliminated.  I have not seen nettee drop
data yet.  Knock wood. Dolly had a sort of handshake to signal
when data was fully read on the top node.  Nettee doesn't - when
it's done reading it does a shutdown() on the output socket and
that is the signal for the next node that there's no more
incoming data.
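
The shutdown()-as-end-of-data idea, together with the final byte count coming
back, can be sketched in Python as below. This is a hedged illustration and
not nettee's source: the function names are made up, a local socketpair
stands in for the link between two nodes, and sending the count back as ASCII
digits is an assumed wire format.

```python
import socket
import threading


def send_stream(sock, chunks):
    """Send all chunks, half-close, then read the peer's byte count."""
    sent = 0
    for chunk in chunks:
        sock.sendall(chunk)
        sent += len(chunk)
    sock.shutdown(socket.SHUT_WR)  # tells the peer: no more incoming data
    reply = b""
    while True:  # the read side is still open after SHUT_WR
        data = sock.recv(64)
        if not data:
            break
        reply += data
    return sent, int(reply)


def receive_stream(sock, bufsize=65536):
    """Read until the sender's shutdown is seen, then report the count."""
    received = 0
    while True:
        data = sock.recv(bufsize)
        if not data:  # empty read: the sender did shutdown()
            break
        received += len(data)
    sock.sendall(str(received).encode("ascii"))  # assumed count format
    sock.shutdown(socket.SHUT_WR)
    return received


def demo():
    upstream, downstream = socket.socketpair()
    worker = threading.Thread(target=receive_stream, args=(downstream,))
    worker.start()
    sent, reported = send_stream(upstream, [b"x" * 1000] * 50)
    worker.join()
    upstream.close()
    downstream.close()
    return sent, reported


if __name__ == "__main__":
    print(demo())
```

No separate handshake message is needed: the empty read on the receiving side
is the end-of-data signal, and comparing the two counts checks that data
sent = data received.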

I've only got 20 nodes.  I have no idea how this daisychain works
when the chain is 100 or more nodes long.

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


