[Beowulf] What services do you run on your cluster nodes?

Tue Sep 23 06:18:23 PDT 2008

On Tue, 23 Sep 2008, Joe Landman wrote:

> Robert G. Brown wrote:
>
>> One can always run xmlsysd instead, which is a very lightweight
>> on-demand information service.  It costs you, basically, a socket, and
>> you can poll the nodes to get their current runstate every five seconds,
>> every thirty seconds, every minute, every five minutes.  Pick a
>> granularity that drops its impact on a running computation to a level
>> you consider tolerable, while still providing you with node-level state
>> information when you need it.
>> 
>> Just a thought...;-)
>
> :)
>
> I really have to look at this already.  If the poll could be rigged so that 
> everyone gets polled at the same time, and it is not too frequent (1/minute), 
> this could be quite helpful.  Especially if it doesn't have to run a 
> process/fork a thread to get state info.  Ganglia is nice, but it has some 
> overhead.  And occasionally gmond wanders off into a different universe ...
>
> If you have the pointer handy to the tool (save me a google, and get you free 
> advertising :) ) ...

But of course:

   http://www.phy.duke.edu/~rgb/Beowulf/wulfware.php

Some people really like it.  Others don't, I guess.  I still use it
myself quite a lot, because it kicks butt for doing a certain kind of
almost-real-time monitoring for straight client/server LANs as well
(real time but not instant).  I still haven't built a proper GUI for it,
but there is a simple web interface (you can set it up to rebuild a web
page every minute, say, and then anybody can open their browser onto the
table it creates), there is an xterm/tty/ncurses interface that "works"
for at least modest cluster sizes (one has only 20-30 rows per xterm, so
there is some paging required to see everything; ditto for the web
interface, actually), and there is a "logger" interface that just dumps
cluster info
out to stdout in a simple table format so that you can pipe it into a
file or files or perl script and do what you wish with it (like monitor
a network for 24 hours across some sort of regularly occurring "problem"
with a 5 second granularity to try to identify the correlates as the
first step in solving the problem, which is what I'm doing with it
today:-).

As far as load is concerned:  You can run xmlsysd as either an xinetd
process or forking daemon (the former is more secure, perhaps, the
latter makes it stand alone and keeps one from having to run xinetd:-).
It costs you one fork to run the initial daemon in the latter case, and
a fork per connection BUT the connections are persistent TCP connections
and hang out indefinitely.

When it initializes -- it is controlled entirely from the client side --
it opens file pointers onto all the files it needs, e.g. parts of /proc.
It then uses rewind, not close/open, to avoid the overhead of
fopen/fclose, polling the contents of those files on (client side)
demand.
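
The open-once, rewind-and-reread idea can be sketched in a few lines.
This is Python rather than xmlsysd's own code, and the class name and the
choice of files are only illustrative assumptions; the point is simply
that each poll costs a seek and a read, never an open or a close.

```python
import os


class RewindingReader:
    """Open each file once, then rewind and re-read it on every poll."""

    def __init__(self, paths):
        # Pay the open() cost a single time, at "init".  Unbuffered
        # binary mode so every read goes back to the kernel for fresh data.
        self.files = {p: open(p, "rb", buffering=0) for p in paths}

    def poll(self):
        snapshot = {}
        for path, fh in self.files.items():
            fh.seek(0)  # rewind instead of close/open
            snapshot[path] = fh.read().decode("utf-8", "replace")
        return snapshot

    def close(self):
        for fh in self.files.values():
            fh.close()


if __name__ == "__main__":
    # Illustrative paths only; skipped quietly where /proc is absent.
    wanted = [p for p in ("/proc/loadavg", "/proc/stat") if os.path.exists(p)]
    reader = RewindingReader(wanted)
    for path, text in reader.poll().items():
        print(path, len(text), "bytes")
    reader.close()
```

Calling poll() repeatedly on the same reader returns the current contents
each time without reopening anything.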

It is trivial to operate and play with.  Drop it onto a client (which
can be your personal laptop).  Run it out of xinetd or just run an
instance of it in forking daemon mode from a command line in userspace
(it comes up on default port 7887).  Then telnet to the host on port
7887 (user configurable, of course).  When you connect, enter "init".
Then enter "send".  Wowsers!  A packet full o' facts!
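
The same telnet session can be scripted.  The sketch below is a guess at
a minimal client, not part of wulfware: the function name is made up, it
assumes commands are newline-terminated, and since the post does not say
how a reply ends it simply treats a short silence on the socket as "done".
Only the port (7887) and the "init" and "send" commands come from the
description above.

```python
import socket


def xmlsysd_query(host, port=7887, commands=("init", "send"), quiet=1.0):
    """Send each command over one TCP connection; return the replies.

    Assumptions (not from the original post): commands end with a
    newline, and a reply is complete once the socket has been silent
    for `quiet` seconds.
    """
    replies = []
    with socket.create_connection((host, port), timeout=quiet) as sock:
        for cmd in commands:
            sock.sendall(cmd.encode("ascii") + b"\n")
            chunks = []
            while True:
                try:
                    data = sock.recv(4096)
                except socket.timeout:
                    break  # silence: treat the reply as finished
                if not data:
                    break  # server closed the connection
                chunks.append(data)
            replies.append(b"".join(chunks).decode("utf-8", "replace"))
    return replies
```

With a daemon listening locally, something like
print(xmlsysd_query("localhost")[1]) would show whatever came back from
"send".  A real poller would keep the connection open and repeat "send"
at the chosen interval rather than reconnecting.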

xmlsysd is moderately controllable so that a lot of this output can be
suppressed or selected.  As it stands (watching everything) it is more
than one packet (but only one TCP message, so the latency is mostly
already paid).  If one watches (say) only load average, or only
/proc/stat, or only networking or memory or cpuinfo (at a time) the
return will usually easily fit in a single packet.

Also, while the libxml default is to pretty-print, 2K of the 6+K default
return is whitespace, and if you enter "off whitespace" xmlsysd will
remove the whitespace from the xml making the result less human readable
but just as parseable.  libxml is SUPPOSED to be able to compress as
well, but I've never gotten that to work -- compression of the highly
redundant information would probably reduce the message size to at most
1K at the cost of a bit more CPU on both ends, allowing one to trade off
server load against network load.
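
To get a feel for that tradeoff, here is a rough sketch that strips the
pretty-printing whitespace from a small XML document and then compresses
it.  It uses Python's zlib directly rather than libxml's compression, and
the document and its tag names are invented for illustration; they are
not xmlsysd's actual output, so the sizes it prints say nothing about the
real 6+K message.

```python
import xml.etree.ElementTree as ET
import zlib

# Hypothetical pretty-printed reply; tag names are invented for
# illustration and are not xmlsysd's actual schema.
PRETTY = "<node>\n" + "".join(
    '  <cpu id="%d">\n    <user>12345</user>\n    <sys>678</sys>\n  </cpu>\n' % i
    for i in range(8)
) + "</node>\n"


def strip_whitespace(xml_text):
    """Drop whitespace-only text between tags, keeping real values."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


compact = strip_whitespace(PRETTY)
packed = zlib.compress(compact.encode("utf-8"))
# Sizes in bytes: pretty-printed, whitespace removed, compressed.
print(len(PRETTY), len(compact), len(packed))
```

The compact form parses to the same values as the pretty one; the
compressed form needs a decompress step on the client before parsing,
which is the extra CPU being traded for bytes on the wire.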

All this to defend the decision to use xml -- it makes it parsable with
commonly available tools, is easy to read, enforces a hierarchical view
of the data, is simple to debug and extend without breaking tools that
already can read it, and without question makes the final message larger
than it needs to be from a strictly information theoretic point of view.
I think it is worth it -- xmlsysd's precursor was still ascii (and hence
readable) but wasn't xml and every time the kernel changed one effectively
broke the API to accommodate.  Now if I need to add a field, no problem.
XML tools generally ignore tags they don't recognize, so as long as I
don't have major hierarchical flaws to fix that require alteration of
e.g. tag nesting nothing breaks.
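
A small illustration of why an added field is harmless: a client that
looks up only the tags it knows never notices a new one.  The two sample
documents and their tag names below are hypothetical, not xmlsysd's
schema, and read_loads is a made-up helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical replies; tag names are invented, not xmlsysd's schema.
OLD = "<node><load1>0.40</load1><load15>0.00</load15></node>"
NEW = ("<node><load1>0.40</load1><temperature>41</temperature>"
       "<load15>0.00</load15></node>")


def read_loads(xml_text):
    """Pick out the fields this client knows; skip everything else."""
    root = ET.fromstring(xml_text)
    wanted = ("load1", "load15")
    return {tag: float(root.findtext(tag)) for tag in wanted
            if root.findtext(tag) is not None}


# The extra <temperature> tag in NEW does not change what the client sees.
print(read_loads(OLD) == read_loads(NEW))
```

Moving load1 under a different parent element, on the other hand, would
break this lookup, which is the "tag nesting" caveat above.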

Seriously, once you have xmlsysd set up to run on a cluster of REGULAR
servers and clients, let alone cluster nodes, and just run "wulfstat" to
take a quick peek at them to try to understand why the network is so
slow just one time, you'll see why I keep it around even for my own use
only.  It is ALMOST like running top or vmstat on all the hosts at once
and having the results from the top header and the 2-3 processes
actually on the CPU, per host, in a single at-a-glance table.  Server
network being wedged by client X?  There it is.  Memory leak causing
problems?  As long as it hasn't proceeded to where xmlsysd itself starts
to hang, swap, memory and even running process info like run and virtual
size are a key or two away.  Can't remember what the CPU is in box Y?
Type a key, there it is, and oh yeah, looks like the clock of host Z is
way out of sync with everybody else, what's up with that.

Right now (at this instant, in real time) it is telling me that a
network backup is causing problems on my wife's EMR servers (which
fortunately run under a top-level Linux/VMware setup with either lin or
win virtual servers), which apparently have been slowing down around
9:30 every morning for a few weeks.  HUGE incoming packet load on the
failover server, matched with, lessee, that host's transmit packet.
Wonder why that is happening now, hmmm.  Look, there it is finished.
Network loads back to normal, EMR server load average on the way down
from 2.68 peak (baaad, only has one CPU in its VM, must fix) to its
normal healthy 0.4.  Problem not quite solved, but now I have an idea of
where to look, maybe I need to shift around a cron task so it doesn't
interrupt work time.  Wish I could do something about the perpetual load
average of 2+ on the Windows 2003 server -- with 4 cores it has SOME
headroom, but I'm guessing it is close to that nonlinear threshold.

vmstat is a wonderful tool.  vmstat in a table (with aggregate network
rx/tx) is even more wonderful.  And this is just the "basic" wulfstat
display.

Note well: on one of the hosts -- one that is basically a backup
special-purpose Linux VM that has almost nobody using it -- load15 is
zero INCLUDING the averaged overhead imposed by a 5 second poll via
xmlsysd.  xmlsysd is truly a lightweight tool (used as directed).

Enjoy.

    rgb

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977