[Beowulf] How to Monitor Cluster

Robert G. Brown rgb at phy.duke.edu
Fri Aug 24 06:50:24 PDT 2007


On Wed, 22 Aug 2007, Markus Sommereder wrote:

> Hello!
> I use a cronjob to read the load average and the memory usage from 
> /proc/loadavg and /proc/meminfo of each node every minute and write the data 
> into a round robin database (rrdtool). The graphs are generated from the 
> database by a cgi-script when I open the monitoring webpage.
> Markus
>
> +++loadavg.sh+++
> #!/bin/sh
> LOAD=$(awk '{print $1":"$2":"$3}' < /proc/loadavg)
> rrdtool update loadavg.$HOSTNAME.rrd N:$LOAD
>
> +++memory.sh+++
> #!/bin/sh
> MEMF=`grep MemFree: /proc/meminfo|tr -s [:blank:]|cut -f2 -d" "`
> SWAPF=`grep SwapFree: /proc/meminfo|tr -s [:blank:]|cut -f2 -d" "`
> MEMFREE=$(expr $MEMF \* 1024)
> SWAPFREE=$(expr $SWAPF \* 1024)
> rrdtool update memory.$HOSTNAME.rrd N:$MEMFREE:$SWAPFREE

Y'all are working way too hard, as a lot of this is what wulfware
(xmlsysd and wulflogger, specifically) was built to do.

If you want to monitor pretty much any important dimension of cluster
node performance, sampled at pretty much any time granularity greater
than 1 second (you CAN go faster than that, but it isn't advised for
Heisenbergish reasons - the load of sampling itself starts to become
non-negligible somewhere in there), then the simplest way to do it is
to:

   a) Obtain the xmlsysd source and build it for your system from the
tarball or source rpm, or grab the binary rpm (built for FC 6) and hope
it works.  Somebody is in the process of putting it into Debian as well;
I don't know exactly what the status of that effort is, but it will be
there soon.  Install it on all your nodes.  If they are rpm based, this
is a matter of dropping the rpm into your local repo and distributing a
"yum -y install xmlsysd" command.

   b) Verify that it is working by e.g. "telnet testnode 7887" and
entering "init", then "send", then "quit" at the first three chances
after you connect.  You should see a large dump of xml-wrapped system
statistics in response to the "send".
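
If you'd rather script that check than type it, something like this
sketch should show the same dump (it assumes netcat is installed and
that the daemon is happy with piped input; the sleeps just give it a
moment to respond):

#!/bin/sh
# sketch: poke xmlsysd on testnode:7887 and show the start of the dump
{ printf 'init\n'; sleep 1; printf 'send\n'; sleep 1; printf 'quit\n'; } \
    | nc testnode 7887 | head -n 40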

   c) Obtain the sources or binaries for: libwulf (required), wulfstat
(recommended), wulflogger (required for what this note describes), and
wulfweb (recommended just for fun).  Build them or install them, noting
that libwulf is a dependency for wulfstat and wulflogger, and wulflogger
is a dependency for wulfweb.  wulfstat/wulflogger only need to be
installed on clients from which one wishes to monitor the cluster.
wulfweb would ordinarily be installed on a stable host, possibly but not
necessarily a webserver.  It basically generates and dynamically updates
a web page containing the latest wulflogger snapshot of the cluster at
(say) a granularity of a minute or so.
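
If you go the binary rpm route, the install just has to respect that
dependency order, e.g. something like the following (the exact package
file names will of course differ):

rpm -Uvh libwulf-*.rpm
rpm -Uvh wulfstat-*.rpm wulflogger-*.rpm
rpm -Uvh wulfweb-*.rpm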

   d) On any host from which you wish to monitor, you can then create a
.wulfhosts file in your home directory, using the examples in the man
pages or documentation as a template.  For example, either of the
following tag forms can be used to specify a cluster:

<?xml version="1.0"?>
<wulfstat>

  <hostrange>
    <hostfmt>b%02d</hostfmt>
    <imin>0</imin>
    <imax>15</imax>
    <port>7887</port>
  </hostrange>

  <iprange>
    <ipmin>192.168.1.128</ipmin>
    <ipmax>192.168.1.191</ipmax>
    <port>7887</port>
  </iprange>

</wulfstat>

The first would specify a 16 node cluster resolvable by name as "b00,
b01, b02... b15".  The second would specify a 64 node cluster directly
by node IP number in the defined range.  There are other things you can
put into your .wulfhosts file to control your display as well.

   e) At this point you can run wulfstat in a tty window (e.g. an
xterm) and watch as its various descriptors are updated every five
seconds (the default).  You can speed it up or slow it down.  The
default view is a vmstat-like set of information, but there are also
views of just load average, memory, network traffic, system descriptors
(e.g. CPU type, cache size, uptime, wall clock time), and the processes
running at the instant of the snapshot.

More to the point, you can ALSO (or instead) run:

rgb at failover|B:1004>wulflogger
#     Name       Status    Timestamp    load1  load5 load15 rx byts tx byts  si  so  pi po ctxt intr prun pblk
dimaecw            up   1187878738.00    0.30   0.33   0.34   6986  21248    0   0   0   0 8168 3116    1    0
dimawin            up   1187878738.00    1.19   1.93   2.13  69958  70689    0   0   0   0 5120 2220    2    1
failover           up   1187878738.00    0.00   0.00   0.00   8143   2444    0   0   0   0  233 1025    1    0
ecw                up   1187878737.99    0.23   0.32   0.33  92836 105391    0   0   0   0  620 1069    1    0
#     Name       Status    Timestamp    load1  load5 load15 rx byts tx byts  si  so  pi po ctxt intr prun pblk
dimaecw            up   1187878743.01    0.27   0.32   0.34  10693  33151    0   0   0   0 8050 3124    2    0
dimawin            up   1187878743.01    1.41   1.97   2.14  70482  66535    0   0   0   0 2692 2154    2    1
failover           up   1187878743.02    0.00   0.00   0.00   7855   2456    0   0   0   0  785 1304    1    0
ecw                up   1187878743.02    0.21   0.32   0.32 185087 206168    0   0   0   0  955 1080    1    0
#     Name       Status    Timestamp    load1  load5 load15 rx byts tx byts  si  so  pi po ctxt intr prun pblk
dimaecw            up   1187878748.02    0.33   0.33   0.34  20597  65787    0   0   0   0 8054 3177    1    0
dimawin            up   1187878748.02    1.46   1.97   2.14  76722  90187    0   0   0   0 2958 2167    3    1
failover           up   1187878748.03    0.00   0.00   0.00   7453   2433    0   0   0   0  199 1011    1    1
ecw                up   1187878748.02    0.59   0.39   0.35 229374 272848    0   0   0   0 1020 1132    3    0
#     Name       Status    Timestamp    load1  load5 load15 rx byts tx byts  si  so  pi po ctxt intr prun pblk
dimaecw            up   1187878753.04    0.39   0.34   0.35   7631  27819    0   0   0   0 7885 3111    1    0
dimawin            up   1187878753.04    1.87   2.04   2.16  73568 141057    0   0   0   0 2709 2218    1    3
failover           up   1187878753.82    0.00   0.00   0.00   6475   2108    0   0   0   0  192 1011    1    0
ecw                up   1187878753.80    0.54   0.39   0.35 148885 167587    0   0   0   0  833 1072    1    0
#     Name       Status    Timestamp    load1  load5 load15 rx byts tx byts  si  so  pi po ctxt intr prun pblk
dimaecw            up   1187878758.83    0.33   0.33   0.34  10279  37208    0   0   0   0 8420 3122    2    0
dimawin            up   1187878758.83    1.88   2.04   2.16  81118  64661    0   0   0   0 2847 2150    3    1
failover           up   1187878758.84    0.00   0.00   0.00   7741   2434    0   0   0   0  202 1014    1    1
ecw                up   1187878758.84    0.50   0.38   0.35 123911 149333    0   0   0   0  808 1075    1    0
...

and e.g. pipe the results to a file or through a perl script (like the
one found in wulfweb) to parse this out into a table you can print out,
plot, turn into a report, or statistically analyze any way you like.  I
even built an rrdtool display once upon a time, but found the wulfweb
or wulfstat straight text display to be more useful.
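
For example, here is a minimal sketch that strips the stream above down
to a flat "name timestamp load1" log (the field positions come straight
from the header line; the log file name is arbitrary):

#!/bin/sh
# sketch: log per-node one-minute load averages from wulflogger,
# skipping the repeated header lines that start with '#'
wulflogger | awk '!/^#/ { print $1, $3, $4 }' >> cluster-load.log

From there it is one short step to feed the same numbers into rrdtool if
you still want the graphs.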

Note the wealth of information in the default display -- load averages,
network traffic per interval, swap and paging activity, interrupt load,
the number of running and blocked processes.  The other "views" can also
be dumped via wulflogger.  The only thing I don't have in it (that
should probably be there) is a direct view of disk activity other than
paging and swap, partly because until recently the disk view in /proc
really sucked.  With /proc/diskstats now present and much more parsable,
I'll probably implement a disk view in the suite as one of my next
chores.  The last thing I did with it was add support for multicores
(which is still being debugged, as I only have dual cores to test and
play with).
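
Until that disk view exists, anybody who needs disk numbers right now
can of course scrape them directly; a rough sketch using the standard
/proc/diskstats field layout (device name in field 3, sectors read in
field 6, sectors written in field 10) would be:

#!/bin/sh
# sketch: cumulative sectors read/written per whole disk
awk '$3 ~ /^(sd|hd)[a-z]$/ { print $3, "read=" $6, "written=" $10 }' \
    /proc/diskstats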

Note also that wulfware is useful for things other than just monitoring
"clusters".  The systems in the default display above are actually a
small vmware-based server farm.  The first three are toplevel linux
hosts (the level that runs vmware and hosts the VMs): two active and one
failover system for backup.  The fourth is a VM running on the first.
The second is hosting two Windows servers, which are difficult to
monitor directly but which can be monitored INdirectly by keeping an eye
on the cumulative load on the VM host.

Thus one can use it to monitor or log the realtime numbers for an
entire workstation/PC LAN, for an HA server farm, for an HPC cluster, or
for any mix-n-match of the above -- individual hosts to monitor can
easily be added to .wulfhosts.  It even has a convenient flag for
monitoring "localhost", although of course one has alternative ways of
doing that.  This lets you reuse any parsing scripts you might develop,
though, and if nothing else provides even your local host with a compact
incremental display of its important runtime statistics.

It is, naturally, GPL.  Free, easily modifiable: you can add your own
statistics to monitor if you like, at the cost of hacking them into
xmlsysd inside a suitable set of tags (using the provided subroutine
utilities and existing code as a template, which makes it pretty easy if
a bit tedious) and adding a bit of code at the other end (again using
templated library calls to parse it back out) for display.

Its advantage over doing it yourself with e.g. distributed shell
scripts, NFS writes to a common directory, etc. is that it has been
designed from the beginning to be LIGHTWEIGHT.  That is, running it at
the default granularity consumes a very small fraction of the system's
total resources and hence doesn't CHANGE the numbers by slowing down the
system or stealing cycles or bandwidth from your running processes.  Its
one luxury is that it packs things up in XML, which is obviously not
maximally compressed, but this makes it MUCH easier to parse at the far
end with many tools and encourages a scalable and extensible design.
Extra tags are typically just ignored by the display clients, so one can
add tags to a custom xmlsysd without breaking the existing displays
while working on a custom display to match, for example.

Wulfware can be grabbed from here:

   http://www.phy.duke.edu/~rgb/Beowulf/wulfware.php

and yes, I cherish bug reports, feature requests, and so on.  Eventually
I'll get this into Fedora, but the (dieharder) package I submitted for
inclusion six weeks ago hasn't yet been reviewed and I'm not optimistic
about getting it there QUICKLY until somebody lets me "join the club".

    rgb

> A Lenzo wrote:
>> Hello Cluster Colleagues,
>> 
>> I would like to begin monitoring my cluster in order to see what the usage 
>> is at different times of day.  A simple method would work - I am looking 
>> for advice on this.  The mosmon utility is enticing since it shows the 
>> usage on all nodes at once, but of course, I can't pipe the output to a 
>> text file.  If I can find the right tool for the job, I am sure I can keep 
>> it running every hour or so with Cron.  Ideally, I'd love to measure memory 
>> usage and also CPU usage.
>> 
>> Suggestions?
>> 
>> Thanks!
>> Tony

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




