[Beowulf] How to Monitor Cluster
Robert G. Brown
rgb at phy.duke.edu
Fri Aug 24 06:50:24 PDT 2007
On Wed, 22 Aug 2007, Markus Sommereder wrote:
> Hello!
> I use a cronjob to read the load average and the memory usage from
> /proc/loadavg and /proc/meminfo of each node every minute and write the data
> into a round robin database (rrdtool). The graphs are generated from the
> database by a cgi-script when I open the monitoring webpage.
> Markus
>
> +++loadavg.sh+++
> #!/bin/sh
> LOAD=$(awk '{print $1":"$2":"$3}' < /proc/loadavg)
> rrdtool update loadavg.$HOSTNAME.rrd N:$LOAD
>
> +++memory.sh+++
> #!/bin/sh
> MEMF=`grep MemFree: /proc/meminfo|tr -s '[:blank:]'|cut -f2 -d" "`
> SWAPF=`grep SwapFree: /proc/meminfo|tr -s '[:blank:]'|cut -f2 -d" "`
> MEMFREE=$(expr $MEMF \* 1024)
> SWAPFREE=$(expr $SWAPF \* 1024)
> rrdtool update memory.$HOSTNAME.rrd N:$MEMFREE:$SWAPFREE
Y'all are working way too hard, as a lot of this is what wulfware
(xmlsysd and wulflogger, specifically) was built to do.
If you want to monitor pretty much any important dimension of cluster
node performance, sampled at pretty much any time granularity greater
than 1 second (you CAN sample that fast, but it isn't advised for
Heisenbergish reasons -- the load of the sampling itself starts to
become non-negligible somewhere in there), then the simplest way to do
it is to:
a) Obtain xmlsysd source and build it for your system from tarball or
source rpm, or grab the binary rpm (built for FC 6) and hope it works.
Somebody is in the process of putting it into Debian as well; I don't
know exactly what the status of that effort is, but it will be there
soon. Install it on all your nodes. If they are rpm-based, this
is a matter of dropping the rpm into your local repo and distributing a
"yum -y install xmlsysd" command.
b) Verify that it is working by e.g. "telnet testnode 7887" and
entering "init", then "send", then "quit" as your first three inputs
once you connect. You should see a large dump of xml-wrapped system
statistics in response to the "send".
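If you'd rather script steps a) and b) than type into telnet by hand, a
minimal sketch follows. It assumes passwordless root ssh to the nodes,
that the rpm is already in a repo they can see, that netcat (nc) is
installed on the monitoring host, and node names b00..b15 matching the
.wulfhosts example below; the init/send/quit exchange on port 7887 is
exactly as described above:

#!/bin/sh
# Sketch only: install xmlsysd on each node, then check that it answers
# the init/send/quit exchange on port 7887 with xml-wrapped output.
# How the daemon gets started (init script vs. xinetd) depends on how
# the package sets it up -- check its docs.
for node in $(seq -f "b%02g" 0 15); do
    ssh "root@$node" "yum -y install xmlsysd" || echo "install failed on $node"
    { echo init; echo send; sleep 1; echo quit; } | nc "$node" 7887 \
        | grep -q "<" && echo "$node: got xml" || echo "$node: no xml reply"
done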
c) Obtain the sources or binaries for: libwulf (required), wulfstat
(recommended), wulflogger (required for what this note describes), and
wulfweb (recommended just for fun). Build them or install them, noting
that libwulf is a dependency for wulfstat and wulflogger, and wulflogger
is a dependency for wulfweb. wulfstat/wulflogger only need to be
installed on clients from which one wishes to monitor the cluster.
wulfweb would ordinarily be installed on a stable host, possibly but not
necessarily a webserver. It basically generates and dynamically updates
a web page containing the latest wulflogger snapshot of the cluster at
(say) a granularity of a minute or so.
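If you are working from the source rpms, it is the usual rpmbuild dance,
just done in dependency order. A rough sketch -- the .src.rpm file names
are placeholders, and where rpmbuild leaves the binary rpms depends on
your build setup:

#!/bin/sh
# Sketch: rebuild the wulfware source rpms in dependency order
# (libwulf first, then wulfstat and wulflogger, then wulfweb), then
# install the resulting binary rpms on the monitoring host with
# rpm -Uvh or via your local repo.  File names below are placeholders.
for pkg in libwulf wulfstat wulflogger wulfweb; do
    rpmbuild --rebuild "$pkg"-*.src.rpm || exit 1
done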
d) On any of the hosts from which you wish to monitor, you can then
create a .wulfhosts file in your home directory, using the examples in
the man pages or documentation as a template. For example, either of the
following tag forms can be used to specify a cluster:
<?xml version="1.0"?>
<wulfstat>

  <hostrange>
    <hostfmt>b%02d</hostfmt>
    <imin>0</imin>
    <imax>15</imax>
    <port>7887</port>
  </hostrange>

  <iprange>
    <ipmin>192.168.1.128</ipmin>
    <ipmax>192.168.1.191</ipmax>
    <port>7887</port>
  </iprange>

</wulfstat>
The first would specify a 16-node cluster resolvable by name as "b00,
b01, b02, ... b15". The second would specify a 64-node cluster directly
by node IP number in the defined range. There are other things you can
put into your .wulfhosts file to control your display as well.
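A quick sanity check before pointing wulfstat at the file, assuming
xmllint (from libxml2) is installed, is to confirm that it is at least
well-formed XML -- xmllint knows nothing about the wulfstat tags
themselves, of course:

xmllint --noout ~/.wulfhosts && echo ".wulfhosts is well-formed XML"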
e) At this point you can run wulfstat in a tty window (e.g. an xterm) and
watch as its various descriptors are updated every five seconds (the
default). You can speed it up or slow it down. The default view is a
vmstat-like set of information, but there are also views onto just load
average, memory, network traffic, system descriptors (e.g. CPU type,
cache size, uptime, wall clock time), and processes running at the
instant of the snapshot.
More to the point, you can ALSO (or instead) run:
rgb at failover|B:1004>wulflogger
# Name      Status  Timestamp      load1  load5  load15  rx byts  tx byts  si  so  pi  po  ctxt  intr  prun  pblk
  dimaecw   up      1187878738.00   0.30   0.33    0.34     6986    21248   0   0   0   0  8168  3116     1     0
  dimawin   up      1187878738.00   1.19   1.93    2.13    69958    70689   0   0   0   0  5120  2220     2     1
  failover  up      1187878738.00   0.00   0.00    0.00     8143     2444   0   0   0   0   233  1025     1     0
  ecw       up      1187878737.99   0.23   0.32    0.33    92836   105391   0   0   0   0   620  1069     1     0
# Name      Status  Timestamp      load1  load5  load15  rx byts  tx byts  si  so  pi  po  ctxt  intr  prun  pblk
  dimaecw   up      1187878743.01   0.27   0.32    0.34    10693    33151   0   0   0   0  8050  3124     2     0
  dimawin   up      1187878743.01   1.41   1.97    2.14    70482    66535   0   0   0   0  2692  2154     2     1
  failover  up      1187878743.02   0.00   0.00    0.00     7855     2456   0   0   0   0   785  1304     1     0
  ecw       up      1187878743.02   0.21   0.32    0.32   185087   206168   0   0   0   0   955  1080     1     0
# Name      Status  Timestamp      load1  load5  load15  rx byts  tx byts  si  so  pi  po  ctxt  intr  prun  pblk
  dimaecw   up      1187878748.02   0.33   0.33    0.34    20597    65787   0   0   0   0  8054  3177     1     0
  dimawin   up      1187878748.02   1.46   1.97    2.14    76722    90187   0   0   0   0  2958  2167     3     1
  failover  up      1187878748.03   0.00   0.00    0.00     7453     2433   0   0   0   0   199  1011     1     1
  ecw       up      1187878748.02   0.59   0.39    0.35   229374   272848   0   0   0   0  1020  1132     3     0
# Name      Status  Timestamp      load1  load5  load15  rx byts  tx byts  si  so  pi  po  ctxt  intr  prun  pblk
  dimaecw   up      1187878753.04   0.39   0.34    0.35     7631    27819   0   0   0   0  7885  3111     1     0
  dimawin   up      1187878753.04   1.87   2.04    2.16    73568   141057   0   0   0   0  2709  2218     1     3
  failover  up      1187878753.82   0.00   0.00    0.00     6475     2108   0   0   0   0   192  1011     1     0
  ecw       up      1187878753.80   0.54   0.39    0.35   148885   167587   0   0   0   0   833  1072     1     0
# Name      Status  Timestamp      load1  load5  load15  rx byts  tx byts  si  so  pi  po  ctxt  intr  prun  pblk
  dimaecw   up      1187878758.83   0.33   0.33    0.34    10279    37208   0   0   0   0  8420  3122     2     0
  dimawin   up      1187878758.83   1.88   2.04    2.16    81118    64661   0   0   0   0  2847  2150     3     1
  failover  up      1187878758.84   0.00   0.00    0.00     7741     2434   0   0   0   0   202  1014     1     1
  ecw       up      1187878758.84   0.50   0.38    0.35   123911   149333   0   0   0   0   808  1075     1     0
...
and e.g. pipe the results to a file or through a perl script (like the
one found in wulfweb) to parse this out into a table you can print out
or plot or turn into a report or statistically analyze any way you like.
I even built an rrdtool display once upon a time, but found the wulfweb
or wulfstat straight text display to be more useful.
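As a concrete (if minimal) sketch of that kind of postprocessing -- and
to tie it back to the rrdtool scripts at the top of this thread -- here
is an awk filter that turns the default wulflogger stream above into
per-host rrdtool updates. The field positions are the ones in the header
above; the load.<host>.rrd naming is just an illustration, and the rrd
files would of course have to be created beforehand:

#!/bin/sh
# Sketch: feed wulflogger's default view into rrdtool, one rrd per host.
# Data-line fields (per the header above): 1=Name 2=Status 3=Timestamp
# 4-6=load1/5/15 7=rx bytes 8=tx bytes ...  Header lines start with '#'
# and are skipped.  The load.<host>.rrd files are illustrative and must
# already exist (rrdtool create).
wulflogger | awk '
    $1 != "#" && $2 == "up" {
        cmd = sprintf("rrdtool update load.%s.rrd %d:%s:%s:%s",
                      $1, $3, $4, $5, $6)
        system(cmd)
    }'

The same few lines of awk (or perl, or whatever you prefer) will just as
happily emit CSV for a spreadsheet or a plotting package instead.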
Note the wealth of information in the default display -- load averages,
network traffic per interval, swap and paging activity, interrupt load,
the number of running and blocked processes. The other "views" can also
be dumped via wulflogger. The only thing I don't have in it (that
should probably be there) is a direct view on disk activity other than
paging and swap, partly because until recently the disk view in /proc
really sucked. With /proc/diskstats now present and much more parsable,
I'll probably implement a disk view in the suite as one of my next
chores. The last thing I did with it was add support for multicores
(which is still being debugged, as I only have dual cores to test and
play with).
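In the meantime, anyone who wants raw disk numbers in the same
do-it-yourself spirit as the scripts at the top of this thread can pull
them straight out of /proc/diskstats; a minimal sketch (the device name
is a placeholder; field positions are the standard 2.6 layout, with
sector counts in 512-byte units):

#!/bin/sh
# Stopgap disk sampler: print cumulative sectors read and written for
# one device from /proc/diskstats (fields 6 and 10 of the 2.6 format).
# Differencing successive samples gives per-interval disk activity.
dev=${1:-sda}
awk -v d="$dev" '$3 == d { print d, "sectors_read=" $6, "sectors_written=" $10 }' \
    /proc/diskstats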
Note also that wulfware is useful for things other than just monitoring
"clusters". The systems in the default display above are actually a
small vmware-based server farm. The first three are toplevel linux
hosts (the level that runs vmware and hosts the VMs): two active systems
and one failover system for backup. The fourth is a VM running on the first.
The second is hosting two Windows servers, which are difficult to
monitor directly but which can be monitored INdirectly by keeping an eye
on the cumulative load on the VM host.
Thus one can use it to monitor or log the realtime numbers for an entire
workstation/PC LAN, for an HA server farm, for an HPC cluster, or for any
mix-n-match of the above -- individual hosts to monitor can easily be
added to .wulfhosts. It even has a convenient flag for monitoring
"localhost", although of course one has alternative ways of doing that.
This lets you reuse any parsing scripts you might develop, though, and
if nothing else provides even your local host with a compact incremental
display of its important runtime statistics.
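This also bears on the original cron question quoted below: since
wulflogger just writes text to stdout, periodic usage logging is one
crontab line plus a trivial wrapper. A hypothetical example (the wrapper
name, paths, and duration are all just illustration):

#!/bin/sh
# Hypothetical /usr/local/bin/wulfsample: append ~30 seconds of
# wulflogger output (about six samples at the default 5 s interval)
# to a per-day log file, then stop.
mkdir -p "$HOME/wulflogs"
log="$HOME/wulflogs/$(date +%Y%m%d).log"
wulflogger >> "$log" 2>&1 &
pid=$!
sleep 30
kill "$pid"

and then something like "0 * * * * /usr/local/bin/wulfsample" in the
crontab of whoever owns the log directory.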
It is, naturally, GPL: free and easily modifiable. That is, you can add
your own statistics to monitor if you like, at the cost of hacking them
into xmlsysd inside a suitable set of tags (using the provided
subroutine utilities and existing code as a template, which makes it
pretty easy if a bit tedious) and adding a bit of code at the other end
(again using templated library calls to parse it back out) for display.
Its advantage over doing it yourself with e.g. distributed shell
scripts, NFS writes to a common directory, etc. is that it has been
designed from the beginning to be LIGHTWEIGHT. That is, running it at
the default granularity consumes a very small amount of the system's
total resources and hence doesn't CHANGE the numbers by slowing down the
system or stealing cycles or bandwidth from your running processes. Its
one luxury is that it packs things up in XML, which is obviously not
maximally compressed, but this makes it MUCH easier to parse out at the
far end with many tools and encourages a scalable and extensible design.
Extra tags are typically just ignored by the display clients, so one can
add tags to a custom xmlsysd without breaking the existing displays,
while working on a custom display to match, for example.
Wulfware can be grabbed from here:
http://www.phy.duke.edu/~rgb/Beowulf/wulfware.php
and yes, I cherish bug reports, feature requests, and so on. Eventually
I'll get this into Fedora, but the (dieharder) package I submitted for
inclusion six weeks ago hasn't yet been reviewed and I'm not optimistic
about getting it there QUICKLY until somebody lets me "join the club".
rgb
> A Lenzo wrote:
>> Hello Cluster Colleagues,
>>
>> I would like to begin monitoring my cluster in order to see what the usage
>> is at different times of day. A simple method would work - I am looking
>> for advice on this. The mosmon utility is enticing since it shows the
>> usage on all nodes at once, but of course, I can't pipe the output to a
>> text file. If I can find the right tool for the job, I am sure I can keep
>> it running every hour or so with Cron. Ideally, I'd love to measure memory
>> usage and also CPU usage.
>>
>> Suggestions?
>>
>> Thanks!
>> Tony
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu