[Beowulf] cli alternative to cluster top?
Donald Becker
becker at scyld.com
Sun Nov 30 08:52:20 PST 2008
On Wed, 26 Nov 2008, Thomas Vixel wrote:
> I've been googling for a top-like cli tool to use on our cluster, but
> the closest thing that comes up is Rocks' "cluster top" script. That
> could be tweaked to work via the cli, but due to factors beyond my
> control (management) all functionality has to come from a pre-fab
> program rather than a software stack with local, custom modifications.
>
> I'm sure this has come up more than once in the HPC sector as well --
> could anyone point me to any top-like apps for our cluster?
Most remote job mechanisms only think about starting remote processes, not
about the full create-monitor-control-report functionality.
The Scyld system (currently branded "Clusterware") defaults to using a
built-in unified process space. That presents all of the processes
running over the cluster in a process space on the master machine, with
full POSIX semantics. It neatly solves your need with... the standard
'top' program.
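
To illustrate why no cluster-specific tool is needed: if every process
shows up in the master's process space, a top-like ranking can be built
from ordinary 'ps' output. The sketch below is only an illustration, not
part of the Scyld system. It assumes a procps-style 'ps' that accepts
'-eo pid,pcpu,comm', and the helper names (parse_ps, busiest, snapshot)
are made up for this example.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command) tuples."""
    rows = []
    for line in text.splitlines()[1:]:   # skip the header row
        parts = line.split(None, 2)      # command names may contain spaces
        if len(parts) == 3:
            pid, pcpu, comm = parts
            rows.append((int(pid), float(pcpu), comm.strip()))
    return rows


def busiest(rows, n=5):
    """Return the n rows with the highest CPU percentage, busiest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def snapshot():
    """Run the standard `ps` once and parse its output (assumes procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)
```

Calling busiest(snapshot()) on the master would then list the heaviest
processes it can see. On a system without a unified process space the
same call only covers the local machine.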
Most scheduling systems also have a way to monitor processes that they
start, but I haven't found one that takes advantage of all information
available and reports it quickly/efficiently.
There are many advantages of the Scyld implementation:
-- No new or modified process management tools need to be written.
Standard utilities such as 'top' and 'ps' work unmodified,
as well as tools we didn't specifically plan for, e.g. GUI versions of
'pstree'.
-- The 'killall' program works over the cluster, efficiently.
-- All signals work as expected, including 'kill -9'. (Most remote
process starting mechanisms will just kill off the local endpoint,
leaving the remote process running-but-confused.)
-- Process groups and controlling-TTY groups work properly for job
control and signals.
-- Running jobs report their status and statistics accurately -- an
updated 'rusage' structure is sent once per second, and a final
rusage structure and exit status are sent when the process terminates.
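
For readers unfamiliar with the semantics in the list above, here is a
small sketch of what they look like for a local child process: a parent
sends the same signal as 'kill -9', then collects the exit status and
final rusage structure with wait4(). It uses only standard POSIX-style
calls from Python's os module. The function name run_and_collect is
hypothetical, and how Scyld forwards signals and rusage updates between
nodes is not described in this post and is not modelled here.

```python
import os
import signal
import sys


def run_and_collect(argv, kill_after_start=False):
    """Start argv as a child, optionally SIGKILL it, return (outcome, rusage)."""
    pid = os.spawnvp(os.P_NOWAIT, argv[0], argv)
    if kill_after_start:
        os.kill(pid, signal.SIGKILL)      # the same signal `kill -9` sends
    _, status, usage = os.wait4(pid, 0)   # blocks until the child terminates
    if os.WIFSIGNALED(status):
        outcome = ("signal", os.WTERMSIG(status))
    else:
        outcome = ("exit", os.WEXITSTATUS(status))
    return outcome, usage


# Usage: run a trivial child and print its exit status and resource usage.
outcome, usage = run_and_collect([sys.executable, "-c", "pass"])
print(outcome, usage.ru_utime, usage.ru_stime, usage.ru_maxrss)
```

With a unified process space, the claim in the list is that the same
calls behave this way when the child is running on another node.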
The "downside" is that we explicitly use Linux features and details,
relying on kernel-version-specific features. That would be an issue if
it were a one-off hack, but we've been using this approach continuously
for a decade, since the Linux 2.2 kernel and over multiple
architectures. We've been producing supported commercial releases
since 2000, longer than anyone else in the business.
--
Donald Becker becker at scyld.com
Penguin Computing / Scyld Software
www.penguincomputing.com www.scyld.com
Annapolis MD and San Francisco CA