How to tell when a job is swapping?
Jakob Østergaard
jakob at unthought.net
Tue Feb 19 08:31:01 PST 2002
On Tue, Feb 19, 2002 at 09:23:25AM -0500, Jeff Layton wrote:
> Good morning,
>
> For a while now I've been checking if a job is swapping
> on our clusters using bWatch. The nodes are dual CPU
> boxes and we run two MPI processes per node. I usually
> look at the load on the nodes to see if it is above 3.0
> (sometimes our code will peak out at about 2.3) and I
> look at the free swap space number (bWatch just cats the
> /proc/meminfo file).
> I usually assume that if the free swap space falls below
> the maximum and load starts climbing that the node is
> swapping. However, when I talk to the user, he states that
> the code is running fine and the timing numbers are where
> they should be. So, I'm obviously interpreting something
> incorrectly (unless the job is really swapping but for some
> reason performance is unaffected).
> Can someone give me a good way to check if a job
> is swapping? Maybe a URL?
You can test if "a" job is swapping (not a particular job) using
vmstat.
See, you usually don't care that you're 2 GB into swap, as long as
it's rarely used data that's been swapped out. That has no
performance impact on the system. What you care about is whether a
job is "thrashing", i.e. constantly swapping data in and out.
A small quiz to illustrate my point: Is this system loaded ?
[albatros:joe] $ free
             total       used       free     shared    buffers     cached
Mem:        513792     420720      93072          0       4696     111252
-/+ buffers/cache:     304772     209020
Swap:      2101000     718688    1382312
[albatros:joe] $
Oh, it has 512 MB of memory, and it's 718 MB into swap - oh horror !
Now look at vmstat:
[albatros:joe] $ vmstat 1
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
16  2  2 718620   4912   4896 108560   0   0     0     2    7     3   2   5   1
12  5  2 718620   4884   4744 108024  40   0    64     0 3222  3654  84  16   0
17  2  1 718620   5224   4360 105528   0   0    48     0 3960  4195  82  18   0
19  6  2 718620   6244   4160 104908   0   0   168     0 3820  4368  75  25   0
 9  7  1 718620   5060   3932 100688 128   0   148  1672 3252  3514  77  23   0
The si and so columns tell me how much is being swapped in and out
per second (the first line is just the average since boot) - the swap
space is almost idle here.
Conclusion: This system is not stressed at all (wrt. swap space).
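If you'd rather not eyeball the columns on every node, the same check can be scripted. What follows is only a rough sketch in Python, not something bWatch or vmstat provides: the function names and the 1000 threshold are my own assumptions, and you would have to tune the threshold for your nodes. It finds si and so by header name, skips the first sample (averages since boot), and calls it thrashing only when both swap-in and swap-out are busy on average.

```python
# Sample taken from the vmstat run shown above.
SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
16 2 2 718620 4912 4896 108560 0 0 0 2 7 3 2 5 1
12 5 2 718620 4884 4744 108024 40 0 64 0 3222 3654 84 16 0
17 2 1 718620 5224 4360 105528 0 0 48 0 3960 4195 82 18 0
19 6 2 718620 6244 4160 104908 0 0 168 0 3820 4368 75 25 0
9 7 1 718620 5060 3932 100688 128 0 148 1672 3252 3514 77 23 0
"""


def parse_vmstat(text):
    """Turn `vmstat <interval>` output into a list of dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header is the line that names both si and so.
        if "si" in fields and "so" in fields:
            columns = fields
            continue
        # Data lines are all-numeric and match the header width.
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(columns, (int(f) for f in fields))))
    return rows


def swap_activity(rows, skip_first=True):
    """Return (mean si, mean so); the first sample is the since-boot average."""
    samples = rows[1:] if skip_first and len(rows) > 1 else rows
    if not samples:
        return 0.0, 0.0
    n = len(samples)
    return (sum(r["si"] for r in samples) / n,
            sum(r["so"] for r in samples) / n)


def looks_like_thrashing(rows, threshold=1000):
    """Hypothetical rule of thumb: swap-in AND swap-out both above threshold."""
    si, so = swap_activity(rows)
    return si > threshold and so > threshold


if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    si, so = swap_activity(rows)
    print(f"mean si={si:.1f} so={so:.1f} thrashing={looks_like_thrashing(rows)}")
```

On a live node you would feed it the output of something like "vmstat 1 10" instead of SAMPLE. Averaging over several samples matters: a single burst of si just means something old got touched again, which is not thrashing.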
--
................................................................
: jakob at unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............: