[Beowulf] hpl size problems

Robert G. Brown rgb at phy.duke.edu
Tue Sep 27 07:36:40 PDT 2005


Greg M. Kurtzer writes:
(regarding what Mark Hahn writes:-)

>> on my favorite cluster, I use the obvious kind of initrd+tmpfs+NFS
>> and don't run any extra daemons.  on a randomly chosen node running 
>> two MPI workers (out of 64 in the job), "vmstat 10" looks like this:
>> 
>> [hahn at node1 hahn]$ ssh node70 vmstat 10
>> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>>  2  0 2106876  58240  11504 1309148    0    0     0     0 1035    54 99  1  0  0
>>  2  0 2106876  58312  11504 1309148    0    0     0     0 1037    59 99  1  0  0
>>  2  0 2106876  58312  11504 1309148    0    0     0     0 1034    55 99  1  0  0
>>  2  0 2106876  58312  11504 1309148    0    0     0     0 1033    56 99  1  0  0
>>  2  0 2106876  58312  11504 1309148    0    0     0     0 1034    44 99  1  0  0
>>  2  0 2106876  58312  11504 1309148    0    0     0     0 1031    41 99  1  0  0
>>  2  0 2106876  58312  11504 1309148    0    0     0     0 1033    39 99  1  0  0
>> 
>> I haven't updated the kernel to a lower HZ yet, but will soon.  I assert 
>> without the faintest wisp of proof that 50 cs/sec is inconsequential.
>> the gigabit on these nodes is certainly not sterile either - plenty of NFS 
>> traffic, even some NTP broadcasts.  actually, I just tcpdumped it a bit,
>> and the basal net rate is one ARP and roughly four NFS access/getattr calls every 60 seconds.

Precisely.

>> 
>> 
>> > It reminds me of chapter 1 of sysadmin 101: Only install what you *need*
>> 
>> sure, but that's not inherent to your system, and unless you had some pretty
>> god-awful stuff installed before, it's hard to see that explanation...

Yes.

>> a full-fledged desktop load doesn't cause *that* much extraneous load - 
>> yes, there are interrupts and the like, but you have to remember that 
>> modern machines have massive memory bandwidth, big, associative caches,
>> so such stuff doesn't matter much.

Absolutely.  So little that most systems sitting idle are OVER 99% idle
even while managing a desktop, and even processing keystrokes and mouse
clicks doesn't usually push the non-idle fraction above 1%.

As I suggested before -- boot the system to init 1, bring up the
network (only) and any absolutely required subsystems by hand, then run
your task.  Then run it in init 2, init 3, then init 5.  MOST systems
idling in init 5 should show a load average (roughly, the average
number of runnable processes, as reported by uptime) of pretty much
0.00 or 0.01, even if you're running an X console (but not a silly
screensaver or, as Mark noted, some sort of polling piece of
eye-candy).  If you see a really large difference between init 1 plus a
minimal set of services and init 5, I'd look for something broken.  If
you bring up subsystems one at a time, you can probably identify
exactly what it is that is broken, if you care enough...:-)

> I was thinking that the increased context switching that would occur
> with more processes running would also increase the frequency at which
> the processes would bounce between the CPUs (no CPU/memory affinity).
> Now add to that the time it takes to repopulate the 2MB of L2 cache.

I measure something like 10-20 context switches per second on idle
workstations showing the login screen.  Interrupts tend to hold just
above 1024 (e.g. 1060-something) -- the timer interrupt plus probably
background network traffic.  These numbers basically don't change for a
CPU/memory-bound task, and while they might rise for a network-bound
task, that rise is a fair measure of the task's own load, not of the
background.

At these levels they just don't have any effect on performance to
speak of.  A task will get the CPU for rather long time slices,
measured in millions of TSC clocks.  A millisecond is several million
clocks, so you could burn thousands of clocks on scheduling overhead
and still be better than 99.9% efficient while you are on the CPU.  It
simply isn't credible that this could be a 30% effect.

If something (else) in the background is running that generates a LOT
more interrupts and context switches, well, then anything is possible
but the best solution is to whomp that particular daemon or task upside
the head, as it almost certainly isn't necessary or is broken and
running away on some sort of polling loop.

   rgb

