<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Good question. I just checked using vmstat. When running xhpl on
both systems, vmstat shows only zeros for si and so, even long
after the performance degrades on the nfsroot instance. Just to be
sure, I double-checked with top, which shows 0k of swap being
used. <br>
</p>
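<p>A minimal sketch of the same swap check in Python, for anyone who wants to log it during a run rather than watch vmstat by hand. It assumes a Linux /proc/vmstat that exposes the cumulative pswpin and pswpout page counters. The function names are illustrative and not taken from any tool mentioned in this thread.</p>
<pre>
```python
import time


def parse_counters(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return (pages swapped in, pages swapped out) between two snapshots."""
    return (after.get("pswpin", 0) - before.get("pswpin", 0),
            after.get("pswpout", 0) - before.get("pswpout", 0))


def watch_swap(seconds, path="/proc/vmstat"):
    """Sample the counters twice, `seconds` apart; (0, 0) means no swapping."""
    with open(path) as handle:
        start = parse_counters(handle.read())
    time.sleep(seconds)
    with open(path) as handle:
        end = parse_counters(handle.read())
    return swap_delta(start, end)
```
</pre>
<p>Calling watch_swap(10) while xhpl is running should return (0, 0) if the node is not swapping, which would match the zeros vmstat reports for si and so.</p>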
<pre class="moz-signature" cols="72">Prentice</pre>
<div class="moz-cite-prefix">On 09/13/2017 02:15 PM, Scott Atchley
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAL8g0j+fBjXBRHmYj5kjyXbh+_D_wGjgmDnG_uOBJM=YR+EUww@mail.gmail.com">
<div dir="ltr">Are you swapping?</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Sep 13, 2017 at 2:14 PM, Andrew
Latham <span dir="ltr">&lt;<a href="mailto:lathama@gmail.com"
target="_blank" moz-do-not-send="true">lathama@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">ack, so maybe validate you can reproduce with
another nfs root. Maybe a lab setup where a single server
is serving nfs root to the node. If you could reproduce in
that way then it would give some direction. Beyond that it
sounds like an interesting problem.</div>
<div class="gmail_extra">
<div>
<div class="h5"><br>
<div class="gmail_quote">On Wed, Sep 13, 2017 at 12:48
PM, Prentice Bisbal <span dir="ltr">&lt;<a
href="mailto:pbisbal@pppl.gov" target="_blank"
moz-do-not-send="true">pbisbal@pppl.gov</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Okay,
based on the various responses I've gotten here
and on other lists, I feel I need to clarify
things:<br>
<br>
This problem only occurs when I'm running our
NFSroot based version of the OS (CentOS 6). When I
run the same OS installed on a local disk, I do
not have this problem, using the same exact
server(s). For testing purposes, I'm using
LINPACK, and running the same executable with the
same HPL.dat file in both instances.<br>
<br>
Because I'm testing the same hardware with
different OS installs, this should rule out
the BIOS and faulty hardware as the cause.
This leads me to believe it's most likely a
software configuration issue, such as a kernel
tuning parameter or some other software
setting.<br>
<br>
These are Supermicro servers, and it seems they do
not provide CPU temps. I do see a chassis temp,
but not the temps of the individual CPUs. While I
agree that should be the first thing I look at,
it's not an option for me. Other tools like FLIR
and Infrared thermometers aren't really an option
for me, either.<br>
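<p>Without CPU temperature sensors, one indirect check is to watch the per-core clock speed during a run. The sketch below is a hedged illustration in Python, assuming an x86 Linux /proc/cpuinfo with "cpu MHz" lines. The function names are hypothetical, and the reported clock may not reflect every form of throttling.</p>
<pre>
```python
def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU found in cpuinfo text."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def slowest_and_fastest(speeds):
    """Return (min, max) clock in MHz, or None when no values were found."""
    if not speeds:
        return None
    return (min(speeds), max(speeds))


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Read the current per-CPU clock speeds from the running system."""
    with open(path) as handle:
        return parse_cpu_mhz(handle.read())
```
</pre>
<p>Logging slowest_and_fastest(read_cpu_mhz()) every few seconds on both the NFSroot and local-disk installs, while LINPACK runs, would show whether the clocks diverge as performance degrades.</p>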
<br>
What software configuration, whether a kernel
parameter, the configuration of numad or cpuspeed, or
some other setting, could affect this?<span
class="m_5099190104119760613HOEnZb"><font
color="#888888"><br>
<br>
Prentice</font></span><span
class="m_5099190104119760613im
m_5099190104119760613HOEnZb"><br>
<br>
On 09/08/2017 02:41 PM, Prentice Bisbal wrote:<br>
</span><span class="m_5099190104119760613im
m_5099190104119760613HOEnZb">
<blockquote class="gmail_quote" style="margin:0
0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex">
Beowulfers,<br>
<br>
I need your assistance debugging a problem:<br>
<br>
I have a dozen servers that are all identical
hardware: SuperMicro servers with AMD Opteron
6320 processors. Ever since we upgraded to
CentOS 6, the users have been complaining of
wildly inconsistent performance across these
12 nodes. I ran LINPACK on these nodes, and
was able to duplicate the problem, with
performance varying from ~14 GFLOPS to 64
GFLOPS.<br>
<br>
I've identified that performance on the slower
nodes starts off fine, and then slowly
degrades throughout the LINPACK run. For
example, on a node with this problem, during
the first LINPACK test, I can see the performance
drop from 115 GFLOPS down to 11.3 GFLOPS. That
constant, downward trend continues throughout
the remaining tests. At the start of
subsequent tests, performance will jump up to
about 9-10 GFLOPS, but then drop to 5-6 GFLOPS
at the end of the test.<br>
<br>
Because of the nature of this problem, I
suspect this might be a thermal issue. My
guess is that the processor speed is being
throttled to prevent overheating on the "bad"
nodes.<br>
<br>
But here's the thing: this wasn't a problem
until we upgraded to CentOS 6. Where I work,
we use a read-only NFSroot filesystem for our
cluster nodes, so all nodes are mounting and
using the same exact read-only image of the
operating system. This only happens with these
SuperMicro nodes, and only with the CentOS 6
on NFSroot. RHEL5 on NFSroot worked fine, and
when I installed CentOS 6 on a local disk, the
nodes worked fine.<br>
<br>
Any ideas where to look or what to tweak to
fix this? Any idea why this is only occurring
with RHEL 6 w/ NFS root OS?<br>
<br>
</blockquote>
<br>
</span>
<div class="m_5099190104119760613HOEnZb">
<div class="m_5099190104119760613h5">
______________________________<wbr>_________________<br>
Beowulf mailing list, <a
href="mailto:Beowulf@beowulf.org"
target="_blank" moz-do-not-send="true">Beowulf@beowulf.org</a>
sponsored by Penguin Computing<br>
To change your subscription (digest mode or
unsubscribe) visit <a
href="http://www.beowulf.org/mailman/listinfo/beowulf"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://www.beowulf.org/mailman<wbr>/listinfo/beowulf</a><br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
<span class="">-- <br>
<div class="m_5099190104119760613gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">- Andrew "lathama" Latham <a
href="mailto:lathama@gmail.com"
target="_blank" moz-do-not-send="true">lathama@gmail.com</a>
<a href="http://lathama.org" target="_blank"
moz-do-not-send="true">http://lathama.com</a> -</div>
</div>
</div>
</div>
</span></div>
<br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>