<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Another good question. The systems with the NFSroot OS still have
a local disk. That local disk has a /var partition where logs are
written. Both systems also send some logs to a remote log server.
While the /etc/rsyslog.conf files were almost identical, I copied the
one from the NFSroot system to the local-OS system to make sure
they were identical. This had no impact on the performance of
xhpl.<br>
</p>
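<p>A comparison like the one described above can be sketched in Python
as follows. This is only an illustration: the function name
<code>config_diff</code> and the labels "nfsroot" and "local-os" are
hypothetical, and it is not necessarily how the files were compared
here. It ignores blank lines and comment lines so that only the active
directives are diffed.</p>
<pre>
```python
import difflib


def config_diff(text_a, text_b, name_a="nfsroot", name_b="local-os"):
    """Return unified-diff lines between two config file contents.

    Blank lines and lines starting with '#' are ignored, so only
    active directives are compared.
    """
    def significant(text):
        lines = []
        for line in text.splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                lines.append(stripped)
        return lines

    return list(
        difflib.unified_diff(
            significant(text_a),
            significant(text_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```
</pre>
<p>Reading each system's /etc/rsyslog.conf into a string and passing
both to <code>config_diff</code> returns an empty list when the active
directives match.</p>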
<pre class="moz-signature" cols="72">Prentice</pre>
<div class="moz-cite-prefix">On 09/13/2017 02:16 PM, Scott Atchley
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAL8g0j+_zgCQnVmz3Cxn=nzv5mniLdZ9mWhcD3L8e_JrDqYctQ@mail.gmail.com">
<div dir="ltr">Are you logging something that goes to the disk in the
local case, but that competes for network bandwidth when NFS
mounting?</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Sep 13, 2017 at 2:15 PM, Scott
Atchley <span dir="ltr">&lt;<a
href="mailto:e.scott.atchley@gmail.com" target="_blank"
moz-do-not-send="true">e.scott.atchley@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Are you swapping?</div>
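<p>One way to answer this is to sample the kernel's swap counters
before and after a LINPACK run. The sketch below is a minimal Python
illustration, assuming a Linux node where /proc/vmstat exposes the
<code>pswpin</code> and <code>pswpout</code> counters; the function
names are hypothetical.</p>
<pre>
```python
SWAP_KEYS = ("pswpin", "pswpout")


def swap_counters(vmstat_text):
    """Extract pages swapped in/out since boot from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in SWAP_KEYS:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Return how much each swap counter grew between two samples."""
    return {key: after.get(key, 0) - before.get(key, 0) for key in SWAP_KEYS}


def read_swap_counters(path="/proc/vmstat"):
    """Read the current counters from the running system."""
    with open(path) as handle:
        return swap_counters(handle.read())
```
</pre>
<p>Calling <code>read_swap_counters()</code> before and after the run
and passing both results to <code>swap_delta</code> gives the number of
pages swapped during the run; nonzero values would suggest the node is
swapping.</p>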
<div class="HOEnZb">
<div class="h5">
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Sep 13, 2017 at 2:14
PM, Andrew Latham <span dir="ltr">&lt;<a
href="mailto:lathama@gmail.com" target="_blank"
moz-do-not-send="true">lathama@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Ack. Maybe validate that you can
reproduce this with another NFS root, for example a lab
setup where a single server serves the NFS root
to the node. If you can reproduce it that way,
it would give some direction. Beyond that,
it sounds like an interesting problem.</div>
<div class="gmail_extra">
<div>
<div class="m_7604217799998711846h5"><br>
<div class="gmail_quote">On Wed, Sep 13,
2017 at 12:48 PM, Prentice Bisbal <span
dir="ltr">&lt;<a
href="mailto:pbisbal@pppl.gov"
target="_blank" moz-do-not-send="true">pbisbal@pppl.gov</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">Okay, based
on the various responses I've gotten
here and on other lists, I feel I need
to clarify things:<br>
<br>
This problem only occurs when I'm
running our NFSroot-based version of the
OS (CentOS 6). When I run the same OS
installed on a local disk, I do not have
this problem, using the exact same
server(s). For testing purposes, I'm
using LINPACK, and running the same
executable with the same HPL.dat file
in both instances.<br>
<br>
Because I'm testing the same hardware
with different OS installations, this should
rule out the BIOS and faulty hardware
as the cause. This leads me to
believe it's most likely a software
configuration issue, such as a kernel
tuning parameter.<br>
<br>
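<p>One way to narrow down a kernel tuning difference is to save the
output of <code>sysctl -a</code> on both the NFSroot and the local-disk
installation and compare the two dumps. The following is a minimal
Python sketch under that assumption; the function names are
hypothetical, and it only handles the usual <code>key = value</code>
line format.</p>
<pre>
```python
def parse_sysctl(dump_text):
    """Parse 'key = value' lines from a saved `sysctl -a` dump."""
    params = {}
    for line in dump_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def differing_params(dump_a, dump_b):
    """Map each parameter that differs to its (value_a, value_b) pair.

    A parameter present in only one dump is reported with None on the
    other side.
    """
    a = parse_sysctl(dump_a)
    b = parse_sysctl(dump_b)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```
</pre>
<p>Some values, such as counters and random seeds, change on every boot,
so not every reported difference is a configuration difference.</p>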
These are Supermicro servers, and it
seems they do not report CPU temperatures. I
do see a chassis temperature, but not the
temperatures of the individual CPUs. While I agree
that should be the first thing I look
at, it's not an option for me. Other
tools like FLIR cameras and infrared
thermometers aren't really an option for
me, either.<br>
<br>
What software configuration, whether a
kernel parameter, the configuration of
numad or cpuspeed, or some other
setting, could affect this?<span
class="m_7604217799998711846m_5099190104119760613HOEnZb"><font
color="#888888"><br>
<br>
Prentice</font></span><span
class="m_7604217799998711846m_5099190104119760613im
m_7604217799998711846m_5099190104119760613HOEnZb"><br>
<br>
On 09/08/2017 02:41 PM, Prentice
Bisbal wrote:<br>
</span><span
class="m_7604217799998711846m_5099190104119760613im
m_7604217799998711846m_5099190104119760613HOEnZb">
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Beowulfers,<br>
<br>
I need your assistance debugging a
problem:<br>
<br>
I have a dozen servers that are all
identical hardware: SuperMicro
servers with AMD Opteron 6320
processors. Ever since we upgraded
to CentOS 6, the users have been
complaining of wildly inconsistent
performance across these 12 nodes. I
ran LINPACK on these nodes, and was
able to duplicate the problem, with
performance varying from ~14 GFLOPS
to 64 GFLOPS.<br>
<br>
I've identified that performance on
the slower nodes starts off fine,
and then slowly degrades throughout
the LINPACK run. For example, on a
node with this problem, during the first
LINPACK test, I can see the
performance drop from 115 GFLOPS
down to 11.3 GFLOPS. That constant,
downward trend continues throughout
the remaining tests. At the start of
subsequent tests, performance will
jump up to about 9-10 GFLOPS, but
then drop to 5-6 GFLOPS at the end of
the test.<br>
<br>
Because of the nature of this
problem, I suspect this might be a
thermal issue. My guess is that the
processor speed is being throttled
to prevent overheating on the "bad"
nodes.<br>
<br>
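<p>Without CPU temperature sensors, one indirect check is to watch the
reported core clock while LINPACK runs. The following is a minimal
Python sketch, assuming an x86 Linux node whose /proc/cpuinfo includes
"cpu MHz" lines; the function names are hypothetical. A drop in the
reported clock would be consistent with frequency scaling, but would
not by itself prove thermal throttling.</p>
<pre>
```python
def cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value reported for each logical CPU."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds


def read_cpu_mhz(path="/proc/cpuinfo"):
    """Sample the current per-CPU clock speeds from the running system."""
    with open(path) as handle:
        return cpu_mhz(handle.read())
```
</pre>
<p>Sampling <code>read_cpu_mhz()</code> every few seconds on a "good"
node and a "bad" node during the same test would show whether the clock
speeds diverge as the performance degrades.</p>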
But here's the thing: this wasn't a
problem until we upgraded to CentOS
6. Where I work, we use a read-only
NFSroot filesystem for our cluster
nodes, so all nodes are mounting and
using the same exact read-only image
of the operating system. This only
happens with these SuperMicro nodes,
and only with the CentOS 6 on
NFSroot. RHEL5 on NFSroot worked
fine, and when I installed CentOS 6
on a local disk, the nodes worked
fine.<br>
<br>
Any ideas where to look or what to
tweak to fix this? Any idea why this
is only occurring with RHEL 6 w/ NFS
root OS?<br>
<br>
</blockquote>
<br>
</span>
<div
class="m_7604217799998711846m_5099190104119760613HOEnZb">
<div
class="m_7604217799998711846m_5099190104119760613h5">
______________________________<wbr>_________________<br>
Beowulf mailing list, <a
href="mailto:Beowulf@beowulf.org"
target="_blank"
moz-do-not-send="true">Beowulf@beowulf.org</a>
sponsored by Penguin Computing<br>
To change your subscription (digest
mode or unsubscribe) visit <a
href="http://www.beowulf.org/mailman/listinfo/beowulf"
rel="noreferrer" target="_blank"
moz-do-not-send="true">http://www.beowulf.org/mailman<wbr>/listinfo/beowulf</a><br>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
<span>-- <br>
<div
class="m_7604217799998711846m_5099190104119760613gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">- Andrew "lathama" Latham
<a href="mailto:lathama@gmail.com"
target="_blank"
moz-do-not-send="true">lathama@gmail.com</a>
<a href="http://lathama.org"
target="_blank"
moz-do-not-send="true">http://lathama.com</a> -</div>
</div>
</div>
</div>
</span></div>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>