[Beowulf] bizarre scaling behavior on a Nehalem

Wed Aug 12 11:00:41 PDT 2009

On Wed, Aug 12, 2009 at 11:32 AM, Craig Tierney<Craig.Tierney at noaa.gov> wrote:
> What do you mean normally?  I am running Centos 5.3 with 2.6.18-128.2.1
> right now on a 448 node Nehalem cluster.  I am so far happy with how things work.
> The original Centos 5.3 kernel, 2.6.18-128.1.10, had bugs in Nehalem support
> where nodes would just randomly start running slow.  Upgrading the kernel
> fixed that.  But that performance problem was either all or none; I don't recall
> it exhibiting itself in the way that Rahul described.
>

I was trying another angle: playing with the power profiles. I just
downloaded cpufreq-utils via yum and tried to see which profile was
loaded:

cpufreq-info
cpufrequtils 005: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to cpufreq at vger.kernel.org, please.
analyzing CPU 0:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 1:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 2:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 3:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 4:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 5:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 6:
  no or unknown cpufreq driver is active on this CPU
analyzing CPU 7:
  no or unknown cpufreq driver is active on this CPU

Is this lack of the right drivers indicative of a deeper fault, or is
it fairly local to this issue? This could be a clue or a red
herring; I just thought that I ought to post it.
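
For reference, the same check can be made straight from sysfs without
cpufreq-info. This is only a minimal sketch, assuming the usual
/sys/devices/system/cpu/cpuN/cpufreq/scaling_driver layout; the function
name and its optional root-directory argument are made up for
illustration and are not part of cpufrequtils.

```shell
#!/bin/sh
# Report which cpufreq scaling driver (if any) each CPU exposes in sysfs.
# Assumption: the standard layout <root>/cpuN/cpufreq/scaling_driver.
# The optional root argument is hypothetical; it exists only so the
# function can be pointed at a scratch directory instead of the real /sys.
report_cpufreq_drivers() {
    root="${1:-/sys/devices/system/cpu}"
    for cpu in "$root"/cpu[0-9]*; do
        # Skip the unexpanded glob pattern when no cpuN directories exist.
        [ -d "$cpu" ] || continue
        name=$(basename "$cpu")
        if [ -r "$cpu/cpufreq/scaling_driver" ]; then
            echo "$name: $(cat "$cpu/cpufreq/scaling_driver")"
        else
            # No cpufreq directory means no driver is bound to this CPU.
            echo "$name: no cpufreq driver"
        fi
    done
}
```

Calling report_cpufreq_drivers with no argument reads the live system.
If every CPU reports "no cpufreq driver", that would match the
cpufreq-info output above; it says nothing yet about why no driver is
loaded.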

-- 
Rahul