[Beowulf] How to debug slow compute node?

Bill Broadley bill at cse.ucdavis.edu
Wed Aug 16 17:14:59 PDT 2017


On 08/10/2017 07:39 AM, Faraz Hussain wrote:
> One of our compute nodes runs ~30% slower than others. It has the exact same
> image so I am baffled why it is running slow. I have tested OMP and MPI
> benchmarks. Everything runs slower. The CPU usage goes to 2000%, so all looks
> normal there.

We got some Supermicro dual-socket nodes without the little plastic air guides.
They thermally throttled really quickly.

I've also seen nodes that fall back to single-channel memory because the DIMMs
were in the wrong slots.
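
One rough way to check DIMM population from software is to parse the output
of "dmidecode -t memory" (needs root) and list which slots report a size. The
Python below is only a sketch: the function names are made up for
illustration, and slot naming varies by board vendor, so compare the result
against the motherboard manual or against a known-fast node.

```python
def parse_memory_devices(text):
    """Return (locator, size) pairs from `dmidecode -t memory` output."""
    devices = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Handle "):
            # Every DMI record starts with a "Handle ..." line.
            current = None
        elif stripped == "Memory Device":
            # DMI type 17 record: one per physical DIMM slot.
            current = {}
            devices.append(current)
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current.setdefault(key.strip(), value.strip())
    return [(d.get("Locator", "?"), d.get("Size", "?")) for d in devices]


def populated_slots(devices):
    """Slots whose Size field starts with a digit (empty slots report text)."""
    return [locator for locator, size in devices if size[:1].isdigit()]
```

Save the dmidecode output on a fast node and on the slow node, feed each file's
contents to parse_memory_devices(), and diff the populated_slots() lists.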

I suggest comparing the physical nodes: double-check the fans (which should be
spinning), air conduits, DIMM placement, etc.  Then check dmesg, syslog, and
temperatures, comparing a fast node to a slow node.
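
For the software side of that comparison, something like the following Python
sketch can be run on a fast node and on the slow one and the two outputs
diffed. It assumes a Linux node that exposes the usual sysfs files; the
thermal_throttle counters are only present on some CPUs and kernels (Intel
in my experience), and the collect() helper and its "root" parameter are
hypothetical names of mine, not part of any tool. Ideally run it while a
benchmark is loading the node.

```python
#!/usr/bin/env python3
"""Collect a few health indicators from sysfs so two nodes can be diffed."""
import glob
import json
import os
import socket
import sys


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def collect(root="/"):
    """Gather throttle counts, temperatures, and current CPU frequencies."""
    def values(pattern):
        paths = sorted(glob.glob(os.path.join(root, pattern)))
        return [v for v in map(read_int, paths) if v is not None]

    cpu = "sys/devices/system/cpu/cpu[0-9]*/"
    throttle = values(cpu + "thermal_throttle/core_throttle_count")
    temps = values("sys/class/thermal/thermal_zone*/temp")  # millidegrees C
    freqs = values(cpu + "cpufreq/scaling_cur_freq")        # kHz
    return {
        "host": socket.gethostname(),
        "throttle_events": sum(throttle),
        "max_temp_c": max(temps) / 1000 if temps else None,
        "min_freq_mhz": min(freqs) / 1000 if freqs else None,
        "max_freq_mhz": max(freqs) / 1000 if freqs else None,
    }


if __name__ == "__main__":
    json.dump(collect(), sys.stdout, indent=2)
    print()
```

A throttle count that keeps climbing, or a much lower frequency under load on
the slow node, would point at cooling rather than at the image.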



