[Beowulf] How to debug slow compute node?
Bill Broadley
bill at cse.ucdavis.edu
Wed Aug 16 17:14:59 PDT 2017
On 08/10/2017 07:39 AM, Faraz Hussain wrote:
> One of our compute nodes runs ~30% slower than the others. It has the exact same
> image, so I am baffled as to why it is running slowly. I have tested OMP and MPI
> benchmarks. Everything runs slower. The CPU usage goes to 2000%, so all looks
> normal there.
We got some Supermicro dual-socket nodes that arrived without the little plastic
air guides. They thermally throttled very quickly.
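One way to check for throttling without opening the case is to read the per-CPU counters the Linux kernel exposes in sysfs. This is a minimal sketch, assuming an x86 Linux node where /sys/devices/system/cpu/cpuN/thermal_throttle/core_throttle_count and cpuN/cpufreq/scaling_cur_freq exist (not every kernel and driver combination provides them); read_int and throttle_report are illustrative names, not an existing tool. A throttle count that keeps climbing on the slow node, or clocks well below its siblings under the same load, points at cooling.

```python
import glob
import os
import re


def read_int(path):
    """Return the integer stored in a sysfs file, or None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def throttle_report(sys_cpu="/sys/devices/system/cpu"):
    """Map each logical CPU ('cpu0', 'cpu1', ...) to its throttle count and current clock in MHz.

    Values are None where this node's kernel does not expose the corresponding file.
    """
    # Keep only cpuN directories; skip siblings such as cpufreq/ and cpuidle/.
    cpu_dirs = [p for p in glob.glob(os.path.join(sys_cpu, "cpu*"))
                if re.fullmatch(r"cpu\d+", os.path.basename(p))]
    report = {}
    for cpu_dir in sorted(cpu_dirs, key=lambda p: int(os.path.basename(p)[3:])):
        throttles = read_int(os.path.join(cpu_dir, "thermal_throttle", "core_throttle_count"))
        khz = read_int(os.path.join(cpu_dir, "cpufreq", "scaling_cur_freq"))  # sysfs reports kHz
        report[os.path.basename(cpu_dir)] = {
            "core_throttle_count": throttles,
            "cur_mhz": khz // 1000 if khz is not None else None,
        }
    return report


if __name__ == "__main__":
    # Run on both a fast and a slow node, ideally while a benchmark is loading every core.
    for cpu, info in throttle_report().items():
        print(cpu, info)
```

Comparing the printed counts before and after a benchmark run on each node shows whether the slow one is throttling during the run rather than just idling at a lower clock.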
I've also seen nodes fall back to a single memory channel because the DIMMs were
in the wrong slots.
I suggest comparing the physical nodes: double-check the fans (they should be
spinning), air conduits, DIMM placement, etc. Then check dmesg, syslog, and
temperatures, and compare a fast node to a slow node.
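Part of the fast-versus-slow comparison can be scripted: capture the same text snapshot on both nodes (for example `lscpu > fast.txt` on a good node and `lscpu > slow.txt` on the bad one), diff the "key: value" facts, and scan saved dmesg or syslog output for lines that look hardware-related. This is a rough sketch; parse_kv, diff_nodes, suspicious_log_lines, the SUSPECT_WORDS keyword list, and the file names are all hypothetical. Kernel message wording varies between versions, so treat matches as leads rather than a diagnosis.

```python
import sys

# Guessed keywords for hardware trouble; adjust for your kernel and BMC logging.
SUSPECT_WORDS = ("throttl", "temperature", "mce", "machine check", "hardware error")


def parse_kv(text):
    """Parse 'Key: value' lines (the layout lscpu prints) into a dict."""
    facts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            facts[key.strip()] = value.strip()
    return facts


def diff_nodes(fast, slow, ignore=()):
    """Return {key: (fast_value, slow_value)} for every fact that differs between two nodes."""
    keys = (set(fast) | set(slow)) - set(ignore)
    return {k: (fast.get(k), slow.get(k)) for k in sorted(keys) if fast.get(k) != slow.get(k)}


def suspicious_log_lines(log_text, words=SUSPECT_WORDS):
    """Return log lines containing any of the given keywords, compared case-insensitively."""
    return [line for line in log_text.splitlines() if any(w in line.lower() for w in words)]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Two arguments: snapshots from a fast node and a slow node.
        with open(sys.argv[1]) as f_fast, open(sys.argv[2]) as f_slow:
            fast, slow = parse_kv(f_fast.read()), parse_kv(f_slow.read())
        # "CPU MHz" changes from moment to moment, so it is left out of the diff.
        for key, (a, b) in diff_nodes(fast, slow, ignore=("CPU MHz",)).items():
            print(f"{key}: fast={a!r} slow={b!r}")
    elif len(sys.argv) == 2:
        # One argument: a saved log, e.g. `dmesg > slow-dmesg.txt`.
        with open(sys.argv[1]) as f:
            for line in suspicious_log_lines(f.read()):
                print(line)
```

Diffing whole snapshots rather than hand-picked fields means an unexpected difference (a lower maximum clock, a missing NUMA node) shows up without having to guess where to look first.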