[Beowulf] How to debug slow compute node?
Robert Horton
robh at dongle.org.uk
Thu Aug 10 08:16:51 PDT 2017
As John says, I'd start by checking the health of things like memory,
power supplies etc.
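
If a bad PSU or a cooling problem is throttling the CPUs, the clocks may sag under load even though a /proc/cpuinfo snapshot taken at idle looks fine. Below is a rough Python sketch for checking that. The function names and the 5% tolerance are my own assumptions, and the 2500 MHz nominal is simply taken from the model name in your output. It won't catch every kind of throttling, but it is a cheap first look.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical CPU, in file order."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


def slow_cores(mhz_values, nominal_mhz, tolerance=0.05):
    """Return the indexes of CPUs clocked more than `tolerance` below nominal."""
    floor = nominal_mhz * (1.0 - tolerance)
    return [i for i, mhz in enumerate(mhz_values) if mhz < floor]


def check_node(nominal_mhz=2500.0, path="/proc/cpuinfo"):
    """Read cpuinfo once and report which CPUs look under-clocked."""
    with open(path) as f:
        mhz = parse_cpu_mhz(f.read())
    return slow_cores(mhz, nominal_mhz)
```

Call check_node() a few times on the slow node while a benchmark is running, then do the same on a good node. An empty list on the good node and a non-empty one on the slow node would point at throttling.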
I've seen problems like this go away after a firmware update, so
I'd suggest updating the BIOS etc. if you can.
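
Before updating, it is worth checking whether the slow node is already on different firmware from the others. Here is a minimal sketch, assuming a Linux node that exposes DMI data under /sys/class/dmi/id. The helper names are made up for illustration.

```python
from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")
FIELDS = ("bios_vendor", "bios_version", "bios_date")


def read_firmware_info(dmi_dir=DMI_DIR):
    """Read the BIOS fields from a DMI sysfs directory; missing files give None."""
    info = {}
    for name in FIELDS:
        try:
            info[name] = (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


def diff_firmware(a, b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    return {k: (a.get(k), b.get(k)) for k in FIELDS if a.get(k) != b.get(k)}
```

Run read_firmware_info() on the slow node and on a good one, then pass the two results to diff_firmware(). An empty dict means the BIOS fields match.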
Have you tried completely removing the power for a few minutes then
booting up again?
Any idea when the problem started? I presume from the CPU it's not a
new system. What physical form is it (1U server, blade, etc.)?
Rob
On Thu, 2017-08-10 at 08:39 -0600, Faraz Hussain wrote:
> One of our compute nodes runs ~30% slower than the others. It has the
> exact same image, so I am baffled as to why it is running slow. I have
> tested OMP and MPI benchmarks. Everything runs slower. The CPU usage
> goes to 2000%, so all looks normal there.
>
> I thought it might have to do with CPU scaling, i.e., when the kernel
> changes the CPU speed depending on the workload. But we do not have
> that enabled on these machines.
>
> Here is a snippet from "cat /proc/cpuinfo". Everything is identical to
> our other nodes. Any suggestions on what else to check? I have tried
> rebooting it.
>
> processor : 19
> vendor_id : GenuineIntel
> cpu family : 6
> model : 62
> model name : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
> stepping : 4
> cpu MHz : 2500.098
> cache size : 25600 KB
> physical id : 1
> siblings : 10
> core id : 12
> cpu cores : 10
> apicid : 56
> initial apicid : 56
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good
> xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl
> vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat
> xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid fsgsbase
> smep erms
> bogomips : 5004.97
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>