[Beowulf] How to debug slow compute node?
Chris Samuel
samuel at unimelb.edu.au
Fri Aug 11 20:35:57 PDT 2017
On Friday, 11 August 2017 12:39:07 AM AEST Faraz Hussain wrote:
> I thought it may have to do with cpu scaling, i.e when the kernel
> changes the cpu speed depending on the workload. But we do not have
> that enabled on these machines.
Just to add to the excellent suggestions from others: have you compared BIOS/
UEFI settings & versions across these nodes to ensure they're identical?
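One rough way to start that comparison is to read the firmware identity each node exposes under /sys/class/dmi/id and group the nodes by it. The sketch below is only an illustration: the function names are made up here, and how you collect the per-node reports (pdsh, your scheduler, a config management tool) is left as an assumption. It also only catches version differences; comparing the actual settings usually needs the vendor's own tooling.

```python
from collections import defaultdict
from pathlib import Path

# Standard Linux sysfs location for DMI/SMBIOS identity strings.
DMI_DIR = Path("/sys/class/dmi/id")
DMI_FIELDS = ("bios_vendor", "bios_version", "bios_date")


def read_local_firmware(dmi_dir=DMI_DIR):
    """Return the firmware identity of the node this runs on."""
    info = {}
    for field in DMI_FIELDS:
        path = dmi_dir / field
        info[field] = path.read_text().strip() if path.exists() else "unknown"
    return info


def group_by_firmware(reports):
    """Group node names by their (vendor, version, date) tuple.

    `reports` maps a node name to the dict produced by
    read_local_firmware() on that node (hypothetical collection step).
    """
    groups = defaultdict(list)
    for node, info in sorted(reports.items()):
        key = tuple(info.get(field, "unknown") for field in DMI_FIELDS)
        groups[key].append(node)
    return dict(groups)


if __name__ == "__main__":
    # On a single node this just prints the local values.
    print(read_local_firmware())
```

If group_by_firmware() returns more than one group, the nodes in the smaller groups are the first ones worth a closer look.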
Also remember that the kernel can enable C-states that hurt performance even
if they are disabled in the BIOS/UEFI. This was painfully apparent on our
first Sandy Bridge cluster, which almost failed the performance part of
acceptance testing until we tracked the cause down.
Now we boot all nodes with this in the kernel cmdline:
intel_idle.max_cstate=0 processor.max_cstate=1 intel_pstate=disable
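To confirm a node really booted with those options, something like the following sketch can be run on it. It is an illustration, not a tool we ship: the function names are invented here, the parser splits on whitespace only (so quoted values containing spaces are not handled), and the two sysfs files it prints may be absent depending on how the kernel was built.

```python
from pathlib import Path

# The parameters from the command line above and the values we expect.
EXPECTED = {
    "intel_idle.max_cstate": "0",
    "processor.max_cstate": "1",
    "intel_pstate": "disable",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def check_cmdline(cmdline, expected=EXPECTED):
    """Return {name: (wanted, found)} for each mismatch.

    `found` is None when the parameter is missing entirely.
    """
    params = parse_cmdline(cmdline)
    return {
        name: (wanted, params.get(name))
        for name, wanted in expected.items()
        if params.get(name) != wanted
    }


def read_text_or_none(path):
    """Read a small text file, or return None if it does not exist."""
    p = Path(path)
    return p.read_text().strip() if p.exists() else None


if __name__ == "__main__":
    cmdline = read_text_or_none("/proc/cmdline") or ""
    for name, (wanted, found) in check_cmdline(cmdline).items():
        print(f"MISMATCH {name}: wanted {wanted!r}, found {found!r}")
    # What the running kernel reports, where these files exist.
    print("cpuidle driver:",
          read_text_or_none("/sys/devices/system/cpu/cpuidle/current_driver"))
    print("intel_idle max_cstate:",
          read_text_or_none("/sys/module/intel_idle/parameters/max_cstate"))
```

No MISMATCH lines means the boot options match; comparing the output between a slow node and a known-good one is the quickest use of it.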
Best of luck!
Chris
--
Christopher Samuel Senior Systems Administrator
Melbourne Bioinformatics - The University of Melbourne
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545