[Beowulf] Weird blade performs worse as more cpus are used?

Faraz Hussain info at feacluster.com
Tue Sep 26 18:28:52 PDT 2017


The issue is now resolved after I did a full power-down (cold
boot)! No idea what caused it in the first place.
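
For anyone who finds this thread later with the same symptom: a rough way
to test the 1/Ncpu question raised in the quoted message below is to take
the GHz figure that "perf stat" reports at each thread count and check
whether threads x GHz stays roughly constant. This is only a sketch. The
function name, the 15% tolerance and the sample numbers are illustrative
assumptions, not measurements from the blade.

```python
# Sketch: does the per-thread clock rate fall off like 1/N as threads are added?
# All sample values below are hypothetical placeholders, NOT data from this thread.

def follows_inverse_n(samples, tolerance=0.15):
    """Return True if n_threads * ghz is roughly constant across samples.

    samples: list of (n_threads, ghz_per_thread) pairs, e.g. read off
    the GHz column of "perf stat" for runs at different thread counts.

    If a fixed shared budget B is being split N ways, each thread sees
    about B / N, so the product n_threads * ghz should stay close to B.
    """
    products = [n * ghz for n, ghz in samples]
    mean = sum(products) / len(products)
    return all(abs(p - mean) <= tolerance * mean for p in products)


# Hypothetical "slow blade": rate halves each time the thread count doubles.
slow_blade = [(1, 3.1), (2, 1.55), (4, 0.78), (8, 0.39)]

# Hypothetical "good blade": rate stays near nominal at every thread count.
good_blade = [(1, 3.1), (2, 3.1), (4, 3.0), (8, 3.0)]

if __name__ == "__main__":
    print("slow blade looks like 1/N:", follows_inverse_n(slow_blade))
    print("good blade looks like 1/N:", follows_inverse_n(good_blade))
```

If the product stays flat, something with a fixed total budget is being
shared; if the rate instead drops along a straight line, a different
cause is more likely.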

Quoting Joe Landman <joe.landman at gmail.com>:

> On 09/14/2017 11:34 AM, Faraz Hussain wrote:
>>
>> Earlier I had posted about one of our blades running 30-50% slower
>> than the other ones despite having identical hardware and OS. I
>> followed the suggestions and compared CPU temperature, memory,
>> dmesg and sysctl. Everything looks the same.
>>
>> I then used "perf stat" to compare the speed of pigz (parallel gzip).
>> The results are quite interesting. Using one cpu, the slow blade is
>> as fast as the rest! But as I use more cpus, the speed decreases
>> linearly from 3.1 GHz to 0.4 GHz. See snippets from the "perf stat"
>> command below. All tests were on /tmp to eliminate any NFS issue.
>> The same behavior is observed with any multi-threaded program.
>
> What does numastat report?  Is /tmp a ramdisk or tmpfs?  Are the
> nodes/cpus otherwise idle?  What does lscpu report on a good node
> versus the bad one?
>
> If it decreases on a 1/Ncpu curve, then you are fighting contention
> for a fixed-size resource (bandwidth of some kind).  The question
> is which resource.
>
>
>
> -- 
>
> Joe Landman
> e: joe.landman at gmail.com
> t: @hpcjoe
> w: https://scalability.org
> g: https://github.com/joelandman
> l: https://www.linkedin.com/in/joelandman




