[Beowulf] Cluster consistency checks

Wed Mar 30 01:59:15 PDT 2016

On Fri, 25 Mar 2016 14:39:12 -0400
Jeffrey Layton <laytonjb at gmail.com> wrote:

> Olli-Pekka, et al,
> 
> I took a look at your updated website - it looks very good. One thing
> I wanted to ask, and this question is probably one for the entire
> list, when you run a test across all of the nodes in the cluster,
> what process do you use to determine if nodes are "outliers" and need
> attention?

As a first pass I typically push the numeric data into a bucket/histogram
tool. That works fine for durations, byte counts, performance figures, ...
For example, the load average across n[100-300], split into 10 buckets:

# pdsh -w n[100-300] 'cat /proc/loadavg' | dbuck -n 10 -sS
Statistical summary
---------------------------------------------------------------------
 Number of values         : 201
 Number of rejected lines : 0
 Min value                : 0.910000
 Max value                : 24.610000
 Mean                     : 13.638010
 Median                   : 16.000000
 Standard deviation       : 4.899393
 Sum                      : 2741.240000

 0.91- 3.28:  18  n[112,122,135,137,139,156-157,165,167,172,179,221...
 3.28- 5.65:   3  n[100,169,173]
 5.65- 8.02:  13  n[110,117,136,192,214,216,223,242-243,257-259,266]
 8.02-10.39:  14  n[109,127-129,131,142,154,162,190,213,235,260,265...
10.39-12.76:   5  n[164,199,233,245,290]
12.76-15.13:   8  n[106,115,141,160,210,224,228,293]
15.13-17.50: 138  n[101-105,107-108,111,113-114,116,118-121,123-126...
17.50-19.87:   0  
19.87-22.24:   1  n146
22.24-24.61:   1  n207
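
For anyone without a tool like this at hand, the bucketing step can be roughly sketched in Python. This is not dbuck itself, only an illustration under a few assumptions: input lines carry the usual pdsh "host: output" prefix, the first field after the prefix is the number of interest, and buckets are equal-width between min and max (which matches the output above, where each bucket spans (24.61 - 0.91) / 10 = 2.37). The names parse_line and bucketize are made up for this sketch.

```python
from collections import defaultdict


def parse_line(line):
    """Split a pdsh-style 'host: v1 v2 ...' line into (host, first value).

    Returns None for lines that do not fit that shape (assumed format).
    """
    host, sep, rest = line.partition(":")
    if not sep:
        return None
    fields = rest.split()
    try:
        return host.strip(), float(fields[0])
    except (IndexError, ValueError):
        return None


def bucketize(samples, n_buckets=10):
    """Group (host, value) pairs into n equal-width buckets from min to max.

    Returns a list of (low, high, hosts) tuples, one per bucket, in order.
    """
    values = [v for _, v in samples]
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every node reports the same value.
    width = (hi - lo) / n_buckets or 1.0
    buckets = defaultdict(list)
    for host, v in samples:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_buckets - 1)
        buckets[idx].append(host)
    return [
        (lo + i * width, lo + (i + 1) * width, sorted(buckets[i]))
        for i in range(n_buckets)
    ]
```

Feeding it the pdsh output line by line (for instance from sys.stdin) and printing len(hosts) per bucket gives the same kind of view as above: the crowded bucket is "normal", and the sparsely populated buckets far from it are the nodes to look at first.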

/Peter K