[Beowulf] IB symbol error thresholds for health check scripts ?

Peter Kjellström cap at nsc.liu.se
Mon Jan 3 02:17:22 PST 2011


On Wednesday, December 29, 2010 07:29:21 pm Stuart Barkley wrote:
> On Mon, 13 Dec 2010 at 17:43 -0000, Christopher Samuel wrote:
...
> > One of the checks we do is to check that there are no symbol errors
> > on the IB link. However, I'm wondering if simply saying a single
> > error is too brutal for this - what do other people do about these ?
> 
> I'm looking at Infiniband problems currently and have been watching
> our SymbolErrorCounter values.  I'm told a "small number" of these
> errors are okay.  I don't know the definition of "small" or over how
> long a time period.

My personal take: over a week or so of data, a two-digit symbol error count
indicates a non-perfect link/port (but will probably not be a real problem).
A three-digit count is a problem; fix it.
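
A minimal sketch of how that rule of thumb might look in a health check script. The function names, the "ok"/"warn"/"fail" labels, and the exact cutoffs of 10 and 100 are assumptions (my reading of "two digits" and "three digits"), and the counter-reset handling is a guess at what a script would want, not anything from the IB tools themselves:

```python
def weekly_delta(previous: int, current: int) -> int:
    """Errors accumulated between two readings taken about a week apart."""
    # Assumption: a lower current value means the counters were reset in
    # between, so the current value is all we know about the new period.
    return current - previous if current >= previous else current


def classify_symbol_errors(errors_per_week: int) -> str:
    """Map roughly one week of SymbolErrorCounter growth to a status."""
    if errors_per_week < 0:
        raise ValueError("error count cannot be negative")
    if errors_per_week >= 100:  # three digits or more: a problem, fix it
        return "fail"
    if errors_per_week >= 10:   # two digits: non-perfect link, probably harmless
        return "warn"
    return "ok"                 # single digit: treated as fine (assumption)


if __name__ == "__main__":
    # Hypothetical readings of one port's counter, taken a week apart.
    print(classify_symbol_errors(weekly_delta(previous=5, current=42)))
```

The readings would have to come from whatever the site already uses to collect port counters; that part is deliberately left out here.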

/Peter