[Beowulf] Errors on IBM e325

Jeff Layton jeffrey.b.layton at lmco.com
Fri Jun 25 08:21:11 PDT 2004


Good morning,

   We've got a shiny new IBM cluster with e325 nodes (Opteron).
However, we're having some trouble with a number of nodes.
We keep getting 'GART' errors showing up in the logs. Here is
an example:

Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel:     extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel:     corrected ecc error
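
To see at a glance which nodes are affected and by which message, the
lines above can be tallied per node. The following is only a minimal
sketch in Python: the function name tally_kernel_errors, the regular
expression, and the SAMPLE string are assumptions made for illustration,
and the pattern assumes every line has the same
"date time host kernel: message" layout as the excerpt above.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   "Jun 21 07:07:42 c3n32.cluster kernel: GART error 3"
LINE_RE = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+(\S+)\s+kernel:\s+(.*\S)\s*$")


def tally_kernel_errors(lines):
    """Count occurrences of each (node, message) pair in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # silently skip lines that do not fit the assumed layout
            node, message = match.groups()
            counts[(node, message)] += 1
    return counts


# The six log lines quoted in this message, used as sample input.
SAMPLE = """\
Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel:     extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel:     corrected ecc error
"""

if __name__ == "__main__":
    counts = tally_kernel_errors(SAMPLE.splitlines())
    for (node, message), n in sorted(counts.items()):
        print(f"{node:16s} {n:3d}  {message}")
```

Run against the full logs instead of SAMPLE, this would show whether the
GART errors and the ECC errors come from the same nodes or from different
ones.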


   Does anybody have any ideas what the cause might be?

Thanks!

Jeff

-- 
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta




