[Beowulf] Errors on IBM e325
Jeff Layton
jeffrey.b.layton at lmco.com
Fri Jun 25 08:21:11 PDT 2004
Good morning,
We've got a shiny new IBM cluster with e325 nodes (Opteron).
However, we're having some trouble with a number of nodes.
We keep getting 'GART' errors, along with some ECC errors, showing
up in the logs. Here is an example:
Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error
Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error
Jun 21 07:07:42 c3n32.cluster kernel: GART error 3
Jun 21 07:40:52 c1n4.cluster kernel: GART error 3
Jun 21 14:03:49 c1n2.cluster kernel: extended error chipkill ecc error
Jun 21 14:03:50 c1n2.cluster kernel: corrected ecc error
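A minimal sketch of how messages like these could be tallied per node, to see whether the errors cluster on particular machines. It assumes the syslog layout shown above (timestamp, host, "kernel:", message); the names LINE_RE, SAMPLE and tally_by_node are illustrative only and not part of any existing tool.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: "<Mon> <day> <hh:mm:ss> <host> kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+kernel:\s+(.*)$")

# The log lines quoted above, used here as sample input.
SAMPLE = [
    "Jun 21 07:07:42 c3n32.cluster kernel: Lost an northbridge error",
    "Jun 21 07:40:52 c1n4.cluster kernel: Lost an northbridge error",
    "Jun 21 07:07:42 c3n32.cluster kernel: GART error 3",
    "Jun 21 07:40:52 c1n4.cluster kernel: GART error 3",
    "Jun 21 14:03:49 c1n2.cluster kernel: extended error chipkill ecc error",
    "Jun 21 14:03:50 c1n2.cluster kernel: corrected ecc error",
]


def tally_by_node(lines):
    """Count each distinct kernel message per host; skip non-matching lines."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            _timestamp, node, message = match.groups()
            counts[node][message] += 1
    return counts


if __name__ == "__main__":
    for node, messages in sorted(tally_by_node(SAMPLE).items()):
        for message, count in messages.items():
            print(f"{node}: {message} x{count}")
```

On a real cluster the input would be the lines of the collected syslog file rather than SAMPLE.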
Does anybody have any ideas what the cause might be?
Thanks!
Jeff
--
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta