Myrinet hardware reliability
Gerry Creager N5JXS
gerry.creager at tamu.edu
Sat Feb 8 06:20:11 PST 2003
Have you seen any indications of power supply problems on the
Victoria Pennington wrote:
> We have a 113 node IBM x330 cluster with Myrinet 2000. We're
> experiencing very high failure rates on Myrinet switch ports
> (average 3 per month) and on Myrinet NICs to a lesser extent
> (about 1 per month). Ports and NICs are fine one minute,
> then one or the other just dies (for good). Cables
> (fibre, not copper) seem fine - one or two failures only in
> nearly a year.
> There is no pattern in the failures, and they are entirely
> unrelated to usage levels; seldom used nodes are just as
> likely to have failures as heavily used nodes.
> We have another small IBM cluster with Myrinet 2000
> (16 port switch with copper cables), and this has run solidly
> for nearly 2 years with not one Myrinet hardware fault.
> I'd be really interested to know of others' experiences with
> Myrinet kit, especially in larger clusters.
> Dr Victoria Pennington
> Manchester Computing, Kilburn Building,
> University of Manchester,
> Oxford Road, Manchester M13 9PL
> tel. 0161 275 6830, email: v.pennington at man.ac.uk