Myrinet hardware reliability
Gerry Creager N5JXS gerry.creager at tamu.edu
Sat Feb 8 06:20:11 PST 2003
Have you seen any indications of power supply problems on the problematic cluster?

Gerry

Victoria Pennington wrote:
> Hi,
>
> We have a 113 node IBM x330 cluster with Myrinet 2000. We're
> experiencing very high failure rates on Myrinet switch ports
> (average 3 per month) and on Myrinet NICs to a lesser extent
> (about 1 per month). Ports and NICs are fine one minute,
> then one or the other just dies (for good). Cables
> (fibre, not copper) seem fine - one or two failures only in
> nearly a year.
>
> There is no pattern in the failures, and they are entirely
> unrelated to usage levels; seldom used nodes are just as
> likely to have failures as heavily used nodes.
>
> We have another small IBM cluster with Myrinet 2000
> (16 port switch with copper cables), and this has run solidly
> for nearly 2 years with not one Myrinet hardware fault.
>
> I'd be really interested to know of others' experiences with
> Myrinet kit, especially in larger clusters.
>
> Thanks
> Victoria
> ---
> Dr Victoria Pennington
> Manchester Computing, Kilburn Building,
> University of Manchester,
> Oxford Road, Manchester M13 9PL
> tel. 0161 275 6830, email: v.pennington at man.ac.uk
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
