Myrinet hardware reliability

Gerry Creager N5JXS gerry.creager at tamu.edu
Sat Feb 8 06:20:11 PST 2003


Have you seen any indications of power supply problems on the
problematic cluster?

Gerry

Victoria Pennington wrote:
> Hi,
> 
> We have a 113 node IBM x330 cluster with Myrinet 2000.  We're
> experiencing very high failure rates on Myrinet switch ports
> (average 3 per month) and on Myrinet NICs to a lesser extent
> (about 1 per month).  Ports and NICs are fine one minute,
> then one or the other just dies (for good).  Cables
> (fibre, not copper) seem fine - one or two failures only in
> nearly a year.
> 
> There is no pattern in the failures, and they are entirely
> unrelated to usage levels; seldom used nodes are just as
> likely to have failures as heavily used nodes.
> 
> We have another small IBM cluster with Myrinet 2000
> (16 port switch with copper cables), and this has run solidly
> for nearly 2 years with not one Myrinet hardware fault.
> 
> I'd be really interested to know of others' experiences with
> Myrinet kit, especially in larger clusters.
> 
> Thanks
> Victoria
> ---
> Dr Victoria Pennington
> Manchester Computing, Kilburn Building,
> University of Manchester,
> Oxford Road, Manchester M13 9PL
> tel. 0161 275 6830, email: v.pennington at man.ac.uk




