Riser card -mainboard conflicts?
Jeff Nguyen jeff at aslab.com
Wed Jan 8 10:17:40 PST 2003
Hi Donald,

Did you receive a recent email that I sent regarding the request for quotation? Please let me know whether that message got through.

Jeff

----- Original Message -----
From: "Donald Becker" <becker at scyld.com>
To: <tegner at nada.kth.se>
Cc: <beowulf at beowulf.org>
Sent: Wednesday, January 08, 2003 8:19 AM
Subject: Re: Riser card -mainboard conflicts?

> On Wed, 8 Jan 2003 tegner at nada.kth.se wrote:
>
> > We have a cluster consisting of 30 Athlon 2000+ nodes on a KT3 Ultra
> > MS-6380E mainboard (using IDE disks) connected by a Fast Ethernet
> > network.
> >
> > For the nodes we use 2U chassis, and the NIC and the graphics card sit
> > on a PCI-301 riser card.
> ..
> > On one of the nodes we can never get the network to function; there
> > are messages about bus-master dirty, PCI bus error, etc., and we never
> > get any contact with the rest of the cluster.
>
> PCI bus errors are a pretty clear indication that the riser cards are a
> problem.
>
> > The other nodes "seem" to work OK, but for some parallel applications
> > one or more of the nodes just "give up" after some time, and in those
> > cases we get messages similar to those above. It has also happened
> > that a node just died, in which case we had to use the reset button to
> > get it back.
> ...
> > We have started to suspect that the mainboard and the riser card are
> > in some way incompatible, but we would greatly appreciate any hints
> > about other reasons for these problems.
>
> OK, here is an alternative: you have _both_ memory errors and PCI errors.
> Track down the PCI errors first.
>
> Not all drivers report PCI bus errors. Especially with vendor-written
> drivers, there is a reason to ignore or silently recover from errors --
> the driver and hardware _appear_ more robust when there are no messages.
> The scary thing is that you might have silent data corruption from other
> devices. Any driver that goes to the extra effort of reporting a bus
> error is doing you a big favor by pointing out the problem!
>
> --
> Donald Becker becker at scyld.com
> Scyld Computing Corporation http://www.scyld.com
> 410 Severn Ave. Suite 210 Scyld Beowulf cluster system
> Annapolis MD 21403 410-990-9993
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
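Since not every driver reports PCI bus errors, one way to act on "track down the PCI errors first" is to look at each device's PCI Status register directly instead of relying on driver log messages. The sketch below is a minimal, hypothetical helper (`pci_status_errors` and the sample device names are illustrations, not something from this thread). It assumes the `lspci -vv` output format from pciutils, where the `Status:` line prints each flag with a trailing `+` when the bit is set, and it reports only the primary Status line (parity errors, target/master aborts, SERR#). A clean result does not prove a riser card is sound; it only shows that no error bits are currently latched in the status registers.

```python
# Error-related flags in the "Status:" line of `lspci -vv` output (pciutils).
# Assumption: each flag is printed with "+" when the bit is set, "-" when clear.
ERROR_FLAGS = {
    "ParErr": "master data parity error",
    ">TAbort": "signaled target abort",
    "<TAbort": "received target abort",
    "<MAbort": "received master abort",
    ">SERR": "signaled system error (SERR#)",
    "<PERR": "detected parity error",
}


def pci_status_errors(lspci_text):
    """Return {device_address: [error descriptions]} for devices whose
    primary Status line has one or more error bits set."""
    results = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented device header, e.g. "00:0b.0 Ethernet controller: ..."
            device = line.split(" ", 1)[0]
        elif line.strip().startswith("Status:") and device is not None:
            # Skip the "Status:" label; keep only flags that end in "+".
            tokens = line.split()[1:]
            set_flags = [t[:-1] for t in tokens if t.endswith("+")]
            errors = [ERROR_FLAGS[f] for f in set_flags if f in ERROR_FLAGS]
            if errors:
                results[device] = errors
    return results


if __name__ == "__main__":
    # Made-up sample in lspci -vv style; on a real node, feed it the
    # captured output of `lspci -vv` instead.
    sample = (
        "00:0b.0 Ethernet controller: Example NIC\n"
        "\tStatus: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium "
        ">TAbort- <TAbort- <MAbort+ >SERR- <PERR-\n"
    )
    print(pci_status_errors(sample))
```

Running the same check on every node and comparing the results is a cheap way to see whether the errors follow particular riser cards or slots, before moving on to memory testing.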
More information about the Beowulf mailing list
