Planned Cluster.

Kim Branson bra369 at pp.molsci.csiro.au
Thu Oct 26 17:09:13 PDT 2000


I have tried both cards, and yes, before buying even a single card I
checked all the archives. Extensive testing (read: 4 weeks of run time on
2 nodes) has not revealed any faults. It was only when the supplier told
us they had had problems that I wondered whether it was poor quality
control in manufacturing or a software problem that had not surfaced
because of the nature of my test application, which is not a
network-heavy calculation. I have been using the 3c59x driver. I do,
however, consider this an appropriate place to ask such questions.

I merely wondered whether others had seen this fault, or whether it is a
recent thing. As a PhD student with a limited budget for building
equipment, I'd like to make sure the cards work and are reliable before
spending money on 65 of them.

kim branson



______________________________________________________________________ 

Mr Kim Branson
Phd Student
Diffraction and Theory
Biomolecular Research Institute
343 Royal Parade, Melbourne Victoria
Ph 61 03 9662 7300
Email kim.branson at bioresi.com.au

______________________________________________________________________ 


On Thu, 26 Oct 2000, Bogdan Costescu wrote:

> On Wed, 25 Oct 2000, J. G. LaBounty wrote:
> 
> > > 
> > > The head node works fine, but people have mentioned problems with the 3com
> > > network card. testing has shown no problems but the vendor has informed us
> > > they have had problems with the 3com cards, "some batches don't seem to
> > > work", they have offered intel EtherExpress PRO 10/100+ TX - PCI cards for
> > > the same cost. 
> > 
> > We were using the 3com 905b cards on two 16-node clusters. Our application
> > keeps the network pegged most of the time. We were getting network
> > hangs about once every two weeks running RH6.1. We moved to RH6.2
> > and switched to the 3c90x driver, and the problem then happened about
> > once per day. We have since swapped the 3com cards for the EtherExpress
> > PRO 10/100 and have not seen the problem, but we only have about 3 weeks
> > of runtime on this configuration.
> 
> Sorry guys, but I don't quite get it!
> The network is maybe the most important part of a cluster setup. And what
> do you do about it ? "I heard that this card doesn't work right" or "It
> seems that this card works better". While there is nothing wrong with asking
> about card/driver combinations on this list, do you ALSO take a look at
> the archives of the mailing lists devoted to development of these drivers?
> And if you have a problem, do you report it on such a list?
> Or do you just say: "OK, this card/driver combination is just crap, let's
> change it."? What if you still have problems after the change - will you
> make another change?
> I encountered the same way of thinking on the NFS list...
> 
> For reference: http://www.scyld.com/network/index.html has links for
> drivers (and more) while mailing list archives start at: 
> http://www.scyld.com/mailman/listinfo
> 
> Going back to the 3Com problem: the driver that was present in kernels up
> to around 2.2.15 was an old driver, based on Don's 0.99H and modified by
> different people. It had a race that could only occur within a very
> narrow window, but 2-3 weeks of uptime under load is enough for that
> window to be hit (I know because I had exactly the same
> problem). Now I have 3C905 B and C cards in UP and SMP nodes which have
> uptimes of more than 2 months (we do upgrade kernels from time to time).
> RH 6.1 had the "bad" driver; the original kernel from RH 6.2 also had it,
> but the updated 2.2.16-3 has the new one; and this is the 3c59x driver,
> not the 3c90x driver (which is written by 3Com).
> If you trust Don's drivers more, his 3c59x is available from: 
> http://www.scyld.com/network/vortex.html and it includes (AFAIK) a
> fix for this problem and much more.
> 
> Best regards,
> 
> Bogdan Costescu
> 
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De
> 
> 
> 
> _______________________________________________
> Beowulf mailing list
> Beowulf at beowulf.org
> http://www.beowulf.org/mailman/listinfo/beowulf
> 




