Problems with dual Athlons

Steven Timm timm at fnal.gov
Wed Jul 31 11:54:16 PDT 2002


------------------------------------------------------------------
Steven C. Timm (630) 840-8525  timm at fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations

On Wed, 31 Jul 2002, Robert G. Brown wrote:

> On Wed, 31 Jul 2002, Steven Timm wrote:
>
> > Has anyone managed to successfully configure a Tyan 2466 board
> > so that it can have a boot partition that's bigger than 1024 cylinders
> > on its system drive?  The drive in question is a WD200-BB.
>
> Are you using grub?  I thought that was no longer an issue with grub.
>
>    rgb

We haven't ported to 7.3 yet, so we can't use grub.  Besides, "Grub and Stitch"
doesn't quite sound the same.  But we'll keep it in mind for when we do.
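
For reference, the 1024-cylinder question above comes down to simple
arithmetic.  The following is only a rough sketch: the geometry of 255 heads
and 63 sectors per track is an assumed, commonly seen translated geometry, not
something read from the Tyan BIOS or the WD drive, and the function names are
made up for illustration.

```python
# Sketch of the classic BIOS CHS addressing limit.
# The default geometry (255 heads, 63 sectors/track) is an ASSUMPTION;
# the geometry a given BIOS actually reports may differ.
SECTOR_BYTES = 512
MAX_CYLINDERS = 1024  # cylinder numbers 0..1023 fit in the 10-bit CHS field


def chs_limit_bytes(heads=255, sectors_per_track=63):
    """Bytes addressable below cylinder 1024 for the given geometry."""
    return MAX_CYLINDERS * heads * sectors_per_track * SECTOR_BYTES


def boot_partition_is_safe(end_bytes, heads=255, sectors_per_track=63):
    """True if a partition ending at end_bytes lies entirely below the limit."""
    return end_bytes <= chs_limit_bytes(heads, sectors_per_track)


if __name__ == "__main__":
    limit = chs_limit_bytes()
    print(f"limit: {limit} bytes ({limit / 2**30:.2f} GiB)")
    # A hypothetical 64 MB /boot at the start of the disk fits easily.
    print(boot_partition_is_safe(64 * 2**20))
```

Under that assumed geometry the limit is a bit under 8 GiB, so a small boot
partition placed at the start of the disk stays below it regardless of how
large the drive is.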

Steve


>
> >
> > Steve Timm
> >
> >
> >
> > On Wed, 31 Jul 2002, Robert G. Brown wrote:
> >
> > > On Wed, 31 Jul 2002, Ray Schwamberger wrote:
> > >
> > > > You might try the noapic option. I'm thinking there may be some kind of
> > > > issue with APIC, AMD and 2.4.18.
> > >
> > > We don't have ASUS systems but instead a mix of Tyan 2460 and 2466
> > > systems and see very similar things, including the bizarreness of the
> > > blind crash problems appearing on one system (consistently and
> > > repeatedly) but not another IDENTICAL system sitting right next to it.
> > >
> > > We have found that power supplies (both the power line itself and the
> > > switching power supply in the chassis) can make a difference on the
> > > 2466's -- a marginal power supply is an invitation to problems for sure
> > > on these beasties.  This is reflected in the completely outrageous
> > > observation that I have some nodes that will boot and run stably when
> > > plugged into certain receptacles on the power pole, but not other
> > > receptacles.  If I put a polarity/circuit tester on the receptacles,
> > > they pass.  If I check the line voltages, they are nominal (120+ VAC).
> > > If I plug any 2466 into them (I tried 3), it fails to POST.  If I move
> > > the plug two receptacles up on the same pole and same circuit, it POSTs,
> > > installs, and works fine.  I haven't put an oscilloscope on the line
> > > when plugging it in, but I'm sure it would be fascinating to do so.
> > >
> > > We're also in the process of investigating kernel snapshot dependencies
> > > and the SMP issues aforementioned as we continue to try to stabilize our
> > > 2460's, which seem even more sensitive than the 2466's (which so far
> > > seem to run stably and give decent performance overall).
> > > Unfortunately, our crashes occur with a mean time of days to a week or
> > > two under load in between (consistent with a rare interrupt conflict or
> > > SMP issue) so it takes a long time to test a potential fix.  We did
> > > avoid a crash for about 9 days on a 2460 running 2.4.18-5 (Red Hat's
> > > build id) after experiencing crashes on the node every 5-10 days, but
> > > are only just now accumulating better statistics on a group of nodes
> > > instead of just the one.
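
To illustrate why one node's 9-day run says little and a group of nodes says
more, here is a minimal sketch.  It assumes crashes arrive as a Poisson
process (exponential waiting times), which nobody in this thread has
verified, and it takes 7.5 days as the mean time between crashes simply
because that is the midpoint of the quoted 5-10 day range.  The function
names and the node count in the example are hypothetical.

```python
import math


def prob_no_crash(days, mean_days_between_crashes):
    """P(no crash within `days`), ASSUMING exponentially distributed
    times between crashes with the given mean."""
    return math.exp(-days / mean_days_between_crashes)


def prob_all_nodes_survive(days, mean_days_between_crashes, nodes):
    """P(every one of `nodes` independent nodes survives `days`),
    under the same assumption."""
    return prob_no_crash(days, mean_days_between_crashes) ** nodes


if __name__ == "__main__":
    # One unfixed node lasting 9 days by luck alone (assumed mean: 7.5 days).
    print(prob_no_crash(9, 7.5))
    # A hypothetical group of 8 unfixed nodes all lasting 9 days.
    print(prob_all_nodes_survive(9, 7.5, 8))
```

Under these assumptions a single unfixed node would still get through 9 days
about 30% of the time, so one clean run is weak evidence.  Eight unfixed
nodes all surviving the same window would happen less than 0.01% of the time,
which is why watching a group gives an answer much sooner.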
> > >
> > > So overall, I concur -- try different smp kernel releases and snapshots,
> > > try rearranging the cards (order often seems to matter) and bios
> > > settings, try noapic (which we should probably also do -- we haven't
> > > so far) and yes, try rearranging the way the nodes are plugged in.
> > > Notice that this is evil and insidious -- you can pull a node from a
> > > rack and bench it and it will run fine forever, but if you plug it back
> > > in to the same receptacle when you put it back, it has problems.
> > > Maddening.
> > >
> > >    rgb
> > >
> > > Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> > > Duke University Dept. of Physics, Box 90305
> > > Durham, N.C. 27708-0305
> > > Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> > >
> > >
> > >
> > > _______________________________________________
> > > Beowulf mailing list, Beowulf at beowulf.org
> > > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> > >
> >
>



