Problems with dual Athlons

Steven Timm timm at fnal.gov
Wed Jul 31 10:28:18 PDT 2002


Has anyone managed to successfully configure a Tyan 2466 board
so that it can have a boot partition that's bigger than 1024 cylinders
on its system drive?  The drive in question is a WD200-BB.

Steve Timm


------------------------------------------------------------------
Steven C. Timm (630) 840-8525  timm at fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
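
For reference, here is a rough sketch of where the 1024-cylinder boundary
falls on a disk. The two geometries are assumptions (common BIOS
translations), not values read from a 2466 or from the WD200-BB, and the
function name is only illustrative.

```python
# Bytes reachable through BIOS CHS addressing when the cylinder count
# is capped at 1024. Head and sector counts are assumed geometries.
SECTOR_BYTES = 512
MAX_CYLINDERS = 1024


def chs_limit_bytes(heads, sectors_per_track, cylinders=MAX_CYLINDERS):
    """Return the number of bytes addressable below the cylinder cap."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


# Untranslated geometry (16 heads, 63 sectors per track): assumption.
untranslated = chs_limit_bytes(16, 63)
# LBA-translated geometry (255 heads, 63 sectors per track): assumption.
translated = chs_limit_bytes(255, 63)

print(f"16 heads:  {untranslated:,} bytes")
print(f"255 heads: {translated:,} bytes")
```

With the translated geometry the boundary sits near 8.4 GB, so if the
drive is the 20 GB unit its model number suggests, a boot partition
reaching past that point would cross it. The geometry the BIOS actually
reports may differ.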

On Wed, 31 Jul 2002, Robert G. Brown wrote:

> On Wed, 31 Jul 2002, Ray Schwamberger wrote:
>
> > You might try the noapic option. I'm thinking there may be some kind of
> > issues with APIC, AMD and 2.4.18.
>
> We don't have ASUS systems but instead a mix of Tyan 2460 and 2466
> systems and see very similar things, including the bizarreness of the
> blind crash problems appearing on one system (consistently and
> repeatedly) but not another IDENTICAL system sitting right next to it.
>
> We have found that power supplies (both the power line itself and the
> switching power supply in the chassis) can make a difference on the
> 2466's -- a marginal power supply is an invitation to problems for sure
> on these beasties.  This is reflected in the completely outrageous
> observation that I have some nodes that will boot and run stably when
> plugged into certain receptacles on the power pole, but not other
> receptacles.  If I put a polarity/circuit tester on the receptacles,
> they pass.  If I check the line voltages, they are nominal (120+ VAC).
> If I plug any 2466 into them (I tried 3), it fails to POST.  If I move
> the plug two receptacles up on the same pole and same circuit, it POSTs,
> installs, and works fine.  I haven't put an oscilloscope on the line
> when plugging it in, but I'm sure it would be fascinating to do so.
>
> We're also in the process of investigating kernel snapshot dependencies
> and the SMP issues aforementioned as we continue to try to stabilize our
> 2460's, which seem even more sensitive than the 2466's (which so far
> seem to run stably and give decent performance overall).
> Unfortunately, our crashes occur with a mean time of days to a week or
> two under load in between (consistent with a rare interrupt conflict or
> SMP issue) so it takes a long time to test a potential fix.  We did
> avoid a crash for about 9 days on a 2460 running 2.4.18-5 (Red Hat's
> build id) after experiencing crashes on the node every 5-10 days, but
> are only just now accumulating better statistics on a group of nodes
> instead of just the one.
>
> So overall, I concur -- try different smp kernel releases and snapshots,
> try rearranging the cards (order often seems to matter) and BIOS
> settings, try noapic (which we should probably also do -- we haven't
> so far) and yes, try rearranging the way the nodes are plugged in.
> Notice that this is evil and insidious -- you can pull a node from a
> rack and bench it and it will run fine forever, but if you plug it back
> in to the same receptacle when you put it back, it has problems.
> Maddening.
>
>    rgb
>
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
>
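
The point above about needing a long time to test a fix can be put in
rough numbers. This is a minimal sketch that assumes crashes are
independent and exponentially distributed with a fixed mean time between
crashes. That model is an assumption, not something measured in the
thread, and the function names are illustrative.

```python
import math


def survival_probability(days, mean_days_between_crashes, nodes=1):
    """Chance that `nodes` UNFIXED machines all run `days` crash-free.

    Assumes independent, exponentially distributed crash times.
    """
    return math.exp(-nodes * days / mean_days_between_crashes)


def days_needed(mean_days_between_crashes, nodes=1, significance=0.05):
    """Crash-free days per node before luck alone is rarer than `significance`."""
    return mean_days_between_crashes * math.log(1.0 / significance) / nodes


# 7.5 days is the midpoint of the 5-10 day range quoted above.
mean_days = 7.5
p_lucky = survival_probability(9, mean_days)
print(f"Chance one unfixed node survives 9 days: {p_lucky:.2f}")
for nodes in (1, 4, 8):
    wait = days_needed(mean_days, nodes)
    print(f"{nodes} node(s): {wait:.1f} crash-free days for 95% confidence")
```

Under these assumptions a single unfixed node would still get through 9
days roughly 30% of the time, so one quiet node says little. Watching a
group of nodes cuts the wait in proportion to the node count.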
