Problems with dual Athlons
Ray Schwamberger
ray at advancedclustering.com
Wed Jul 31 06:33:56 PDT 2002
I've seen similar issues with some dual Athlon systems. Here is how they
break down, as best I can sort them out from playing around with options:
1) The OS/kernel version does seem to make a difference. The same systems
that run Red Hat 7.2 with perfect stability would crash within minutes
with complaints about interrupt handlers and modprobe errors for the
binfmt-0000 module, usually trashing their filesystems in the process.
Perhaps 2.4.18 is not a good choice for running dual Athlons. I've had
very limited time to test this idea, but it is the strongest
correlation I've managed to find so far.
2) While doing channel bonding, we were getting very uneven transfer rates
across the bonded interfaces. Once again these were dual Athlons, using
the newer, supposedly SMP-safe bonding driver. The same machines running
a uniprocessor kernel showed no issues at all, so it had to be an SMP
issue. Booting with the 'noapic' kernel option smoothed this one out,
but again it points at something in the newest kernels not agreeing
with dual Athlons, or perhaps with the AMD 762/768 chipset combination.
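If you want to put a number on the unevenness before and after trying noapic, a minimal Python sketch like the one below compares per-slave transmit byte counters from /proc/net/dev. The helper names and the slave interface names eth0/eth1 are hypothetical; it assumes the standard Linux /proc/net/dev layout (two header lines, then eight receive and eight transmit fields per interface).

```python
# Hypothetical slave names; substitute the interfaces enslaved to the bond.
SLAVES = ("eth0", "eth1")


def parse_net_dev(text):
    """Map interface -> (rx_bytes, tx_bytes) from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        # Split on the colon first: large counters can abut it ("eth0:12345").
        name, data = line.split(":", 1)
        fields = data.split()
        # 8 receive fields come first, so index 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def delta(before, after):
    """Bytes accumulated between two snapshots (ignores 32-bit counter wrap)."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in after
        if name in before
    }


def tx_imbalance(counters, slaves=SLAVES):
    """Busiest slave's tx bytes divided by the quietest's (1.0 means even)."""
    tx = [counters[name][1] for name in slaves]
    quietest = min(tx)
    return float("inf") if quietest == 0 else max(tx) / quietest
```

Usage: take `parse_net_dev(open("/proc/net/dev").read())` before and after a large transfer over the bond, then call `tx_imbalance(delta(before, after))`. With a round-robin bonding mode, a ratio well above 1.0 would reflect the kind of skew described above.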
You might try the noapic option. I suspect there is some kind of issue
with the combination of APIC, AMD and 2.4.18.
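To check what each node is actually running, something along the lines of this Python sketch could report the kernel release, whether noapic was passed at boot, and how IRQs are routed. It is a hypothetical helper, not an existing tool: the watch list containing only 2.4.18 reflects the suspicion above rather than any confirmed bug, and it assumes the 2.4-style /proc/interrupts labels, where IO-APIC-routed IRQs show as IO-APIC-edge/IO-APIC-level and legacy 8259 routing shows as XT-PIC.

```python
import platform
import re

# Assumption: a hypothetical watch list. Only 2.4.18 is under suspicion here;
# this is not a list of confirmed-bad releases.
SUSPECT_RELEASES = {(2, 4, 18)}


def parse_release(release):
    """Return (major, minor, patch) from a release string like '2.4.18-3smp'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_suspect(release, suspects=SUSPECT_RELEASES):
    """True if the release's base version is on the watch list."""
    return parse_release(release) in suspects


def booted_with_noapic(cmdline):
    """True if 'noapic' appears as a separate word on the kernel command line."""
    return "noapic" in cmdline.split()


def interrupt_controllers(interrupts_text):
    """Count /proc/interrupts lines routed via the IO-APIC vs the 8259 PIC."""
    counts = {}
    for line in interrupts_text.splitlines():
        for kind in ("IO-APIC", "XT-PIC"):
            if kind in line:
                counts[kind] = counts.get(kind, 0) + 1
    return counts


def read_text(path):
    """Read a /proc file, returning '' if it is unavailable (e.g. not Linux)."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    release = platform.release()  # same string `uname -r` prints on Linux
    try:
        kernel_flag = "on watch list" if is_suspect(release) else "ok"
    except ValueError:
        kernel_flag = "unparsed"
    noapic = booted_with_noapic(read_text("/proc/cmdline"))
    routing = interrupt_controllers(read_text("/proc/interrupts"))
    print(f"{platform.node()}: kernel {release} ({kernel_flag}), "
          f"noapic={noapic}, irq routing={routing}")
```

Run it on each node; after booting with noapic, the device IRQs would be expected to report XT-PIC rather than IO-APIC.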
Manel Soria wrote:
> Hi,
>
> I'd like to report a problem we have with the dual Athlons in our cluster,
> in case somebody can help us.
>
> We have 4 dual Athlon systems running kernel 2.4.18 (gcc 2.95.2).
> Two of them crash frequently and the other two run fine.
> We have tried replacing different hardware components and deactivating
> the SMP option, but the problem persists.
>
> The main difference between them is that the systems that crash
> (the servers) have two network interfaces while the systems that run
> fine (normal nodes) have only one network interface.
> Can this be the cause of the problem? Would it be a good idea to use
> another version of gcc?
>
> The motherboard is an ASUS A7M266-D. One of the systems that
> crashes is running Debian 2.1 and the other Debian 2.2. The systems
> that don't crash run Debian 2.1.
>
> "Crash" here means that the VGA display is blank and the system has to
> be reset. There are no other relevant messages.
>
> Thanks
>
>
> --
> ===============================================
> Dr. Manel Soria
> ETSEIT - Centre Tecnologic de Transferencia de Calor
> C/ Colom 11 08222 Terrassa (Barcelona) SPAIN
> Tf: +34 93 739 8287 ; Fax: +34 93 739 8101
> E-Mail: manel at labtie.mmt.upc.es
>
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf