Problems with dual Athlons

Ray Schwamberger ray at advancedclustering.com
Wed Jul 31 06:33:56 PDT 2002


I've seen similar issues with some dual Athlon systems.  As best I can 
sort them out from playing around with options:

1)  OS/kernel version does seem to make a difference.  The same systems 
that run Red Hat 7.2 with perfect stability would, on a different 
OS/kernel version, crash within minutes with complaints about interrupt 
handlers and modprobe errors for the binfmt-0000 module, usually 
trashing their filesystems in the process.  Perhaps 2.4.18 is not a 
good choice for running dual Athlons.  I've had very limited time to 
test this idea, but it is the strongest correlation I've seen so far.

2) While doing channel bonding, we were getting very uneven transfer 
across the bonded interfaces.  Once again this was on dual Athlons, 
using the newer, supposedly SMP-safe bonding driver.  The same machines 
running a uniprocessor kernel showed no issues at all, therefore it 
had to be an SMP issue.  Using the 'noapic' kernel option at boot time 
smoothed this one out, but again it points at something in the newest 
kernels not agreeing readily with dual Athlons, or perhaps with the 
762/768 chipset combination.
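
A rough way to put a number on "uneven transfer" is to compare the 
transmit byte counters of the bonded slaves in /proc/net/dev.  The 
sketch below is only an illustration, not something from the original 
setup: the function names are made up, and the slave names eth0 and 
eth1 in the usage note are assumptions.  It assumes the usual 
/proc/net/dev layout, where the ninth counter after the colon is 
transmitted bytes.

```python
def parse_tx_bytes(proc_net_dev_text):
    """Return {interface: transmitted bytes} from /proc/net/dev-style text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines carry no "iface:" prefix
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 9:
            continue  # not a counter line in the expected layout
        counters[name.strip()] = int(values[8])  # 9th counter: TX bytes
    return counters


def tx_share(counters, slaves):
    """Return each slave's fraction of the bytes sent by all listed slaves."""
    total = sum(counters[s] for s in slaves)
    if total == 0:
        return {s: 0.0 for s in slaves}
    return {s: counters[s] / total for s in slaves}
```

To use it, pass in the contents of /proc/net/dev, for example 
tx_share(parse_tx_bytes(open("/proc/net/dev").read()), ["eth0", "eth1"]).  
The counters are cumulative, so comparing two readings taken around a 
test transfer gives a cleaner picture than a single snapshot.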

You might try the noapic option.  I suspect there is some kind of 
issue involving the APIC, AMD and 2.4.18.
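
After rebooting, it is worth confirming that the option really reached 
the kernel, since a bootloader edit is easy to get wrong.  A minimal 
sketch, assuming the running kernel exposes its boot parameters in 
/proc/cmdline; the helper name has_boot_option is invented for this 
example:

```python
def has_boot_option(cmdline, option="noapic"):
    """Return True if `option` appears as a whole word in the kernel command line."""
    # Split on whitespace so that a longer parameter that merely
    # contains the same letters is not mistaken for the option.
    return option in cmdline.split()
```

Typical use would be has_boot_option(open("/proc/cmdline").read()).  
How the option gets onto the command line depends on the bootloader; 
with LILO it is normally an append="noapic" line in lilo.conf followed 
by rerunning lilo.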


Manel Soria wrote:
> Hi,
> 
> Please let me report a problem that we have in our cluster with dual Athlons
> in case somebody can help us.
> 
> We have 4 dual athlon systems running kernel 2.4.18 (gcc 2.95.2).
> Two of them crash frequently and the other two run fine.
> We have tried replacing different hardware components and deactivating
> the SMP option, but the problem persists.
> 
> The main difference between them is that the systems that crash
> (the servers) have two network interfaces while the systems that run
> fine (normal nodes) have only one network interface.
> Can this be the cause of the problem?  Would it be a good idea to use
> another version of gcc?
> 
> The motherboard is an ASUS AM7M266-D. One of the systems that
> crashes is running Debian 2.1  and the other Debian 2.2. The systems
> that don't crash run Debian 2.1.
> 
> "Crash" here means that the VGA display is blank and the system has to
> be reset. There is no other relevant message.
> 
> Thanks
> 
> 
> --
> ===============================================
> Dr. Manel Soria
> ETSEIT - Centre Tecnologic de Transferencia de Calor
> C/ Colom 11  08222 Terrassa (Barcelona) SPAIN
> Tf:  +34 93 739 8287 ; Fax: +34 93 739 8101
> E-Mail: manel at labtie.mmt.upc.es
> 
> 
> 




