scyld on an ASUS A7V266-C

Donald Becker becker at scyld.com
Wed Jun 19 13:15:20 PDT 2002


On Wed, 19 Jun 2002, Jorge M. Pacheco wrote:

> We just bought a couple of AMD's installed on ASUS A7V266-C mainboards
> to add more nodes to our scyld beowulf cluster
> (distro=bz-27-8, running kernel 2.2.19-12.beo).

That's just slightly old, but it should be very stable.

> This cluster has been running 24h/day for 6 months without a single
> problem, so we are very very happy indeed.

Thanks for reporting that.
Usually only problems are reported.  Rarely do we hear "it works fine".

> However, in trying to add the first node, we got quite a few
> disturbing messages.

Unless it's a prefix to "but now we changed something and..."

> Namely, we boot from the usual floppy boot-disk, and we get the
> following messages:
> 3Com 3cSOHO100-Tx Hurricane (good news - it knows the NIC, with a driver from scyld - what else do we want?)
> perf: CPU unsupported - counting disabled...

Not a serious problem.  That release had our performance counter
library.  Few people used it in products, so it's not in the current
releases.

> neighbour table overflow

This is an all-purpose message that the kernel puts out when something
goes wrong with the network.

My first guess is that the driver must be updated for the errata in your
NIC.  The driver should have detected the Ethernet transceiver at MII
address #24.  If it found the transceiver at #0, the driver must be
updated to a version that ignores the false detection.

> These 2 are bad news, I am afraid. Yet, the computer mumbles for a
> while and then it gives node-up in the beosetup

Ohhh, well that means that the network is working, or at least mostly
working.  Perhaps a few packets were dropped.

> window. But then things are a bit weird. For instance:
> CPU - N/A
> MEM - 0%
> SWAP - NONE
> DISC - 0%
> NETWORK - 0 KBps

This likely means that there is a problem with multicast packets.  The
beostatus library uses multicast to report node information, and the
beostatus program uses the library to find the current node state.  If
multicast packets don't get through, you don't see the node status.
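
One rough way to check whether multicast traffic reaches a machine at all
is to join a group by hand and wait for a packet.  The sketch below is
only an illustration: it assumes UDP multicast, and the group address and
port in the usage comment are placeholders, not the values beostatus
actually uses.  Substitute whatever your installation is configured with.

```python
import socket
import struct


def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s",
                       socket.inet_aton(group),
                       socket.inet_aton(interface))


def open_multicast_listener(group, port, interface="0.0.0.0"):
    """Open a UDP socket bound to `port` and join multicast `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, interface))
    return sock


def wait_for_packet(sock, timeout=10.0):
    """Return (sender_address, length) of the next packet, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, sender = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return sender[0], len(data)


# Hypothetical usage; 239.1.2.3 and 5000 are placeholder values:
#   sock = open_multicast_listener("239.1.2.3", 5000)
#   print(wait_for_packet(sock))
```

If this times out on the master while the new node is up, but works for
the older nodes, the multicast path from that node is the first suspect.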

> this zero for the network is weird, since ALL other nodes are not zero.
> But, besides this info gathered from beostatus, the main node
> disappeared completely from the beostatus information.
> Gone. Now counting starts on 0, and not on -1.

Hmmm, did you change any configuration information?

> And what about my Xterm ? It went crazy. I type whatever and it says:
> too many open files in system
> Can't make pipes for command substitution.

Ahhh, you are out of file descriptors.  Find the process in /proc/* with
a few thousand open...
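
A minimal sketch of that search, assuming Python is available on the
master and that you run it as root so every /proc/<pid>/fd directory is
readable.  The function name is made up for illustration.

```python
import os


def count_open_fds(proc_root="/proc"):
    """Return a list of (fd_count, pid, command), largest count first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only the numeric directories are processes
        try:
            nfds = len(os.listdir(os.path.join(proc_root, entry, "fd")))
        except OSError:
            continue  # process exited, or we lack permission
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            # The command name is the parenthesised second field of stat.
            name = stat[stat.index("(") + 1:stat.rindex(")")]
        except (OSError, ValueError):
            name = "?"
        results.append((nfds, int(entry), name))
    results.sort(reverse=True)
    return results


# Typical use: show the ten processes holding the most descriptors.
#   for nfds, pid, name in count_open_fds()[:10]:
#       print(nfds, pid, name)
```

Whichever process sits at the top with thousands of entries is the one to
look at.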

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993



