scyld on an ASUS A7V266-C
Jorge M. Pacheco
pacheco at cii.fc.ul.pt
Wed Jun 19 11:22:22 PDT 2002
We just bought a couple of AMD CPUs installed on ASUS A7V266-C mainboards to add more nodes to our Scyld Beowulf cluster
(distro=bz-27-8, running kernel 2.2.19-12.beo).
This cluster has been running 24h/day for 6 months without a single problem, so we are very very happy indeed.
However, in trying to add the first node, we got quite a few disturbing messages.
Namely, we boot from the usual floppy boot-disk, and we get the following messages:
3Com 3cSOHO100-Tx Hurricane (good news - it knows the NIC, with a driver from scyld - what else do we want?)
perf: CPU unsupported - counting disabled...
neighbour table overflow
These two are bad news, I am afraid. Still, the computer mumbles for a while and then it shows node-up in the beosetup
window. But then things get a bit weird. For instance, beostatus shows:
CPU - N/A
MEM - 0%
SWAP - NONE
DISC - 0%
NETWORK - 0 KBps
This zero for the network is weird, since all the other nodes show non-zero values.
But, besides this info gathered from beostatus, the main node disappeared completely from the beostatus information.
Gone. Now counting starts at 0, and not at -1.
And what about my Xterm ? It went crazy. I type whatever and it says:
too many open files in system
Can't make pipes for command substitution.
Nice, right ?
But can you guess what the solution is? Just kill the beosetup and beostatus windows and you are back to normal.
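For what it is worth, "too many open files in system" is the text of the ENFILE error, which refers to the system-wide file table rather than a per-process limit. A minimal sketch of how one could watch that table on the master while beosetup and beostatus are running is given below. It assumes a Linux /proc filesystem in which the first field of /proc/sys/fs/file-nr is the number of allocated file handles and the third is the maximum; the function names are made up for illustration and are not part of any Scyld tool.

```python
import os

FILE_NR = "/proc/sys/fs/file-nr"  # assumed location; standard on Linux


def parse_file_nr(text):
    """Return (allocated, maximum) from the contents of file-nr.

    Only the first and third whitespace-separated fields are used;
    the meaning of the middle field differs between kernel versions.
    """
    fields = text.split()
    return int(fields[0]), int(fields[2])


def handle_usage(text):
    """Fraction of the system-wide file handle table that is allocated."""
    allocated, maximum = parse_file_nr(text)
    return allocated / maximum


if __name__ == "__main__" and os.path.exists(FILE_NR):
    with open(FILE_NR) as f:
        allocated, maximum = parse_file_nr(f.read())
    print("allocated %d of %d file handles" % (allocated, maximum))
```

If the allocated count climbs toward the maximum while the two windows are open and stops climbing once they are killed, that would at least be consistent with the symptom described above.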
Now, what happens if I try to submit a job on the new node ? After all, it says it is up.
So, I started a process there. The beostatus (just restarted) says nothing is happening there.
But the program is there, still active, until I get the message:
timeout connection for node X.
Does anyone have a clue about what is going on?
I would very much appreciate any help.
Greetings, J. M. Pacheco