Scyld 27Z-8 Gig Net - HELP!
Stanley, Matthew D.
bcmatt at missouri.edu
Wed Sep 25 13:39:52 PDT 2002
I have several clusters running the public release of 27Z-8. Up until now they have been exclusively via-rhine and 3c59x based 100 Mbit clusters. We wanted to upgrade to gigabit Ethernet and decided to upgrade our 4-machine cluster using Dlink DGE-500T cards (ns820/ns83820 based). I compiled the latest netdrivers.tgz file, and the ns820 driver appeared to work fine as a link to the outside world. It did not function on the beoboot floppy, however, even though I compiled for that kernel and even did a full kernel set rebuild (rpm -bb) including the new netdrivers.tgz file. Right after the node found the card, found the master server, and was assigned an IP address, it would just sit at the line where it requests /var/beowulf/boot.img.
OK, so I gave up on the Dlink cards and purchased 4 Intel PRO/1000MT cards, the new version that requires the new release of the drivers since its PCI ID is 8086:100E and not 8086:1000. I again compiled the drivers and tested the card on the Internet side with no problems. I then created my boot images and tried to boot. It gets a little farther than the Dlink did: it actually starts to boot the net boot image, then locks up and never completes.
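For anyone checking the same thing on their own hardware, the vendor:device ID a card reports can be read from `lspci -n` output and compared against the IDs a driver claims. The following is only a minimal sketch: the sample line, the two ID tables, and the helper names `find_pci_ids` and `is_supported` are hypothetical illustrations, not taken from Scyld or the Intel driver source.

```python
import re

# Matches a vendor:device pair such as "8086:100e" (four hex digits each).
PCI_ID = re.compile(r"\b([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def find_pci_ids(lspci_output):
    """Return lowercased (vendor, device) pairs found in `lspci -n` style text."""
    ids = []
    for line in lspci_output.splitlines():
        match = PCI_ID.search(line)
        if match:
            ids.append((match.group(1).lower(), match.group(2).lower()))
    return ids


def is_supported(pci_id, supported_ids):
    """True if the (vendor, device) pair appears in the driver's ID table."""
    return pci_id in supported_ids


# Hypothetical sample line in the style of `lspci -n`; real output will differ.
SAMPLE = "00:0b.0 Class 0200: 8086:100e (rev 02)\n"

# Hypothetical ID tables, for illustration only.
OLD_DRIVER_IDS = {("8086", "1000")}
NEW_DRIVER_IDS = {("8086", "1000"), ("8086", "100e")}

if __name__ == "__main__":
    for pci_id in find_pci_ids(SAMPLE):
        print(pci_id,
              "old driver:", is_supported(pci_id, OLD_DRIVER_IDS),
              "new driver:", is_supported(pci_id, NEW_DRIVER_IDS))
```

A check like this only confirms that a driver would claim the card; it says nothing about whether the driver built into the boot image is the same one that works on the master.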
Am I missing something here? I've modified all of the files, it finds the cards, and it even works for days on the Internet side if I switch my card to eth0 instead of eth1. It appears to be a driver issue, yet I have similar problems with two completely different sets of cards. I have even tried using a 100 Mbit hub instead of a gigabit switch, with identical results. I can also just take out the cards and put in 3c59x cards, and the problem is fixed!
We use our clusters for NAMD only. Is there a way to just install full versions of Scyld and then execute bpslave? If so, what modifications need to be made to node_up and the other scripts to make that work? I realize this means more administration, but at this point I have spent weeks trying to make this work, whereas I can install and update 4 machines in a couple of hours.
Are there settings in beoboot that change the way it gets its information from the master node, perhaps making it more reliable, such as broadcast/multicast?
Any help would be appreciated,
Structural Biology Core
University of Missouri - Columbia