Scyld 27Z-8 Gig Net - HELP!

Karen Keadle-Calvert calvert at scyld.com
Thu Sep 26 08:35:50 PDT 2002


Stanley,

I know you said you modified all of the files, but just to review, under 
27z-8, you need to modify the file /etc/beowulf/config.boot to add the 
device and vendor information for the newer e1000 card.  So you'll need 
to add the following line:

pci     0x8086  0x100E  e1000

In addition, make sure you have a 'bootmodule' entry for "e1000" near 
the beginning of the file.  Next rebuild your node boot floppy and 
beoboot images and try rebooting.  
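
Before rebuilding the images, it can help to confirm that both entries 
really are in the file.  The following is only a sketch: the function 
name check_e1000_config is made up for illustration, and it assumes each 
entry sits on its own line in the whitespace-separated form shown above.

```shell
#!/bin/sh
# Sketch: check a Scyld boot config for the two e1000 entries named in
# this message. check_e1000_config is a hypothetical helper, not a Scyld tool.

check_e1000_config() {
    # $1 = config file; defaults to the 27z-8 location given above
    config="${1:-/etc/beowulf/config.boot}"
    rc=0

    # Look for a line of the form: bootmodule e1000
    if ! grep -Eq '^[[:space:]]*bootmodule[[:space:]]+e1000([[:space:]]|$)' "$config"; then
        echo "missing: bootmodule e1000"
        rc=1
    fi

    # Look for the PCI entry for vendor 0x8086, device 0x100E
    # (-i so that 0x100e and 0x100E both match)
    if ! grep -Eiq '^[[:space:]]*pci[[:space:]]+0x8086[[:space:]]+0x100e[[:space:]]+e1000([[:space:]]|$)' "$config"; then
        echo "missing: pci 0x8086 0x100E e1000"
        rc=1
    fi

    return $rc
}
```

Calling check_e1000_config with no argument checks 
/etc/beowulf/config.boot; it prints one line per missing entry and 
returns non-zero if anything is missing.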

If you've already done all of that (which it sounds like you have), then 
attached are some directions for building an e1000 driver under Scyld.  

Hopefully, this solves your problem.

Regards,

Karen


Stanley, Matthew D. wrote:

>I have several clusters running the public release of 27Z-8.  Until now they have been exclusively via-rhine and 3c59x based 100 Mbit clusters.  We wanted to upgrade to gigabit Ethernet and decided to upgrade our 4-machine cluster using Dlink DGE-500T cards (ns820/ns83820 based).  I compiled the latest netdrivers.tgz file, and the ns820 driver appeared to work fine as a link to the outside world, but it did not function on the beoboot floppy, even though I compiled for that kernel and even did a full kernel set rebuild (rpm -bb) including the new netdrivers.tgz file.  Right after the node found the card, found the master server, and was assigned an IP address, it would just sit at the line where it requests /var/beowulf/boot.img.
>
>Ok, so I gave up on Dlink cards and purchased 4 Intel PRO/1000MT cards, the new version which requires the new release of the drivers since its PCI ID is 8086:100E and not 8086:1000.  I again compiled the drivers and tested the card on the internet side with 0 problems.  I then created my boot images and tried to boot.  It gets a little farther than the Dlink: it actually starts to boot the net boot image, then locks up and never completes.
>
>Am I missing something here?  I've modified all of the files, it finds the cards, and it even works for days on the internet if I switch my card to eth0 instead of eth1.  It appears to be a driver issue, yet I have similar problems with two completely different sets of cards.  I have even tried using a 100 Mbit hub instead of a gigabit switch, with identical results.  I can also just take out the cards and put in 3c59x cards and the problem is fixed!
>
>We use our clusters for NAMD only.  Is there a way to just install full versions of Scyld and then execute bpslave?  If so, what modifications need to be made to node_up and the other scripts to make that work?  I realize this means more administration, but at this point I have spent weeks trying to make this work, whereas I can install and update 4 machines in a couple of hours.
>
>Are there settings in beoboot that change the way it gets the information from the master node, maybe making it more reliable, like broadcast/multicast, etc.?
>
>Any help would be appreciated,
>
>Matt Stanley
>Systems Administrator
>Structural Biology Core
>University of Missouri - Columbia

-------------- next part --------------
HOW TO ADD DRIVERS - Example shown for Intel Pro/1000 series gigabit adapters
------------------


=> If available, get the prebuilt modules for the appropriate kernel from:
ftp://www.scyld.com/pub/beowulf/<version>/updates

For example, for the 2.2.19-12 kernel:
ftp://www.scyld.com/pub/beowulf/27z-8/updates/e1000-3.6.8.1.tar.gz

=> If not available, download the source code for the driver.  The Intel Pro/1000 
series driver can be found at ftp://www.intel.com/df-support/2897/eng or 
http://downloadfinder.intel.com/scripts-df/Product_Filter.asp?ProductID=415 or
http://support.intel.com/support/go/linux/e1000.htm

NOTE: If the kernel source rpm was not installed, you'll have to do that 
      first.  It is installed by default under 27cz-9, but not under 
      28cz-8-beta2. The kernel source is available on the distribution 
      CD under Scyld/RPMS/kernel-source-2.4.9-21.1.i386.rpm

   => Add this line to the beginning of the Makefile
   CFLAGS = $(KCFLAGS)

   => Make the beoboot, SMP, and UP modules for the version of the Scyld 
   kernel that you are running under (27cz-9 shown here):

    > make KCFLAGS="-D__BOOT_KERNEL_H_ -D__module__beoboot"
    > mv e1000.o /lib/modules/2.2.19-14.beobeoboot/net
    > make KCFLAGS="-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_SMP=1"
    > mv e1000.o /lib/modules/2.2.19-14.beosmp/net
    > make KCFLAGS="-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_UP=1"
    > mv e1000.o /lib/modules/2.2.19-14.beo/net

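The three build-and-move steps above can be wrapped in a small script.  
This is only a sketch of the same commands, not a Scyld-supplied tool: 
the function names and the RUN dry-run switch are made up for 
illustration, and the kernel version is the 27cz-9 one shown above.  By 
default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build e1000.o for the beoboot, SMP, and UP kernels in turn.
# KVER and the KCFLAGS defines are copied from the steps above.
# RUN is a hypothetical dry-run switch: it defaults to "echo" so the
# commands are only printed; set RUN to an empty string to execute them.
KVER="${KVER:-2.2.19-14.beo}"
RUN="${RUN-echo}"

build_flavor() {
    # $1 = suffix appended to KVER for the module directory
    # $2 = flavor-specific define passed through KCFLAGS
    $RUN make KCFLAGS="-D__BOOT_KERNEL_H_ $2"
    $RUN mv e1000.o "/lib/modules/${KVER}$1/net"
}

build_all() {
    build_flavor "beoboot" "-D__module__beoboot"
    build_flavor "smp"     "-D__BOOT_KERNEL_SMP=1"
    build_flavor ""        "-D__BOOT_KERNEL_UP=1"
}
```

Run build_all from the driver source directory and review the printed 
commands before running it again with RUN set to an empty string.
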
=> Add new entries for this module to the PCI table 

 1. Add, if necessary, the following bootmodule entry to the configuration 
    file (in /etc/beowulf/config.boot for 27cz-9 and /etc/beowulf/config for 
    28cz-4):
bootmodule e1000

 2. Add entries to the device list for each device supported by this driver 
    (in /etc/beowulf/config.boot for 27cz-9 and /usr/share/kudzu/pcitable for
    28cz-1):
pci	0x8086	0x1000	e1000
pci	0x8086	0x1001	e1000
pci	0x8086	0x1004	e1000
pci	0x8086	0x1008	e1000
pci	0x8086	0x1009	e1000
pci	0x8086	0x100c	e1000
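
If the driver supports many devices, the entries can be generated rather 
than typed.  A minimal sketch, assuming the tab-separated layout shown 
above; pci_entries is a made-up helper name:

```shell
#!/bin/sh
# Sketch: print one "pci <vendor> <device> <driver>" line per device ID,
# tab-separated like the entries above. pci_entries is a hypothetical helper.

pci_entries() {
    # $1 = vendor ID, $2 = driver name, remaining arguments = device IDs
    vendor="$1"
    driver="$2"
    shift 2
    for dev in "$@"; do
        printf 'pci\t%s\t%s\t%s\n' "$vendor" "$dev" "$driver"
    done
}

# Example (printed to stdout; review before adding to the config file):
#   pci_entries 0x8086 e1000 0x1000 0x1001 0x1004 0x1008 0x1009 0x100c
# The newer card discussed in the message above would be:
#   pci_entries 0x8086 e1000 0x100E
```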
 
=> Build the dependency file (for each kernel) used by modprobe to load the 
   correct module:

For the single-processor kernel:
depmod -a -e -F /boot/System.map-2.2.19-14.beo 2.2.19-14.beo

For the SMP (multiprocessor) kernel:
depmod -a -e -F /boot/System.map-2.2.19-14.beosmp 2.2.19-14.beosmp

For the beoboot kernel (Phase 1 image):
depmod -a -e -F /boot/System.map-2.2.19-14.beobeoboot 2.2.19-14.beobeoboot
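
The three depmod runs differ only in the kernel suffix, so they can be 
written as a loop.  This is a sketch of the same commands; the function 
name and the RUN dry-run switch are made up for illustration, and the 
default is to print the commands rather than run them.

```shell
#!/bin/sh
# Sketch: run depmod once per kernel flavor (UP, SMP, beoboot).
# KVER is the 27cz-9 version used above. RUN is a hypothetical dry-run
# switch: "echo" by default, set it to an empty string to execute.
KVER="${KVER:-2.2.19-14.beo}"
RUN="${RUN-echo}"

run_depmod_all() {
    # The empty suffix is the single-processor kernel.
    for suffix in "" smp beoboot; do
        $RUN depmod -a -e -F "/boot/System.map-${KVER}${suffix}" "${KVER}${suffix}"
    done
}
```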


=> Rebuild the Phase 1 and Phase 2 kernel images:
/usr/bin/beoboot -1 -f -o /dev/fd0 -c "apm=power-off"
/usr/bin/beoboot -2 -n -k /boot/vmlinuz-`uname -r` -o /var/beowulf/boot.img -c "apm=power-off"


NOTE: 
----
If your master node is a single-processor machine and your compute node is 
SMP, and you don't have an SMP kernel installed, you'll have to get the RPM 
from the distribution CD and install it (using rpm -U).  This happens 
when you install on a single-processor machine because the installer 
selects the kernel to install based on the machine it is running 
on.  You must run the same kernel on all of the machines in the cluster.  
The SMP kernel can run on both single-processor and SMP machines.
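
A quick way to see whether the SMP kernel is already on the master is to 
look for its files in /boot.  This is a sketch only: have_smp_kernel is a 
made-up helper, and it assumes the vmlinuz-<version> and 
System.map-<version> naming used in the commands above.

```shell
#!/bin/sh
# Sketch: report whether the SMP kernel files for a given version exist.
# have_smp_kernel is a hypothetical helper; the file naming is assumed
# from the depmod and beoboot commands above.

have_smp_kernel() {
    # $1 = base kernel version (default: the 27cz-9 version used above)
    # $2 = boot directory (default: /boot)
    kver="${1:-2.2.19-14.beo}"
    bootdir="${2:-/boot}"
    [ -f "${bootdir}/vmlinuz-${kver}smp" ] &&
        [ -f "${bootdir}/System.map-${kver}smp" ]
}

# Example:
#   have_smp_kernel || echo "SMP kernel not found; install it from the CD"
```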


