Scyld 27Z-8 Gig Net - HELP!

Art Edwards edwardsa at plk.af.mil
Thu Sep 26 12:20:25 PDT 2002


I'm running the public, free version of bz8. Is it true that we can, with
a little toil, make it work with gigabit Ethernet? I'm buying 3Com cards and
a Catalyst 4K switch.

Art Edwards

On Thu, Sep 26, 2002 at 11:35:50AM -0400, Karen Keadle-Calvert wrote:
> Stanley,
> 
> I know you said you modified all of the files, but just to review, under 
> 27z-8, you need to modify the file /etc/beowulf/config.boot to add the 
> device and vendor information for the newer e1000 card.  So you'll need 
> to add the following line:
> 
> pci     0x8086  0x100E  e1000
> 
> In addition, make sure you have a 'bootmodule' entry for "e1000" near 
> the beginning of the file.  Next rebuild your node boot floppy and 
> beoboot images and try rebooting.  
> 
> If you've already done all of that (which it sounds like you have), then 
> attached are some directions for building an e1000 driver under Scyld.  
> 
> Hopefully, this solves your problem.
> 
> Regards,
> 
> Karen
> 
> 
> Stanley, Matthew D. wrote:
> 
> >I have several clusters running the public release of 27Z-8.  Up until 
> >now they have been exclusively via-rhine and 3c59x based 100 Mbit clusters. 
> >We wanted to upgrade to gigabit ethernet and decided to upgrade our 4 
> >machine cluster using Dlink DGE-500T cards (ns820/ns83820 based).  I 
> >compiled the latest netdrivers.tgz file and the ns820 driver appeared to 
> >work fine as a link to the outside world but did not function on the 
> >beoboot floppy even though I compiled for that kernel and even did a full 
> >kernel set rebuild (rpm -bb) including the new netdrivers.tgz file.  
> >Right after it found the card, found the master server and was assigned 
> >an IP address, it would just sit at the line where it requests 
> >/var/beowulf/boot.img.
> >
> >Ok, so I gave up on the Dlink cards and purchased 4 Intel PRO/1000MT cards, 
> >the new version which requires the new release of drivers since its PCI 
> >ID is 8086:100E and not 8086:1000.  I again compiled the drivers and 
> >tested the card on the internet side with 0 problems.  I then created my 
> >boot images and tried to boot.  It gets a little farther than the Dlink: 
> >it actually starts to boot the net boot image, then locks up and 
> >never completes.
> >
> >Am I missing something here?  I've modified all of the files, it finds the 
> >cards, and it even works for days on the internet if I switch my card to 
> >eth0 instead of eth1.  It appears to be a driver issue yet I have similar 
> >problems with two completely different sets of cards.  I have even tried 
> >using a 100 mbit hub instead of a gigabit switch with identical results.  
> >I can also just take out the cards and put in 3c59x cards and the problem 
> >is fixed!
> >
> >We use our clusters for NAMD only.  Is there a way to just install full 
> >versions of Scyld and then execute bpslave?  If so, what modifications 
> >need to be made to the node_up and other scripts to make that work?  I 
> >realize this means more administration, but at this point I have spent 
> >weeks trying to make this work, and I can install and update 4 machines 
> >in a matter of a couple of hours.
> >
> >Are there settings in beoboot that change the way it gets the 
> >information from the master node, maybe making it more reliable, like 
> >broadcast/multicast, etc.?
> >
> >Any help would be appreciated,
> >
> >Matt Stanley
> >Systems Administrator
> >Structural Biology Core
> >University of Missouri - Columbia
> >_______________________________________________
> >Beowulf mailing list, Beowulf at beowulf.org
> >To change your subscription (digest mode or unsubscribe) visit 
> >http://www.beowulf.org/mailman/listinfo/beowulf
> > 
> >
> 

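A minimal sketch of the config.boot check described in the reply above, assuming a POSIX shell with awk and mktemp available. The helper names pci_line and has_entry are hypothetical, and the demo works on a temporary file rather than the real /etc/beowulf/config.boot:

```shell
#!/bin/sh
# Sketch only: pci_line and has_entry are hypothetical helper names,
# not part of any Scyld tool.

# Turn a vendor:device ID such as 8086:100E into a config.boot pci line.
pci_line() {
    id=$1       # vendor:device, e.g. 8086:100E
    driver=$2   # module name, e.g. e1000
    vendor=${id%%:*}
    device=${id##*:}
    printf 'pci\t0x%s\t0x%s\t%s\n' "$vendor" "$device" "$driver"
}

# Succeed if the file has a line with exactly these fields.
# Whitespace is normalized and hex digit case is ignored.
has_entry() {
    file=$1; shift
    want=$*
    awk -v want="$want" '
        { $1 = $1; if (tolower($0) == tolower(want)) found = 1 }
        END { exit !found }' "$file"
}

# Demo on a scratch copy: append the 100E entry only if it is missing.
cfg=$(mktemp)
printf 'bootmodule e1000\npci\t0x8086\t0x1000\te1000\n' > "$cfg"
has_entry "$cfg" pci 0x8086 0x100E e1000 || pci_line 8086:100E e1000 >> "$cfg"
cat "$cfg"
rm -f "$cfg"
```

The boot floppy and beoboot images still have to be rebuilt after any change to the real file.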
> HOW TO ADD DRIVERS - Example shown for Intel Pro/1000 series gigabit adapters
> ------------------
> 
> 
> => If available, get the prebuilt modules for the appropriate kernel from:
> ftp://www.scyld.com/pub/beowulf/<version>/updates
> 
> For example, for the 2.2.19-12 kernel:
> ftp://www.scyld.com/pub/beowulf/27z-8/updates/e1000-3.6.8.1.tar.gz
> 
> => If not available, download source code for driver.  The Intel Pro/1000 
> series driver can be found at ftp://www.intel.com/df-support/2897/eng or 
> http://downloadfinder.intel.com/scripts-df/Product_Filter.asp?ProductID=415 or
> http://support.intel.com/support/go/linux/e1000.htm
> 
> NOTE: If the kernel source rpm was not installed, you'll have to do that 
>       first.  It is installed by default under 27cz-9, but not under 
>       28cz-8-beta2. The kernel source is available on the distribution 
>       CD under Scyld/RPMS/kernel-source-2.4.9-21.1.i386.rpm
> 
>    => Add this line to the beginning of the Makefile
>    CFLAGS = $(KCFLAGS)
> 
>    => Make the beoboot, SMP, and UP modules for the version of the Scyld 
>    kernel that you are running under (27cz-9 shown here):
> 
>     > make KCFLAGS="-D__BOOT_KERNEL_H_ -D__module__beoboot"
>     > mv e1000.o /lib/modules/2.2.19-14.beobeoboot/net
>     > make KCFLAGS="-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_SMP=1"
>     > mv e1000.o /lib/modules/2.2.19-14.beosmp/net
>     > make KCFLAGS="-D__BOOT_KERNEL_H_ -D__BOOT_KERNEL_UP=1"
>     > mv e1000.o /lib/modules/2.2.19-14.beo/net
> 
> => Add new entries for this module to the PCI table 
> 
>  1. Add, if necessary, the following bootmodule entry to the configuration 
>     file (in /etc/beowulf/config.boot for 27cz-9 and /etc/beowulf/config for 
>     28cz-4):
> bootmodule e1000
> 
>  2. Add entries to the device list for each device supported by this driver 
>     (in /etc/beowulf/config.boot for 27cz-9 and /usr/share/kudzu/pcitable for
>     28cz-1):
> pci	0x8086	0x1000	e1000
> pci	0x8086	0x1001	e1000
> pci	0x8086	0x1004	e1000
> pci	0x8086	0x1008	e1000
> pci	0x8086	0x1009	e1000
> pci	0x8086	0x100c	e1000
>  
> => Build the dependency file (for each kernel) used by modprobe to load the 
>    correct module:
> 
> For single processor kernel:
> depmod -a -e -F /boot/System.map-2.2.19-14.beo 2.2.19-14.beo
> 
> For the SMP (multiprocessor) kernel:
> depmod -a -e -F /boot/System.map-2.2.19-14.beosmp 2.2.19-14.beosmp
> 
> For beoboot kernel (Stage 1 image):
> depmod -a -e -F /boot/System.map-2.2.19-14.beobeoboot 2.2.19-14.beobeoboot
> 
> 
> => Rebuild the Phase 1 and Phase 2 kernel images:
> /usr/bin/beoboot -1 -f -o /dev/fd0 -c "apm=power-off"
> /usr/bin/beoboot -2 -n -k /boot/vmlinuz-`uname -r` -o /var/beowulf/boot.img -c "apm=power-off"
> 
> 
> NOTE: 
> ----
> If your master node is single processor and your compute node is SMP, 
> and you don't have an SMP kernel installed, you'll have to get the RPM 
> from the distribution CD and install it (using rpm -U).  This happens 
> when you install on a single processor machine because the installer 
> selects the kernel to be installed based on the machine being installed 
> on.  You must run the same kernel on all of the machines in the cluster.  
> The SMP kernel can run on both single processor and SMP machines.
> 

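The three depmod invocations in the attachment differ only in the kernel flavor suffix, so they can be generated in a loop. This is a dry-run sketch: it only prints the commands. The KVER variable and the depmod_cmds function name are assumptions for illustration, and the version string is the 27cz-9 example from the attachment, so adjust it to the kernels actually installed:

```shell
#!/bin/sh
# Dry-run sketch: prints the depmod commands instead of running them.
# KVER defaults to the 27cz-9 example version from the attachment.
KVER=${KVER:-2.2.19-14}

# depmod_cmds is a hypothetical helper: one command per kernel flavor
# (uniprocessor, SMP, and the beoboot Stage 1 kernel).
depmod_cmds() {
    for flavor in beo beosmp beobeoboot; do
        echo "depmod -a -e -F /boot/System.map-$KVER.$flavor $KVER.$flavor"
    done
}

depmod_cmds
```

Once the printed lines look right, piping the output to sh as root would run them.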