Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard
Carpenter, Dean Dean.Carpenter at pharma.com
Tue May 8 13:30:38 PDT 2001
Heh. I'm baaaack. Got more weirdness.

The cluster is working. That's the good news. How I got there is odd though ...

Note in the bottom of this msg that booting from a normal node boot diskette would pull the 2.2.19 kernel from the master fine, but after the 2-kernel monte, it would black screen and cold boot.

I created a stage 2 boot floppy with beoboot, using the *same* 2.2.19 kernel.

    beoboot -2 -f -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19/

THAT sucker boots those eval nodes fine. So, floppy boot of stage 2 works like a champ, while the 2-kernel monte boot cold boots it. Riddle me that one Batman.

Oh, there is a slight problem, but it doesn't appear to be affecting anything (NFS works fine). The last lines in the node boot are ...

    portmap: RPC call returned error 5
    portmap: RPC call returned error 5
    lockd_up: makesock failed, error = -5
    portmap: RPC call returned error 5

2nd weirdness. I also have a few Dell PowerEdge 2450 boxes here that have been in the test cluster since day one. They have all worked fine with the 2.2.17-33.beosmp kernel. They boot off the normal node floppy, monte works fine, and all is copasetic.

Well, ever since moving the master to 2.2.19, those floppies won't boot *any* node. Not the new evals (cold boot) nor the 2450's (also keep rebooting). Now why would a stage 2 kernel change affect that I wonder ?

Tomorrow I'll recreate the stage1 boot floppies, just in case. Also will build a tighter kernel, just including stuff we need for the various node types. Then I'm out for a week ...

--
Dean Carpenter
Principal Architect
Purdue Pharma
dean.carpenter at pharma.com
deano at areyes.com
94TT :)

-----Original Message-----
From: Carpenter, Dean [mailto:Dean.Carpenter at pharma.com]
Sent: Tuesday, May 08, 2001 2:36 PM
To: beowulf at beowulf.org
Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard

Huh - interesting. I just rebuilt a netboot image using the UP 2.2.17 from Scyld ...

    beoboot -2 -n -k /boot/vmlinuz-2.2.17-33.beo -m /lib/modules/2.2.17-33.beo/

Rebooted a compute node. It comes up in UP as expected, but no NFS. Checking the /var/log/beowulf/node.0 file, it was trying to load modules (sunrpc specifically) from /lib/modules/2.2.19/misc.

Now the master node is running 2.2.19. But why would the compute node try to load 2.2.19 modules ? I thought the beoboot script builds a boot.img file that contains the kernel and modules ...

Have to scan through beoboot ...

--
Dean Carpenter
Principal Architect
Purdue Pharma
dean.carpenter at pharma.com
deano at areyes.com
94TT :)

-----Original Message-----
From: Carpenter, Dean [mailto:Dean.Carpenter at pharma.com]
Sent: Tuesday, May 08, 2001 2:12 PM
To: beowulf at beowulf.org
Cc: 'David Vos'
Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard

OK. Progress, but not in the right direction :)

Here's what I did, and I'll be detailed so hopefully someone will notice what I missed/typoed/screwedup ...

Got 2.2.19 from kernel.org, grabbed the bproc-2.2.tar.bz2 from Scyld. Patched the kernel source - took a little tweaking, some things had changed. But it appears to have gone in OK.

    make menuconfig

Turned on all sorts of things, most unnecessary, but there to more or less match up what the 2.2.17 menuconfig said.

    make dep
    make -j 4 bzImage
    make -j 4 modules
    make modules_install
    mv arch/i386/boot/bzImage /boot/vmlinuz-2.2.19

Copied the /boot/initrd-2.2.17-33.beosmp.img to /tmp/initrd-2.2.19.img.gz, gunzipped it, mounted it on /mnt. Replaced the aic7xxx.o with the 2.2.19 version. That was the only module being loaded for the master node.

    mount -o loop initrd-2.2.19.img /mnt
    cp /lib/modules/2.2.19/scsi/aic7xxx.o /mnt/lib
    umount /mnt
    gzip -9 /tmp/initrd-2.2.19.img
    mv /tmp/initrd-2.2.19.img.gz /boot/initrd-2.2.19.img

Added the 2.2.19 kernel and initrd to /etc/lilo.conf, and rebooted. bproc failures - not installed yet, but that was expected. Now running 2.2.19 on the master node.
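
Since the steps above juggle two kernel versions (2.2.17-33.beo and 2.2.19), a quick check that a kernel image and a module directory name the same version might save a reboot or two before building an initrd or a boot image. A minimal sketch only: the function names are made up for illustration, it assumes images are named vmlinuz-<version> as in the commands above, and it compares path names without looking inside the image or the module tree.

```shell
#!/bin/sh
# Hypothetical helpers (not part of beoboot or any Scyld tool).

# Print the version part of a kernel image path,
# e.g. /boot/vmlinuz-2.2.19 -> 2.2.19
kver_from_image() {
    base=${1##*/}                 # drop the directory part
    printf '%s\n' "${base#vmlinuz-}"
}

# Print the version part of a module directory path,
# e.g. /lib/modules/2.2.19/ -> 2.2.19 (one trailing slash is tolerated)
kver_from_moddir() {
    dir=${1%/}                    # drop a single trailing slash
    printf '%s\n' "${dir##*/}"
}

# Succeed (exit status 0) only when both paths name the same version.
same_kver() {
    [ "$(kver_from_image "$1")" = "$(kver_from_moddir "$2")" ]
}
```

It could be run by hand before a beoboot call, for example:

    same_kver /boot/vmlinuz-2.2.19 /lib/modules/2.2.19 || echo "kernel/module version mismatch"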

Built bproc stuff. That seemed to go OK as well. The INSTALL file didn't quite seem to match the actual though.

    make
    make install

Modules loaded cleanly. Nice. Copied the modules to the right place.

    cp vmadump/vmadump.o /lib/modules/2.2.19/misc
    cp ksyscall/ksyscall.o /lib/modules/2.2.19/misc
    cp bproc/bproc.o /lib/modules/2.2.19/misc

Rebooted to see that they load during the boot. Works fine. Nice.

So now the master node is running 2.2.19 patched with bproc, and appears to be fine. Time to build a netboot stage 2 image.

    beoboot -d -2 -n -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1

Check the debug output. Looks good, it grabbed 2.2.19 kernel and the right modules.

OK, boot one of the new eval nodes - everything seems to go OK, but only seems to. As the stage 2 kernel boots, the screen goes black for about 10 seconds, then it coldboots. Dang it. Redid the netboot image with noapic just in case ...

    beoboot -d -2 -n -c noapic -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1

No go. Same thing. Dang it :(

My next step is to build a 2.2.19 kernel with only what's needed for the master and compute nodes. Although not completely homogenous, it will be pretty close. Another option is to try the latest Alan Cox 2.2.19 ... Hmmm. I think I'll grab that first - more chance of Via chipset fixes in there.

These eval nodes came with Redhat 7.1 base install with 2.4.x kernel. That comes up fine in SMP mode, so that's another (albeit more painful) option. How hard is it to patch bproc etc into 2.4.x ?

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
