Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard
Carpenter, Dean
Dean.Carpenter at pharma.com
Tue May 8 13:30:38 PDT 2001
Heh. I'm baaaack. Got more weirdness.
The cluster is working. That's the good news. How I got there is odd,
though. Note at the bottom of this message that booting from a normal node
boot diskette would pull the 2.2.19 kernel from the master fine, but after
the 2-kernel monte, it would black-screen and cold boot.
I created a stage 2 boot floppy with beoboot, using the *same* 2.2.19
kernel.
beoboot -2 -f -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19/
THAT sucker boots those eval nodes fine. So a floppy boot of stage 2 works
like a champ, while the 2-kernel monte boot cold boots the node. Riddle me
that one, Batman. Oh, there is a slight problem, but it doesn't appear to be
affecting anything (NFS works fine). The last lines of the node boot are:
...
portmap: RPC call returned error 5
portmap: RPC call returned error 5
lockd_up: makesock failed, error = -5
portmap: RPC call returned error 5
2nd weirdness. I also have a few Dell PowerEdge 2450 boxes here that have
been in the test cluster since day one. They have all worked fine with the
2.2.17-33.beosmp kernel. They boot off the normal node floppy, monte works
fine, and all is copacetic.
Well, ever since moving the master to 2.2.19, those floppies won't boot
*any* node. Not the new evals (they cold boot), nor the 2450s (they also keep
rebooting). Now why would a stage 2 kernel change affect that, I wonder?
Tomorrow I'll recreate the stage 1 boot floppies, just in case. I'll also
build a tighter kernel, including just the stuff we need for the various node
types. Then I'm out for a week ...
--
Dean Carpenter
Principal Architect
Purdue Pharma
dean.carpenter at pharma.com
deano at areyes.com
94TT :)
-----Original Message-----
From: Carpenter, Dean [mailto:Dean.Carpenter at pharma.com]
Sent: Tuesday, May 08, 2001 2:36 PM
To: beowulf at beowulf.org
Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard
Huh - interesting. I just rebuilt a netboot image using the UP 2.2.17 from
Scyld ...
beoboot -2 -n -k /boot/vmlinuz-2.2.17-33.beo -m /lib/modules/2.2.17-33.beo/
Rebooted a compute node. It comes up in UP as expected, but no NFS.
Checking the /var/log/beowulf/node.0 file, it was trying to load modules
(sunrpc specifically) from /lib/modules/2.2.19/misc.
Now the master node is running 2.2.19. But why would the compute node try
to load 2.2.19 modules? I thought the beoboot script builds a boot.img file
that contains the kernel and modules ...
Have to scan through beoboot ...
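A quick way to confirm which module tree a node is asking for is to pull the kernel versions out of the /lib/modules/<version>/ paths in its boot log. This is a minimal sketch, not part of beoboot or any Scyld tool: the function name module_versions_in_log is hypothetical, it relies on grep -o (GNU or BSD grep), and it assumes the node log quotes full module paths the way the /lib/modules/2.2.19/misc message above suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Scyld's tools): print each distinct kernel
# version that appears in a /lib/modules/<version>/ path in a node boot log.
# Assumes the log contains full module paths; defaults to node 0's log.
module_versions_in_log() {
    log=${1:-/var/log/beowulf/node.0}
    # grep -o prints only the matched prefix, e.g. "/lib/modules/2.2.19/";
    # splitting that on "/" makes field 4 the version string.
    grep -o '/lib/modules/[^/ ]*/' "$log" | cut -d/ -f4 | sort -u
}
```

If module_versions_in_log /var/log/beowulf/node.0 prints 2.2.19 for a node that was booted from the 2.2.17-33.beo image, the node is looking for modules under the master's kernel version rather than its own.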
-----Original Message-----
From: Carpenter, Dean [mailto:Dean.Carpenter at pharma.com]
Sent: Tuesday, May 08, 2001 2:12 PM
To: beowulf at beowulf.org
Cc: 'David Vos'
Subject: RE: Scyld Beowulf doesn't like Gigabyte GA-6vxdr7 motherboard
OK. Progress, but not in the right direction :) Here's what I did, and
I'll be detailed so hopefully someone will notice what I
missed/typoed/screwed up ...
Got 2.2.19 from kernel.org, grabbed the bproc-2.2.tar.bz2 from Scyld.
Patched the kernel source; it took a little tweaking since some things had
changed, but it appears to have gone in OK.
make menuconfig
Turned on all sorts of things, most of them unnecessary, but there to more or
less match up with what the 2.2.17 menuconfig had.
make dep
make -j 4 bzImage
make -j 4 modules
make modules_install
mv arch/i386/boot/bzImage /boot/vmlinuz-2.2.19
Copied /boot/initrd-2.2.17-33.beosmp.img to /tmp/initrd-2.2.19.img.gz,
gunzipped it, and mounted it on /mnt. Replaced the aic7xxx.o with the 2.2.19
version; that was the only module being loaded for the master node.
mount -o loop /tmp/initrd-2.2.19.img /mnt
cp /lib/modules/2.2.19/scsi/aic7xxx.o /mnt/lib
umount /mnt
gzip -9 /tmp/initrd-2.2.19.img
mv /tmp/initrd-2.2.19.img.gz /boot/initrd-2.2.19.img
Added the 2.2.19 kernel and initrd to /etc/lilo.conf and rebooted. Got bproc
failures, since bproc wasn't installed yet, but that was expected.
Now running 2.2.19 on the master node. Built the bproc stuff. That seemed to
go OK as well, though the INSTALL file didn't quite seem to match reality.
make
make install
Modules loaded cleanly. Nice. Copied the modules to the right place.
cp vmadump/vmadump.o /lib/modules/2.2.19/misc
cp ksyscall/ksyscall.o /lib/modules/2.2.19/misc
cp bproc/bproc.o /lib/modules/2.2.19/misc
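The three copies above can be wrapped in a small function so that a missing module stops the install instead of being quietly skipped. This is a hypothetical sketch: install_bproc_modules is an illustrative name, it assumes it is run from the top of the bproc source tree (where the vmadump/, ksyscall/ and bproc/ build directories live), and the default destination simply mirrors the 2.2.19 path used here.

```shell
#!/bin/sh
# Hypothetical helper: copy the bproc kernel modules built in the current
# directory (assumed to be the top of the bproc source tree) into a module
# directory. Defaults to the 2.2.19 tree; pass another directory as $1.
install_bproc_modules() {
    dest=${1:-/lib/modules/2.2.19/misc}
    mkdir -p "$dest" || return 1
    for mod in vmadump/vmadump.o ksyscall/ksyscall.o bproc/bproc.o; do
        # Stop on the first missing or uncopyable module rather than
        # leaving a half-installed set behind without an error.
        cp "$mod" "$dest/" || return 1
    done
}
```

Called with no argument it installs into /lib/modules/2.2.19/misc; a different directory can be passed, but the modules must have been built against that kernel.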
Rebooted to see that they load during the boot. Works fine. Nice. So now
the master node is running 2.2.19 patched with bproc, and appears to be
fine. Time to build a netboot stage 2 image.
beoboot -d -2 -n -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1
Checked the debug output. Looks good: it grabbed the 2.2.19 kernel and the
right modules. OK, booted one of the new eval nodes. Everything seems to go
OK, but only seems to. As the stage 2 kernel boots, the screen goes black for
about 10 seconds, then it cold boots. Dang it. Redid the netboot image with
noapic, just in case ...
beoboot -d -2 -n -c noapic -k /boot/vmlinuz-2.2.19 -m /lib/modules/2.2.19 > /tmp/beoboot.txt 2>&1
No go. Same thing. Dang it :(
My next step is to build a 2.2.19 kernel with only what's needed for the
master and compute nodes. Although not completely homogeneous, it will be
pretty close. Another option is to try the latest Alan Cox 2.2.19 ...
Hmmm. I think I'll grab that first - more chance of Via chipset fixes in
there.
These eval nodes came with a Red Hat 7.1 base install with a 2.4.x kernel.
That comes up fine in SMP mode, so that's another (albeit more painful)
option. How hard is it to patch bproc etc. into 2.4.x?
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf