[Beowulf] Re: Beowulf Digest, Vol 70, Issue 4
Jeff Johnson jeff.johnson at aeoncomputing.com
Wed Dec 2 10:34:20 PST 2009
- Previous message: [Beowulf] New member, upgrading our existing Beowulf cluster
- Next message: [Beowulf] New member, upgrading our existing Beowulf cluster
On 12/2/09 10:21 AM, beowulf-request at beowulf.org wrote:
> ------------------------------
>
> Message: 8
> Date: Tue, 1 Dec 2009 12:45:52 -0800
> From: Art Poon <artpoon at gmail.com>
> Subject: [Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK
> To: beowulf at beowulf.org
> Message-ID: <825EEAB3-C58F-46B8-A9C4-A806C5B682D3 at gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> Dear colleagues,
>
> [snip]
>
> What's got me and the IT guys stumped is that while the compute nodes boot via PXE from the head node without trouble on the NetGear, they barf with the SMC. To be specific, after the initial boot with a minimal Linux kernel, there is a "fatal error" with "timeout waiting for getfile" when the compute node attempts to download the provisioning image from head. However, when they were running Rocks before I arrived, the cluster worked fine with the SMC switch.
>
> I've tried resetting the SMC switch to factory defaults (with auto-negotiate on). I've checked the /etc/beowulf/modprobe.conf and it doesn't seem to be demanding anything exotic. We've tried swapping out to another SMC switch but that didn't change anything.
>
> I'm grateful if you could weigh in with your expertise.
>

I don't know if my $.02 here could be classified as 'expertise'. With that disclaimer out of the way, I can say that SMC switches tend to ship with very old firmware: they sit in warehouses for a long time and are rarely updated, and their update process is a PITA compared to other switches out there.

I have seen cases where the old firmware and STP (spanning tree protocol) cause enough delay when a port first comes up during a PXE/DHCP operation that the process times out while the switch is still trying to figure out whether there are network loops.

The firmware update can be obtained from www.smc.com and is at v2.3.0.0, updated in March. Check your switch to see which version you are running now.

The Netgear switches are layer-2 and too dumb to cause problems.

> Thank you,
> - Art.
>
> ------------------------------

-- 
------------------------------
Jeff Johnson
Manager
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
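
To make the timing argument above concrete, here is a minimal sketch of the arithmetic, not a diagnostic tool. It assumes classic 802.1D spanning tree, where a port that has just come up spends one forward-delay interval in the listening state and another in the learning state (15 seconds each by default) before it forwards traffic. The function names and the 20-second client timeout are hypothetical, chosen only for illustration; the real timeout of the provisioning client is not stated in the post.

```python
# Rough timing check: does a port's spanning-tree startup delay outlast
# the boot client's patience? All numbers here are illustrative assumptions.


def stp_port_delay(forward_delay_s: float = 15.0) -> float:
    """Seconds a classic 802.1D port spends in listening + learning
    (two forward-delay intervals) before it starts forwarding."""
    return 2 * forward_delay_s


def transfer_times_out(client_timeout_s: float,
                       forward_delay_s: float = 15.0) -> bool:
    """True if the client gives up before the port starts forwarding."""
    return client_timeout_s < stp_port_delay(forward_delay_s)


if __name__ == "__main__":
    # Hypothetical 20 s client timeout; not taken from the original post.
    print(stp_port_delay())                                # 30.0
    print(transfer_times_out(20.0))                        # True
    # forward_delay_s=0.0 models a port that skips listening/learning.
    print(transfer_times_out(20.0, forward_delay_s=0.0))   # False
```

If the check comes out True for your setup, that is consistent with the explanation above. Besides the firmware update, many managed switches offer an edge-port or fast-start setting for host-facing ports; whether this particular SMC model has one is something to confirm in its manual.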
