[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK
Art Poon artpoon at gmail.com
Tue Dec 1 12:45:52 PST 2009
Dear colleagues,

I am in charge of managing a cluster at our research centre and am stuck with a vexing (to me) problem! (Disclaimer: I am a biologist by training and a mostly self-taught programmer. I am still learning about networking and cluster management, so please bear with me!)

This is an asymmetric Intel Xeon cluster running 4 compute nodes on CentOS 5.4 and Scyld Clusterware 5. We managed to get it up and running using a dinky little NetGear 5-port 10/100/1000 switch. Now that I'm looking to expand the cluster, I need to get the managed switch working (an SMC 8824M, though we have several other switches available).

What's got me and the IT guys stumped is that while the compute nodes boot via PXE from the head node without trouble on the NetGear, they barf with the SMC. To be specific, after the initial boot with a minimal Linux kernel, there is a "fatal error" with "timeout waiting for getfile" when the compute node attempts to download the provisioning image from head. However, when they were running Rocks before I arrived, the cluster worked fine with the SMC switch.

I've tried resetting the SMC switch to factory defaults (with auto-negotiate on). I've checked the /etc/beowulf/modprobe.conf and it doesn't seem to be demanding anything exotic. We've tried swapping out to another SMC switch but that didn't change anything.

I'm grateful if you could weigh in with your expertise.

Thank you,
- Art.
