[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK

Chris Samuel csamuel at vpac.org
Wed Dec 2 18:35:01 PST 2009


----- "Art Poon" <artpoon at gmail.com> wrote:

> To be specific, after the initial boot with a
> minimal Linux kernel, there is a "fatal error"
> with "timeout waiting for getfile" when the
> compute node attempts to download the provisioning
> image from head.

I've seen similar issues with Cisco switches in IBM
Cluster 1350 systems where the switch was in its default
configuration.

The fix was to configure each port connected to a
compute node as an "edge port". That suppresses the
switch's instinct to (IIRC) listen for spanning tree
information when bringing the port up, which meant
that the vital packets were being dropped.
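
As a rough sketch only: on Cisco IOS the edge port
setting is usually called PortFast. The port range
below is hypothetical, so substitute whichever ports
your compute nodes are plugged into, and check the
exact syntax against the documentation for your
switch model and software release.

```
! Hypothetical example: compute nodes on ports 1-24.
configure terminal
 interface range GigabitEthernet0/1 - 24
  ! Treat these as edge ports so they start
  ! forwarding as soon as the link comes up.
  spanning-tree portfast
 end
```

Only do this on ports that connect to end hosts.
Ports that link to other switches should keep normal
spanning tree behaviour, or you risk creating a loop.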

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


