[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK


Greg Keller Greg at keller.net
Thu Dec 3 12:17:56 PST 2009


>>> What's got me and the IT guys stumped is that while the compute nodes
>>> boot via PXE from the head node without trouble on the NetGear, they
>>> barf with the SMC.  To be specific, after the initial boot with a
>>> minimal Linux kernel, there is a "fatal error" with "timeout waiting for
>>> getfile" when the compute node attempts to download the provisioning
>>> image from head.  However, when they were running Rocks before I
>>> arrived, the cluster worked fine with the SMC switch.


This is very common with spanning tree enabled.  Essentially, once the
port has a physical link light, it may take a while before spanning
tree allows traffic to actually flow through the port, often longer
than a typical timeout.  Loading or reloading the driver seems to drop
the link momentarily, which forces a new delay cycle.
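
To put rough numbers on that delay, here is a minimal Python sketch.  It
assumes the classic 802.1D default forward delay of 15 seconds, with a
port spending one such interval in "listening" and one in "learning"
before it forwards.  The 10-second client timeout is a made-up
illustrative figure, not something measured on this cluster, and the
function names are hypothetical.  Portfast is treated as an immediate
transition to forwarding.

```python
# Sketch only: compares the spanning-tree port delay with a client timeout.
# Assumption: 802.1D default forward delay of 15 s.
# Assumption: the 10 s "getfile" timeout used below is illustrative only.

def stp_forwarding_delay(forward_delay_s=15.0, portfast=False):
    """Seconds between link-up and the port actually forwarding traffic."""
    if portfast:
        # Portfast skips the listening and learning states.
        return 0.0
    # Listening + learning, each lasting one forward-delay interval.
    return 2 * forward_delay_s


def transfer_times_out(client_timeout_s, forward_delay_s=15.0, portfast=False):
    """True if the client would give up before the port starts forwarding."""
    return stp_forwarding_delay(forward_delay_s, portfast) > client_timeout_s


if __name__ == "__main__":
    print(stp_forwarding_delay())                    # default spanning tree
    print(transfer_times_out(10.0))                  # spanning tree, no portfast
    print(transfer_times_out(10.0, portfast=True))   # portfast enabled
```

Since every driver load drops the link, the node pays the full delay
again each time, which is why the later download stage can fail even
when the very first PXE step got through.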

With the Dell PowerConnect (SMC rebrand?) series you have to enable
portfast or disable spanning tree to avoid this delay before traffic
passes.  I generally do both.  The web-based GUI is bad enough to make
this more difficult than it needs to be, but you can globally disable
spanning tree through it.  I use the command line, select all ports
with an interface range, and then configure them as:

!
enable
config
interface range ethernet all
spanning-tree disable
spanning-tree portfast
mtu 9216
exit
!

Hope this helps!

Cheers!
Greg

Technical Principal
R Systems NA, inc.







