[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK

Greg Keller Greg at keller.net
Fri Dec 4 14:36:00 PST 2009


On Dec 4, 2009, at 3:23 PM, Bogdan Costescu wrote:

>> When loading/reloading the driver there seems to be an instantaneous drop
>> of the link that forces a new delay cycle.
> Most likely the PXE stack doesn't reset the link; the link is up soon
> after the computer is powered on so, by the time the POST has
> finished, the link is active. Again most likely, the Linux driver does
> a link reset as part of the initialization; I remember that the 3c59x
> driver was changed ~6 years ago to not do this anymore (at Don Becker's
> suggestion, IIRC) and it would allow the established link to remain
> active, making DHCP succeed all the time.

That's true for some ports.  Most IPMI (duplex'd) ports seem to come
up at 100Mb and then switch to 1Gb at some point in the POST.
Normally PXE seems to work, but later in the boot the node fails to
get a DHCP address, so I suspect you are correct for the many cases
where the system brings up the 1Gb link early in the POST, before PXE.

I like the fix you mention where 3Com-based cards don't reset the
link.  Most LAN-on-motherboard ports seem to be Broadcom or Intel
e1000 based in my world...  but it would be kuel if "they" figured out
the same magic for those drivers.  Ultimately I think it's a workaround
for overly cautious defaults on switches, but sometimes it's easier
to drive around the pothole than fix it.
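
For drivers that still reset the link, one way to drive around the
pothole from the node side is to hold off DHCP until the link has
stayed up longer than the switch's delay.  Here is a rough sketch of
that idea, not anything an existing driver or DHCP client does.  The
function names, the 30 second settle time and the polling interval
are all my own assumptions; the only real interface used is the
Linux sysfs file /sys/class/net/<iface>/carrier.

```python
import time
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on iface."""
    try:
        carrier = (Path(sysfs_root) / iface / "carrier").read_text()
    except OSError:
        # Missing interface, or interface administratively down.
        return False
    return carrier.strip() == "1"


def wait_for_stable_link(is_up, settle_s=30.0, timeout_s=120.0,
                         poll_s=0.5, clock=time.monotonic,
                         sleep=time.sleep):
    """Return True once is_up() has held continuously for settle_s.

    settle_s is a guess at the switch's delay (hypothetical value);
    any link drop restarts the count, mirroring the new delay cycle
    on the switch side.  Returns False if timeout_s expires first.
    """
    deadline = clock() + timeout_s
    up_since = None
    while clock() < deadline:
        now = clock()
        if is_up():
            if up_since is None:
                up_since = now
            if now - up_since >= settle_s:
                return True
        else:
            up_since = None  # link dropped, start counting again
        sleep(poll_s)
    return False
```

Something like wait_for_stable_link(lambda: link_is_up("eth0")) would
run just before the DHCP client starts, with "eth0" standing in for
whatever the boot interface is called.  It only hides the symptom;
the delay on the switch is still there.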

Cheers!
Greg


