[Beowulf] Re: cluster fails to boot with managed switch, but 5-port switch works OK
Greg at keller.net
Fri Dec 4 14:36:00 PST 2009
On Dec 4, 2009, at 3:23 PM, Bogdan Costescu wrote:
>> When loading/reloading the driver there seems to be an instantaneous
>> drop of the link that forces a new delay cycle.
> Most likely the PXE stack doesn't reset the link; the link is up soon
> after the computer is powered on so, by the time the POST has
> finished, the link is active. Again most likely, the Linux driver does
> a link reset as part of the initialization; I remember that the 3c59x
> driver was changed ~6 years ago to not do this anymore (at Don Becker's
> suggestion, IIRC) and it would allow the established link to remain
> active, making DHCP succeed all the time.
That's true for some ports. Most IPMI (duplex'd) ports seem to come
up at 100Mb and then switch to 1Gb at some point in the POST.
Normally PXE seems to work, but later in the boot the node fails to
get a DHCP address, so I suspect you are correct for the many cases
where the system brings up the 1Gb link early in the POST, before PXE.

I like the fix you mention where 3Com-based cards don't reset the
link. Most LAN-on-motherboard ports seem to be Broadcom or Intel e1000
based in my world... but it would be kuel if "they" figured out the
same magic for those drivers. Ultimately I think it's a workaround
for overly cautious defaults on switches, but sometimes it's easier
to drive around the pothole than fix it.
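
For drivers that do reset the link, one way to "drive around the
pothole" from the node side is to keep retrying DHCP for longer than
the switch's delay cycle. Below is a rough sketch only, not anything
from the drivers discussed above. It assumes the delay is a
spanning-tree-style forwarding delay on the order of 30 seconds, that
the node runs Linux and exposes link state in
/sys/class/net/<iface>/carrier, and that `attempt` is a hypothetical
callable wrapping whatever DHCP client the boot image uses. The names
`carrier_up` and `retry_until` are made up for illustration.

```python
import time


def carrier_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on iface.

    Reads the Linux sysfs 'carrier' attribute. Any read error (missing
    interface, interface administratively down) is treated as 'no link'.
    """
    try:
        with open(f"{sysfs_root}/{iface}/carrier") as f:
            return f.read().strip() == "1"
    except OSError:
        return False


def retry_until(attempt, total_timeout=60.0, interval=5.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call attempt() until it returns a truthy value or time runs out.

    total_timeout is an assumed budget chosen to outlast the switch's
    delay cycle; tune it to the switch in question.
    Returns (result, tries); result is None if every attempt failed.
    """
    deadline = clock() + total_timeout
    tries = 0
    while True:
        tries += 1
        result = attempt()
        if result:
            return result, tries
        # Stop if another sleep would carry us past the deadline.
        if clock() + interval > deadline:
            return None, tries
        sleep(interval)
```

In an init script this might be used first as
`retry_until(lambda: carrier_up("eth0"))` to wait for the link, and then
again with the DHCP attempt itself, since the port can show carrier
well before the switch starts forwarding traffic. The interface name
"eth0" is only an example.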