[Beowulf] Timeout on eth0 when booting disk less
griznog at gmail.com
Sun Mar 12 04:19:43 PDT 2017
Do you have "spanning-tree portfast" enabled on your switch ports? The
syntax will depend on the switch vendor, but the principle is the same. When
the NIC comes up there can be a significant delay while STP decides whether
to pass packets or not, which can wreak havoc on things that expect the
network to Just Work(tm). portfast tells the switch to assume a client, not
another switch, is on the port.
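
One way to check whether the switch is holding traffic back after link-up is to time how long it takes before a TCP connection to the NFS server succeeds. With classic 802.1D defaults the port typically spends about 30 seconds in the listening and learning states before forwarding. The following is only a rough sketch in Python: the function name wait_for_tcp and its parameters are made up for illustration, and it assumes the NFS server accepts TCP connections on the standard NFS port 2049.

```python
import socket
import time


def wait_for_tcp(host, port, deadline_s=60.0, interval_s=1.0, connect_timeout_s=2.0):
    """Retry a TCP connect until it succeeds or deadline_s has elapsed.

    Returns the number of seconds waited before the first successful
    connect, or None if the deadline passed without one.
    (Hypothetical helper, not part of any existing tool.)
    """
    start = time.monotonic()
    while True:
        try:
            # A successful connect means packets are passing through the port.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: fall through and retry.
            pass
        if time.monotonic() - start >= deadline_s:
            return None
        time.sleep(interval_s)
```

Run right after bringing the interface up, for example as wait_for_tcp("nfs-server", 2049) where "nfs-server" is a placeholder for your own server, a result of many seconds without portfast and close to zero with it would point at STP rather than the NIC.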
On Sun, Mar 12, 2017 at 1:02 PM Jon Tegner <tegner at renget.se> wrote:
> I'm booting some nodes diskless, using a root file system on NFS. The
> first phase (PXE, tftp etc) works without problems. In the second phase,
> when the system is actually supposed to boot over NFS, the machine hangs
> about every second time, and it seems to be a result of the relevant
> NIC (eth0) not negotiating its properties before some kind of timeout
> kicks in, which obviously results in a failure to mount the NFS file system.
> If I reboot the node it eventually works.
> This could possibly be a problem with the switch, but one remedy would
> seem to be to extend the time before the timeout kicks in (it seems to
> be a few seconds at the moment). Any hints on how to achieve this?
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC