[Beowulf] Re: OT: PXE boot with no control over DHCP?
Donald Becker
becker at scyld.com
Thu Sep 22 13:48:12 PDT 2005
On Thu, 22 Sep 2005, Joe Landman wrote:
> In terms of figuring things out, run ethereal or tcpdump on the net
> to see if you are getting any DHCP packets. If they don't show up, then
> you have a problem. If you are running the ISC dhcp server, make sure
> it is listening on the right interface.
An implementation note: any BootP/DHCP/PXE server should bind itself to
a specific network interface. On Linux, for example:
    setsockopt(sockfd, SOL_SOCKET, SO_BINDTODEVICE,
               interface_name, strlen(interface_name)+1);
The ports used by the DHCP part are:
    /* UDP port numbers for Bootp and DHCP */
    #define DHCP_SERVER_PORT 67 /* Client to server request. */
    #define DHCP_CLIENT_PORT 68 /* Server to client response. */
PXE implementations will also run TFTP on UDP port 69, and may also use
port 4011 for an intermediate agent and various multicast addresses.
(PXE servers should not offer multicast TFTP unless the client is
on the same physical network switch. Multicast is a dice roll in the best
case, and going through a router makes it unpredictably unreliable.)
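When reading a tcpdump capture it can help to have those port numbers in one place. A small sketch follows; the TFTP_PORT and PXE_AGENT_PORT names and the pxe_port_role() function are made up here, and only the numbers come from the text above.

```c
/* UDP port numbers seen during a PXE boot. */
#define DHCP_SERVER_PORT 67    /* Client to server request. */
#define DHCP_CLIENT_PORT 68    /* Server to client response. */
#define TFTP_PORT        69    /* Boot image transfer (name is illustrative). */
#define PXE_AGENT_PORT   4011  /* Intermediate agent (name is illustrative). */

/* Hypothetical helper: describe the role of a UDP port in a PXE exchange. */
const char *pxe_port_role(unsigned port)
{
    switch (port) {
    case DHCP_SERVER_PORT: return "DHCP/BootP server";
    case DHCP_CLIENT_PORT: return "DHCP/BootP client";
    case TFTP_PORT:        return "TFTP";
    case PXE_AGENT_PORT:   return "PXE intermediate agent";
    default:               return "not PXE-related";
    }
}
```

If a capture shows traffic to port 67 but nothing coming back to port 68, the server either never saw the request or is listening on the wrong interface.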
> Usually one of the "smart" switches gets in the way of doing the
> right thing. If the switch is set to discard all multicast traffic, or
> plays games with spanning trees or other bits you might have a problem.
Spanning Tree Protocol ("STP", not to be confused with Scheduled Transfer
Protocol) is implemented by smart switches that want to prevent forwarding
loops. The problem is that it doesn't work with PXE clients, yet everyone
is following the specifications.
The scenario is this:
- Switch vendor turns on STP by default to avoid accidental network loops
  and the resulting packet storm
- STP does not forward traffic for 60 seconds after the port detects a
  new link
- PXE client in the BIOS tells the UNDI NIC driver to initialize the
  network
- the UNDI NIC driver interprets this as meaning to reset the transceiver
  and renegotiate the link
- re-autonegotiation looks like a cable disconnect, triggering STP
- the PXE client tries for 40 or 48 seconds to find a server
- the PXE client gives up just before the switch starts passing packets
The fix is to turn off STP on every port of every switch, and deal with
switch loops as they occur -- they won't be subtle.
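The race described above comes down to comparing two timers. Here is a toy model, assuming both clocks start at the same link reset and using the figures quoted in this message (60 seconds for the switch, 40 or 48 seconds for the client). The function names are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical model: seconds of retry time the client has left once the
 * switch port starts forwarding.  Negative means the client has already
 * given up by then. */
int pxe_margin_seconds(int stp_delay_s, int pxe_timeout_s)
{
    return pxe_timeout_s - stp_delay_s;
}

/* True if at least some client requests can get through the switch. */
bool pxe_can_reach_server(int stp_delay_s, int pxe_timeout_s)
{
    return pxe_margin_seconds(stp_delay_s, pxe_timeout_s) > 0;
}
```

With a 60 second delay, both the 40 and 48 second clients lose the race. With STP off the delay is effectively zero and the client has its whole retry window.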
--
Donald Becker becker at scyld.com
Scyld Software Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220 www.scyld.com
Annapolis MD 21403 410-990-9993