WOL: how does it work?
Martin Siegert
siegert at sfu.ca
Fri Sep 7 12:20:48 PDT 2001
I am trying to get wake-on-LAN (WOL) to work on my Beowulf cluster
and so far I have failed. Admittedly I don't know much about WOL, so
this failure may just be due to some stupid mistake on my part.
Here is the problem: Each node draws a current of about 1.5A (I measured
that a few days ago). Since I have about 70 of those, booting all nodes
at once would suddenly draw a current of more than 100A (roughly
70 x 1.5A = 105A). The people who run our machine room don't allow me to
do that (probably for good reason). Thus I decided on the following approach:
The BIOS for the motherboard that I'm using (Tyan Thunder K7) allows
two settings for what to do after a power failure when the power comes
back on: a) stay off or b) power on.
Instead of choosing b) for all nodes (which would cause the aforementioned
problem) I want to choose b) only for the master node and a) for all slaves.
Then use WOL from the master node to wake up the slaves sequentially
using a script and the ether-wake program from
http://www.scyld.com/expert/wake-on-lan.html.
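
A minimal sketch of what such a script could look like, assuming Python is
available on the master node. The function name, the delay value and the
injectable run/sleep parameters are hypothetical choices for illustration;
the ether-wake invocation is the same form as the command shown further
down in this message.

```python
import subprocess
import time


def wake_sequentially(macs, interface="eth4", delay_s=5.0,
                      ether_wake="./ether-wake",
                      run=subprocess.run, sleep=time.sleep):
    """Call ether-wake once per MAC address, pausing between nodes.

    delay_s is an assumed pause; pick it so the power-on surges of
    consecutive nodes do not overlap. run and sleep are parameters
    only so the loop can be tried without sending real packets.
    Returns the list of commands that were issued.
    """
    commands = []
    for index, mac in enumerate(macs):
        cmd = [ether_wake, "-i", interface, mac]
        run(cmd, check=True)          # raises if ether-wake exits non-zero
        commands.append(cmd)
        if index < len(macs) - 1:     # no need to wait after the last node
            sleep(delay_s)
    return commands
```

Calling wake_sequentially(["00:E0:81:03:21:DD"]) would run the single
ether-wake command for that node; with the full list of slave MAC
addresses the nodes are woken one at a time, delay_s seconds apart.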
Unfortunately, I have been unable to wake up a node. Here is what I do:
I "halt" a node, detach the power cable, and then reattach the power cable.
At this point the lights on the two onboard NICs (the Tyan web site
and the printing on the chips say that those are 3c920, the 3c59x driver
identifies them as 3c980; I don't know whether that is relevant; the NICs
work fine) come on. A Tyan technician told me that WOL on the Thunder K7 is
always on, no special BIOS setup would be needed. They also told me that I
have to use a 2.4.x kernel because only those would support ACPI. I don't
understand why the kernel is important here: when the node is halted
what difference does the kernel make for the receiving of the magic WOL
packet that is supposed to wake up the box? Anyway, I compiled a
2.4.9-ac8 kernel with APIC enabled, which I use with the "noapic"
kernel option in /etc/lilo.conf. I have also tried the stock RH 7.1
2.4.3-12smp kernel without any difference with respect to WOL (i.e.,
no success).
After reattaching the power cable I then send the magic packet from
the master node:
./ether-wake -i eth4 00:E0:81:03:21:DD
where 00:E0:81:03:21:DD is the MAC address of one of the onboard NICs
on the node. tcpdump shows that the packet actually is sent. Also the
lights in the NICs on the sending and receiving end flash, but otherwise
nothing happens.
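
For checking the tcpdump capture, here is a small sketch of the payload a
WOL magic packet is expected to carry: 6 bytes of 0xFF followed by 16
repetitions of the target MAC address (102 bytes in total). The function
name is hypothetical, and the Ethernet header that ether-wake puts in
front of this payload is not modelled here.

```python
def magic_packet(mac: str) -> bytes:
    """Build the 102-byte magic packet payload for a MAC address.

    Accepts colon- or dash-separated hex, e.g. "00:E0:81:03:21:DD".
    """
    hw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be exactly 6 bytes: %r" % mac)
    # Synchronisation stream (6 x 0xFF), then the MAC repeated 16 times.
    return b"\xff" * 6 + hw * 16
```

Printing magic_packet("00:E0:81:03:21:DD").hex() gives a string that can
be compared against the hex dump of the captured frame's payload.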
What's wrong? Any suggestions are most appreciated.
Thanks!
Martin
========================================================================
Martin Siegert
Academic Computing Services phone: (604) 291-4691
Simon Fraser University fax: (604) 291-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
========================================================================