[Beowulf] IP address mapping for new cluster

Carsten Aulbert carsten.aulbert at aei.mpg.de
Mon Aug 6 00:14:28 PDT 2007


Hi Larry, (sorry for the late reply)

First of all, thank you very much for the feedback!

Larry Stewart wrote:
> I was going to say "how often do you really deal with the A.B.C.D rather 
> than DNS names anyway?" but I've
> just spent a couple of weeks doing just that and it really is convenient 
> when you are in the weeds.

That was our thought as well, thus the "idea".

> 
> One comment is that nearly all software that deals with dotted quads 
> prints in decimal, which makes
> binary encodings of the meaning awkward.  So using 4 bit fields for the 
> X and Y coordinates is hard
> to translate in your head.  Instead, making the third octet be 
> (row*20)+column would be a lot easier
> on the brain and supports 12 rows.  This is why we do things like 
> A.B.200+<module ID>.100+<node ID>/18.
> It's a little awkward to get started, but then it is trivial to map in 
> your brain from IP to function
> and position.
> 

The current plan allows up to 10 rows, so 20 seems to be a good 
multiplier here as well :)
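
To make the row*20+column idea concrete, here is a minimal Python sketch 
of the third-octet mapping. The zero-based row and column numbering and 
the function names are assumptions for illustration only, not part of 
any agreed plan.

```python
COLS_PER_ROW = 20  # multiplier suggested in the quoted text above


def encode_octet(row, column):
    """Pack a (row, column) position into one decimal-friendly octet."""
    if not 0 <= column < COLS_PER_ROW:
        raise ValueError(f"column {column} out of range 0..{COLS_PER_ROW - 1}")
    value = row * COLS_PER_ROW + column
    if not 0 <= value <= 255:
        raise ValueError(f"row {row} does not fit into a single octet")
    return value


def decode_octet(octet):
    """Recover (row, column) from the octet; this is the mental arithmetic."""
    if not 0 <= octet <= 255:
        raise ValueError(f"{octet} is not a valid octet")
    return divmod(octet, COLS_PER_ROW)


if __name__ == "__main__":
    octet = encode_octet(3, 7)
    print(octet, decode_octet(octet))
```

With this encoding a third octet of 67 reads directly as row 3, column 7, 
and rows 0 to 9 stay well below 255.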

> The next issue is how all this gets initialized.  Pretty much the only 
> way to do it is to have the DHCP
> servers configured to map MAC addresses to IP addresses in a stable 
> way.  We don't really have that
> problem because pretty much the only interfaces that have random MAC 
> addresses are the module
> service processors.  The MAC address maps to the manufacturing serial 
> number, which is essential
> for tracking faults, but the position (slot ID/module ID) is reported in 
> the DHCP request in a <vendor>
> field and the DHCP server knows what to do.
> 
> It seems like when you install something, you will have to enter its MAC 
> addresses into the DHCP
> server database and map to a stable IP address given database knowledge 
> of the position and function
> of the device.

Yes, we will require our vendor to hand over a list (text file) of all 
MAC addresses in the cluster, i.e. those of the two on-board NICs plus 
the MAC of the IPMI card in each node.
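
As a rough sketch of how such a list could be turned into static DHCP 
entries: the input format below (one "name MAC IP" triple per line) and 
the helper names are purely hypothetical, since the vendor's file layout 
is not known yet. The output uses the host / hardware ethernet / 
fixed-address stanza of ISC dhcpd.

```python
import re

# Colon-separated, lower-case MAC address (six hex pairs).
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def parse_mac_list(text):
    """Parse lines of 'name mac ip' (hypothetical format); '#' starts a comment."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, mac, ip = line.split()
        mac = mac.lower().replace("-", ":")  # accept dash-separated MACs too
        if not MAC_RE.match(mac):
            raise ValueError(f"bad MAC address: {mac}")
        entries.append((name, mac, ip))
    return entries


def dhcpd_host(name, mac, ip):
    """Format one static host entry in ISC dhcpd.conf syntax."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    sample = "n0307-eth0 00-1A-2B-3C-4D-5E 10.10.67.1  # made-up example values\n"
    for entry in parse_mac_list(sample):
        print(dhcpd_host(*entry))
```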

> For us, there were a number of benefits in going to "IP address maps to 
> function":
> * Humans can debug given the IP addresses alone
> * No DNS lookups required in performance critical paths
> * Higher level configuration files for things like SLURM can be nearly 
> static
> 

So far so good.

> Nevertheless, is the benefit of mapping IP to physical location really 
> valuable?  Trying to
> maintain this given the probable frequency of swapping out boxes will 
> cause trouble with
> DHCP and ARP.  Either you make the leases short and wait for them to 
> expire before
> powering on a replacement, or you have to go around manually flushing 
> leases and arp
> tables.  Ugh.  Instead, it may make more sense to give a type of device 
> a stable IP address
> without regard to position, and to maintain a database mapping MAC/IP to 
> location
> separately.  For a few 1000's of devices, grepping the location file 
> will be faster than
> walking over to the right rack anyway.  We have this problem with 
> modules.  The service
> guys want to swap modules in the backplane to see if a problem follows 
> it and it has
> cost us some DHCP hackery to let the addressing respond smoothly.

So far our experience with slightly smaller clusters suggests that the 
DHCP problem *might* occur, but usually we have a few "spare nodes" 
which are switched off during regular operations (at least officially 
;)). If a node dies and is sent back for service, we will simply leave 
the "hole" in the rack and switch on a spare node at its own position - 
again, at least officially. After the box returns, we can simply 
reinstall it in its original place. Thus lease times should not be an 
issue.

So far it seems we will have enough spare room to house all real and 
spare nodes, thus it should not be a problem (keeping my fingers crossed).

Does anyone else see a big problem with this idea?

Cheers

Carsten


