[Beowulf] IP address mapping for new cluster

Robert G. Brown rgb at phy.duke.edu
Tue Aug 7 06:30:21 PDT 2007


On Mon, 6 Aug 2007, Carsten Aulbert wrote:

> Hi Larry, (sorry for the late reply)
>
> first of all thank you very much for the feedback!
>
> Larry Stewart wrote:
>> I was going to say "how often do you really deal with the A.B.C.D rather 
>> than DNS names anyway?" but I've
>> just spent a couple of weeks doing just that and it really is convenient 
>> when you are in the weeds.
>
> That was our thought as well, thus the "idea".
>
>> 
>> One comment is that nearly all software that deals with dotted quads prints 
>> in decimal, which makes
>> binary encodings of the meaning awkward.  So using 4 bit fields for the X 
>> and Y coordinates is hard
>> to translate in your head.  Instead, making the third octet be 
>> (row*20)+column would be a lot easier
>> on the brain and supports 12 rows.  This is why we do things like 
>> A.B.200+<module ID>.100+<node ID>/18.
>> It's a little awkward to get started, but then it is trivial to map in your 
>> brain from IP to function
>> and position.
>> 
>
> Right now the current plan allows up to 10 rows, thus 20 seems to be a good 
> number here as well :)
>
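
To make the (row*20)+column suggestion concrete, here is a minimal sketch of the encoding and its inverse. The 0-based row and column numbering and the function names are assumptions made for illustration, not anything anyone in this thread is running.

```python
# Sketch only: 0-based rows/columns and all names here are assumptions.
COLS_PER_ROW = 20                # stride from the suggestion quoted above
MAX_ROWS = 256 // COLS_PER_ROW   # 12 full rows fit into one octet


def third_octet(row: int, column: int) -> int:
    """Encode a rack position as (row * 20) + column."""
    if not 0 <= row < MAX_ROWS:
        raise ValueError(f"row {row} out of range 0..{MAX_ROWS - 1}")
    if not 0 <= column < COLS_PER_ROW:
        raise ValueError(f"column {column} out of range 0..{COLS_PER_ROW - 1}")
    return row * COLS_PER_ROW + column


def position(octet: int) -> tuple[int, int]:
    """Decode a third octet back into (row, column)."""
    if not 0 <= octet < MAX_ROWS * COLS_PER_ROW:
        raise ValueError(f"octet {octet} does not encode a position")
    return divmod(octet, COLS_PER_ROW)


if __name__ == "__main__":
    print(third_octet(3, 7))   # 67
    print(position(67))        # (3, 7)
```

With a stride of 20 the decimal value can be read off directly: 67 is row 3, column 7, which is the point of choosing a decimal-friendly stride over 4-bit fields.
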
>> The next issue is how all this gets initialized.  Pretty much the only way 
>> to do it is to have the DHCP
>> servers configured to map MAC addresses to IP addresses in a stable way. 
>> We don't really have that
>> problem because pretty much the only interfaces that have random MAC 
>> addresses are the module
>> service processors.  The MAC address maps to the manufacturing serial 
>> number, which is essential
>> for tracking faults, but the position (slot ID/module ID) is reported in 
>> the DHCP request in a <vendor>
>> field and the DHCP server knows what to do.
>> 
>> It seems like when you install something, you will have to enter its MAC 
>> addresses into the DHCP
>> server database and map to a stable IP address given database knowledge of 
>> the position and function
>> of the device.
>
> Yes, we will require our vendor to hand over a list (text file) of all MAC 
> addresses of the cluster, i.e. two on board NICs plus MAC from IPMI card.
>
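
A rough sketch of turning a vendor MAC list into fixed DHCP assignments follows. It emits ISC dhcpd style host stanzas. The input format (one "row column MAC" line per interface), the 10.10 network prefix, the final octet of 1 and the rNNcNN host names are all hypothetical, chosen only to show the idea. A real list would also carry the second NIC and the IPMI MAC for each node.

```python
# Sketch only: the input format, network prefix and host names are assumptions.
COLS_PER_ROW = 20  # same stride as the (row*20)+column scheme


def host_stanza(name: str, mac: str, ip: str) -> str:
    """Format one ISC dhcpd host entry with a fixed address."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac.lower()};\n"
        f"  fixed-address {ip};\n"
        f"}}\n"
    )


def stanzas_from_list(text: str, network: str = "10.10") -> str:
    """Build host stanzas from lines of the form 'row column mac'."""
    out = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        row_s, col_s, mac = line.split()
        row, col = int(row_s), int(col_s)
        octet = row * COLS_PER_ROW + col       # position goes in octet 3
        name = f"r{row:02d}c{col:02d}"
        out.append(host_stanza(name, mac, f"{network}.{octet}.1"))
    return "".join(out)


if __name__ == "__main__":
    vendor_list = "3 7 00:30:48:AA:BB:01\n0 0 00:30:48:AA:BB:02\n"
    print(stanzas_from_list(vendor_list), end="")
```

The output could be pulled into dhcpd.conf with an include statement, so the generated part stays separate from anything edited by hand.
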
>> For us, there were a number of benefits in going to "IP address maps to 
>> function":
>> * Humans can debug given the IP addresses alone
>> * No DNS lookups required in performance critical paths
>> * Higher level configuration files for things like SLURM can be nearly 
>>   static
>> 
>
> So far so good.
>
>> Nevertheless, is the benefit of mapping IP to physical location really 
>> valuable?  Trying to
>> maintain this given the probable frequency of swapping out boxes will cause 
>> trouble with
>> DHCP and ARP.  Either you make the leases short and wait for them to expire 
>> before
>> powering on a replacement, or you have to go around manually flushing 
>> leases and arp
>> tables.  Ugh.  Instead, it may make more sense to give a type of device a 
>> stable IP address
>> without regard to position, and to maintain a database mapping MAC/IP to 
>> location
>> separately.  For a few thousand devices, grepping the location file will 
>> be faster than
>> walking over to the right rack anyway.  We have this problem with modules. 
>> The service
>> guys want to swap modules in the backplane to see if a problem follows it 
>> and it has
>> cost us some DHCP hackery to let the addressing respond smoothly.
>
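
The alternative of a separate location file can be sketched in a few lines. The file layout here (IP, MAC, then a free-form location on each line) and the function names are assumptions for illustration only. The point is that a lookup by either IP or MAC is about as cheap as the grep mentioned above.

```python
# Sketch only: the "ip mac location..." file layout is an assumption.
def load_locations(text: str) -> dict:
    """Index location records by both IP address and MAC address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        ip, mac, location = line.split(None, 2)
        record = {"ip": ip, "mac": mac.lower(), "location": location}
        table[ip] = record
        table[mac.lower()] = record
    return table


def find(table: dict, key: str):
    """Look up a record by IP or MAC; returns None if unknown."""
    return table.get(key.lower())


if __name__ == "__main__":
    data = "10.10.67.1 00:30:48:AA:BB:01 rack 3, slot 7\n"
    table = load_locations(data)
    print(find(table, "00:30:48:AA:BB:01")["location"])
```

When a box is swapped, only its line in this file changes. The DHCP assignment for that MAC stays as it was, so there are no leases or ARP entries to flush.
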
> So far our experience with slightly smaller clusters suggests that the DHCP 
> problem *might* occur, but usually we have a few "spare nodes" which are 
> switched off during regular operations (at least officially ;)). If a node 
> dies and is sent back for service we will simply leave the "hole" in the rack 
> and switch on the spare node at its position - again at least officially. 
> After the box returns we can simply reinstall it in its own place. Lease 
> times should thus not be an issue.
>
> So far it seems we will have enough spare room to house all real and spare 
> nodes, thus it should not be a problem (keeping my fingers crossed).
>
> Anyone else seeing a big problem in this idea?
>
> Cheers
>
> Carsten
>

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




