[Beowulf] IP address mapping for new cluster

Sun Jul 29 23:40:39 PDT 2007

Hi,

We are currently in the planning stage for a new cluster project and 
would like to receive some comments about IP address mapping.

The new cluster will probably consist of about 1000-2000 nodes 
distributed over about 60 racks. There will be compute nodes, two 
distinct classes of file servers, switches, IPMI interfaces and so on.

The scheme we have worked out so far looks like this:

(1) We will use a Class A (10.0.0.0/8) network with a single flat 
broadcast domain (broadcast address 10.255.255.255).

(2) Each IP address is described by 10.x.y.z with the following rules:

(2a) Second octet (10.x)

The first free octet tells what kind of device we are talking about. 
It can be an arbitrary number or a bit-mask:

Value   Meaning
1       Compute node
2       Compute node IPMI
4       Control unit (rack power, cooling unit)
8       Switch (if deemed necessary, core switches and edge switches
        can also be separated)
16      File server
32      Data server
64      Head nodes

(2b) Third octet (10.x.y)

This octet tells where the object can be found. For that, the room's 
layout is mapped into a matrix. We have up to 15 racks per row and will 
have no more than 15 rows in total, but the number of racks per row will 
not be constant. We could still simply map the rack's position into 
y by: 16 * row + position within row

E.g. Rack #5 in row 2 would get the value of 37 (2*16+5).
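
The rack mapping above can be sketched in Python as follows. The function 
names rack_to_octet and octet_to_rack are hypothetical, and the sketch 
assumes that row and position each fit in 0..15 so that the result fits 
into one octet.

```python
def rack_to_octet(row, position):
    """Encode a rack location as the third octet: 16 * row + position."""
    # Assumption: both values fit in 4 bits, so the result is 0..255.
    if not (0 <= row <= 15 and 0 <= position <= 15):
        raise ValueError("row and position must each be in 0..15")
    return 16 * row + position


def octet_to_rack(y):
    """Decode the third octet back into (row, position within row)."""
    if not 0 <= y <= 255:
        raise ValueError("octet must be in 0..255")
    return divmod(y, 16)


print(rack_to_octet(2, 5))  # 37
print(octet_to_rack(37))    # (2, 5)
```

Because each half of the octet is 4 bits, the row and position can also 
be read directly off the hexadecimal form of y (37 = 0x25).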

(2c) Final octet (10.x.y.z)

The final octet can have many meanings:

* Compute node 	

Nodes are counted from top to bottom, with 1 being the top node.

* Compute node IPMI

Same as above, i.e. there is a direct mapping between IPMI card and 
node. E.g. the node 10.1.37.4 has the IPMI address 10.2.37.4.

* control unit

Depending on the type of rack and the number of addresses needed, simply 
count them.

* switch 	

Switches are counted like compute nodes, i.e. from top to bottom 
starting at 1. 10.8.37.4 would thus be the fourth switch in rack 2.5 
(row 2, rack 5).

* file server

Same as compute nodes

* data server

Same as compute nodes

* head nodes

Same as compute nodes
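
Putting rules (2a) to (2c) together, a full address could be built as in 
the rough sketch below. The dictionary keys and the function names 
device_address and ipmi_address are hypothetical labels for this example, 
and the limit of 1..254 for the final octet is an assumption, not part of 
the scheme.

```python
import ipaddress

# Second-octet values from the table in (2a); the keys are made-up labels.
DEVICE_CLASS = {
    "compute": 1,
    "ipmi": 2,
    "control": 4,
    "switch": 8,
    "fileserver": 16,
    "dataserver": 32,
    "head": 64,
}


def device_address(kind, row, position, slot):
    """Build 10.x.y.z from device class, rack location and slot in the rack."""
    if not (0 <= row <= 15 and 0 <= position <= 15):
        raise ValueError("row and position must each be in 0..15")
    if not 1 <= slot <= 254:  # assumption: counting starts at 1, 255 unused
        raise ValueError("slot must be in 1..254")
    x = DEVICE_CLASS[kind]
    y = 16 * row + position
    return ipaddress.IPv4Address(f"10.{x}.{y}.{slot}")


def ipmi_address(node_addr):
    """Map a compute node address 10.1.y.z to its IPMI address 10.2.y.z."""
    octets = str(node_addr).split(".")
    if octets[:2] != ["10", str(DEVICE_CLASS["compute"])]:
        raise ValueError("not a compute node address")
    octets[1] = str(DEVICE_CLASS["ipmi"])
    return ipaddress.IPv4Address(".".join(octets))


node = device_address("compute", 2, 5, 4)
print(node)                               # 10.1.37.4
print(ipmi_address(node))                 # 10.2.37.4
print(device_address("switch", 2, 5, 4))  # 10.8.37.4
```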

(Sorry about the length, but I think you get the idea.)

What do you think: do the pros (relatively easy scheme, easy to locate a 
device by IP, objects addressable by netmasks, ...) outweigh the cons 
(looping over devices is only practical via DNS names, since doing it 
by IP will cause headaches)?
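
To illustrate the "addressable by netmasks" point: since x and y sit on 
octet boundaries, a whole device class is a /16 and one rack within a 
class is a /24. A small sketch using Python's standard ipaddress module; 
the host list is made up for the example.

```python
import ipaddress

# Every compute node (x = 1), regardless of rack.
all_compute = ipaddress.ip_network("10.1.0.0/16")
# Compute nodes in row 2, rack 5 only (y = 37).
rack_compute = ipaddress.ip_network("10.1.37.0/24")

# Made-up sample addresses following the 10.x.y.z scheme.
hosts = [
    ipaddress.ip_address(a)
    for a in ("10.1.37.4", "10.2.37.4", "10.1.18.1", "10.8.37.4")
]

compute_hosts = [str(h) for h in hosts if h in all_compute]
rack_hosts = [str(h) for h in hosts if h in rack_compute]

print(compute_hosts)  # ['10.1.37.4', '10.1.18.1']
print(rack_hosts)     # ['10.1.37.4']
```

The flip side is the con mentioned above: the compute node addresses are 
not one contiguous range, so iterating over "node 1 to node N" needs a 
name-to-address table rather than simple IP arithmetic.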

We thought about other ways, e.g. putting all compute nodes in 10.1.x.y, 
or mapping the node number to the IP address directly, e.g. n0123 -> 
10.1.101.123, but those approaches gave us nice rules plus a few nasty 
exceptions.

What are you doing in your clusters?

Thanks in advance for any input

Greetings

Carsten