[Beowulf] Which do you prefer local disk installed OS or NFS rooted?
Tony Travis
ajt at rri.sari.ac.uk
Fri Jul 16 03:10:47 PDT 2004
Brent M. Clements wrote:
> Good Afternoon All:
>
> Let me start by giving a little background.
>
> Currently all of our clusters on campus have local disks, onto which an
> OS image is installed using the SystemImager suite of tools.
> There are one or two clusters that are NFS-rooted.
>
> I'd like to know from all of you which way everyone is leaning when it
> comes to clusters and OS distribution.
Hello, Brent.
We've got a 32-node Athlon XP 2400+ cluster running "ClusterNFS" under
openMosix 2.4.22-openmosix-2:
http://clusternfs.sourceforge.net/
http://openmosix.sourceforge.net/
The root partition of the 'head' node is exported read-only to the
'diskless' compute nodes, which have symbolic links to per-node volatile
files under:
/export/root/<IP ADDRESS>
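For anyone setting up something similar, the layout is roughly as below.
This is only a sketch: the exports lines, options and file names are
illustrative, not copied from our configuration:

    # /etc/exports on the 'head' node (options are assumptions)
    /             192.168.0.0/255.255.255.0(ro,no_root_squash)
    /export/root  192.168.0.0/255.255.255.0(rw,no_root_squash)

    # per-node writable area, e.g. for node2 (192.168.0.2):
    /export/root/192.168.0.2/etc/mtab
    /export/root/192.168.0.2/var/run/
    # volatile files on the shared root are symbolic links into this area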
All the 'diskless' nodes have a 40 GB local disk for:
/dev/hda1 /var/tmp # and /tmp -> /var/tmp
/dev/hda2 swap
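In /etc/fstab terms on each compute node that looks roughly like this
(device names and the filesystem type are assumed, not copied from ours):

    # local disk on each 'diskless' (strictly 'dataless') node
    /dev/hda1   /var/tmp   ext3   defaults   0 2
    /dev/hda2   swap       swap   defaults   0 0

    # /tmp is a symbolic link onto the local disk:
    #   ln -s /var/tmp /tmp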
The system works well except for one problem: the 2 GB limit on file size.
This is an inevitable consequence of using ClusterNFS, and I've not been
able to do anything about it yet. Our system is based on 'BOBCAT':
http://www.epcc.ed.ac.uk/bobcat/
Our 'BOBCAT' cluster architecture consists of a 'head' node with three
NICs running a ROOTNFS fileserver, and 'diskless' nodes (strictly
speaking 'dataless' nodes) with two NICs on different ethernets, one
for PXE/DHCP/NFS and the other for openMosix IPC (a sketch of the DHCP
side follows the table):

    PXE/DHCP/NFS           IPC                    LAN
    192.168.0.0            192.168.1.0            143.234.32.0

    192.168.0.1   node1    192.168.1.1   mpe1     143.234.32.11  bobcat
    192.168.0.2   node2    192.168.1.2   mpe2     143.234.32.12  topcat
    192.168.0.3   node3    192.168.1.3   mpe3
    ...
    192.168.0.32  node32   192.168.1.32  mpe32
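For illustration, the DHCP/PXE side on the head node would look something
like the fragment below. This is a sketch of a typical ISC dhcpd setup,
not our actual file; the MAC address, boot filename and root path are
assumptions:

    # /etc/dhcpd.conf fragment on the PXE/DHCP/NFS server (node1)
    subnet 192.168.0.0 netmask 255.255.255.0 {
        next-server 192.168.0.1;           # TFTP server for PXE boot
        filename "pxelinux.0";             # PXE boot loader (assumed)
        option root-path "192.168.0.1:/";  # NFS root exported by the head node

        host node2 {
            hardware ethernet 00:11:22:33:44:55;  # MAC is illustrative
            fixed-address 192.168.0.2;
        }
    }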
On our system, node1 (bobcat) is the PXE/DHCP/NFS server and node2
(topcat) is used for interactive logins. I've done quite a lot of
network monitoring using tools like "iptraf", "ibmonitor" and "iftop".
The 'high' NFS traffic often attributed to clusters using 'diskless'
compute nodes is something of a myth: it depends on what the nodes are
doing. If you're running computationally intensive jobs, the NFS traffic
is minimal once the programs are in the filesystem cache on the compute
node. There is, of course, high NFS traffic when booting the 'diskless'
nodes, but we manage to boot 30 nodes in about four minutes without
problems, using an unmanaged 3Com 100Base-T 'private' ethernet switch
(i.e. not on the LAN).
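If you want to check this on your own cluster, something along these lines
will show the NFS traffic (the interface name is an assumption; use
whichever NIC carries NFS on your nodes):

    # watch live NFS traffic on the PXE/DHCP/NFS interface
    iftop -i eth0 -f "port 2049"

    # per-operation NFS statistics on the server
    nfsstat -s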
One advantage of the 'BOBCAT' architecture is the segregation of network
traffic: you can still control the compute nodes no matter how much IPC
traffic is going on between them, because the IPC traffic is on a
separate ethernet. Our system uses 192.168.0.0 for openMosix IPC, and
192.168.1.0 for ssh initiation of MPI processes and sockets.
I've written scripts to add and remove 'diskless' nodes from the cluster:
mknode # add a node to the cluster
rmnode # remove a node from the cluster
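The attached scripts are specific to our setup, but to give a feel for
what 'adding a node' involves here, the general shape of mknode is
roughly as follows. This is only a sketch, not the attached script, and
every path and command in it is illustrative:

    #!/bin/ksh
    # mknode <n> -- sketch of the sort of steps involved (illustrative)
    n=$1
    ip=192.168.0.$n

    # create the per-node volatile area under the exported root
    mkdir -p /export/root/$ip/etc /export/root/$ip/var

    # make the new node resolvable (file names are assumptions)
    echo "$ip node$n" >> /etc/hosts

    # add a DHCP host declaration for the node's MAC address, then:
    /etc/init.d/dhcpd restart

rmnode does the reverse: it removes the per-node area and the host and
DHCP entries.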
Best wishes,
Tony.
--
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mknode
URL: <http://www.beowulf.org/pipermail/beowulf/attachments/20040716/51f55408/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rmnode
URL: <http://www.beowulf.org/pipermail/beowulf/attachments/20040716/51f55408/attachment-0001.ksh>