[Beowulf] using two separate networks for different data streams

Tony Travis ajt at rri.sari.ac.uk
Fri Jan 27 10:47:18 PST 2006


Douglas Eadline wrote:
>[...]
> Indeed, an excellent question. It seems logical, but does it really help
> (or do I just feel clever about using the extra Ethernet port)? I can see
> that a lot of monitoring traffic might cause an issue, but I have never
> tested that notion either. Of course it all depends... I wonder if a
> dual-Ethernet node would be better served by something like an FNN
> (http://aggregate.org/FNN/). Tim Mattox can probably weigh in on this.

Hello, Doug.

One of the first systems I saw that used dual Ethernet, as opposed to 
just channel-bonding multiple NICs, was the EPCC BOBCAT:

	http://www.epcc.ed.ac.uk/bobcat/

Although this system has now been dismantled, it inspired me to build a 
similar cluster here at the Rowett:

	http://bobcat.rri.sari.ac.uk

The most important feature of a 'BOBCAT'-style Beowulf is the use of 
'diskless' compute nodes with two separate network fabrics for the 
'system' and 'application' traffic. The 'diskless' nodes are really 
'dataless': they have local scratch disks for /tmp and swap, but no 
operating system installed.
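As a rough sketch of what 'dataless' means in practice (the server name, 
export path and partition layout below are illustrative, not the actual 
BOBCAT configuration), a node's mount table might look something like:

	# /etc/fstab on a 'dataless' compute node (hypothetical example)
	# root filesystem served over NFS from the head node
	master:/export/root/node01  /     nfs   ro,hard,intr  0 0
	# local scratch disk provides only swap and /tmp
	/dev/hda1                   swap  swap  defaults      0 0
	/dev/hda2                   /tmp  ext3  defaults      0 0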

This approach is useful because it means that you can still control the 
Beowulf cluster via the 'system' network even if the 'application' 
network becomes saturated: the traffic is segregated between the two 
private network fabrics.
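One simple way to keep the two fabrics apart (a sketch, not necessarily 
how BOBCAT does it) is to give each node one hostname per fabric and 
list only the 'application' names in the MPI machine file, so that ssh, 
NFS and monitoring stay on the 'system' network:

	# /etc/hosts (sketch; addresses follow the subnets listed below)
	192.168.0.11   node01        # 'system' fabric: ssh, NFS, monitoring
	192.168.1.11   node01-app    # 'application' fabric: MPI traffic

	# machines file given to mpirun lists the -app names, so MPI
	# traffic stays on the Gigabit 'application' fabric
	node01-app
	node02-app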

In fact, the system I built here has three NICs in the servers and uses 
NAT on the head node to allow compute nodes on the private cluster 
network to make outgoing connections to the internet via the LAN so 
that, for example, our Folding@home jobs on the nodes can download work 
units.
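On a Linux head node the NAT itself can be as simple as turning on IP 
forwarding and masquerading the private subnets on the LAN-facing 
interface (the interface name below is an assumption, not our actual 
setup):

	# enable IP forwarding on the head node
	echo 1 > /proc/sys/net/ipv4/ip_forward

	# masquerade traffic from both private cluster networks out of
	# the LAN-facing interface (assumed here to be eth0)
	iptables -t nat -A POSTROUTING -o eth0 -s 192.168.0.0/24 -j MASQUERADE
	iptables -t nat -A POSTROUTING -o eth0 -s 192.168.1.0/24 -j MASQUERADE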

This system works very well and, incidentally, demonstrates that the 
poor performance of 'diskless' compute nodes with NFS-mounted root 
filesystems may have more to do with saturation of the cluster 
interconnect by HPC 'application' traffic than with NFS congestion, at 
least on a 64-node cluster. I'm aware that NFS does not scale up very 
well to large clusters: no flames!

Our cluster has three networks:

143.234.32.0	LAN		100Base-T	(public)
192.168.0.0	System		100Base-T	(private)
192.168.1.0	Application	Gigabit		(private)

The compute nodes have two NICs, connected to the two private networks. 
The servers have three NICs, connected to the private networks and the 
LAN.
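As a sketch of the head node side (interface names and host addresses 
are assumptions; the public LAN address is whatever the site assigns), 
the three interfaces might be brought up along these lines:

	# head node: eth0 = LAN (143.234.32.x, site-assigned)
	#            eth1 = 'system' fabric, eth2 = 'application' fabric
	ifconfig eth1 192.168.0.1 netmask 255.255.255.0 up
	ifconfig eth2 192.168.1.1 netmask 255.255.255.0 up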

	Tony.
-- 
Dr. A.J.Travis,                     |  mailto:ajt at rri.sari.ac.uk
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.    |     fax:+44 (0)1224 716687


