[Beowulf] Experience of using multiple network devices on a node in cluster

Tony Travis ajt at rri.sari.ac.uk
Mon May 16 08:48:32 PDT 2005


Mark Hahn wrote:
>>We have implemented clusters using one interface for parallel traffic
>>(Score) and one for general purpose/NFS traffic.
> 
> 
> segregating traffic is a common suggestion, but I don't really understand
> why it would be sensible.  a node is unlikely to be running some mixture
> of MPI and IO jobs, at least the normal kind of node (dual).
> control/monitoring really ought to be minimal in bandwidth (per-node), no?

Hello, Mark.

I used a single network fabric at first and relied on the switches to
segregate the network traffic. We have a 64-node diskless Beowulf
cluster based on the EPCC 'BOBCAT' model. You are right that the
control/monitoring bandwidth is minimal, but we use openMosix to
load-balance, and the I/O can be very high while processes are migrated.

I think it is essential to throttle the bandwidth used by oM process
migration. In fact, we initially ran MPI on the 'NFS' network and left
the full bandwidth of the second fabric for oM process migration. When
we were using a single network fabric and the cluster was busy, we had
problems with NFS timeouts, and it was difficult to control the cluster.

Using two network fabrics has eliminated the problem completely...

Best wishes,

	Tony.
-- 
Dr. A.J.Travis,                     |  mailto:ajt at rri.sari.ac.uk
Rowett Research Institute,          |    http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn,          |   phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.    |     fax:+44 (0)1224 716687


