[Beowulf] Re: Linux cluster for my college
Tony Travis
ajt at rri.sari.ac.uk
Mon Apr 16 03:52:56 PDT 2007
Geoff Jacobs wrote:
> Kyle Spaans wrote:
>> Wait, we can use openMOSIX and MPI at the same time? I thought they
>> were separate ideas? For example, MPI for multithreading and message
>> passing, and openMOSIX for just process migration. Can they be used at
>> the same time?
>>
>
> 1) Spawn MPI processes on the head node. Must be using tcp/ip for
> interconnect.
> 2) OpenMOSIX migrates processes to dormant computers.
> 3) Sit back and watch the data roll in.
Hello, Geoff.
Sounds nice, doesn't it? But it doesn't scale up very well with large
numbers of MPI processes...
I run a small (64-node) openMosix (oM) Beowulf cluster on my own port
of the Linux-2.4.26-om1 kernel, compiled under Ubuntu 6.06.1 LTS (the
Ubuntu server edition is supported until 2011, for any Ubuntu/Debian
detractors reading this):
http://www.ubuntu.com/getubuntu/download
We've tried different MPI implementations, including MPICH and LAM. The
problem is that, by default, the same nodes get hammered every time MPI
programs are run, so those nodes show high system times: there is a
significant overhead when processes are migrated to and from their
'home' node (the node where a process was started) to do I/O, because
oM has to use the kernel on the 'home' node for physical I/O. In fact,
I run a patched version that stops oM migrating a process back to its
home node just to execute time() calls. This makes a BIG difference to
programs that monitor their own progress!
If you use a large number of MPI processes to run a job, however, you
are better off using Ganglia to generate a load-balanced list of MPI
nodes first:
gstat -am
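For example, something like this (a sketch only: gstat is Ganglia's
status tool, and as I understand its -m flag, it emits the host list in
mpirun machinefile format with the least-loaded nodes first; the
MPICH-style mpirun options and the "my_mpi_job" name are placeholders):

  # Capture a machinefile ordered by Ganglia's view of node load,
  # then start the MPI job on the least-loaded nodes first.
  gstat -am > machines.txt
  mpirun -np 16 -machinefile machines.txt ./my_mpi_job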
For small MPI jobs, though, it's true that openMosix automatically adds
load-balancing to MPI. If your jobs are CPU-intensive this works very
well, but if they do a lot of I/O the cluster spends much of its time
moving the active pages of processes between the kernels running on
different CPUs. BTW, openMosix migrates the active pages of the user
context, not the entire process. This is done very efficiently, but it
has a finite cost because the COTS cluster interconnect is Gigabit
Ethernet.
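To put an illustrative number on that: migrating, say, a 50 MB set of
active pages over Gigabit Ethernet (~125 MB/s at the theoretical limit)
costs at least 50/125 = 0.4 s of wire time per migration, before any
protocol or kernel overhead, so migrating I/O-bound processes back and
forth adds up quickly.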
The strategy we're developing now is to use all three approaches
together: MPI jobs are started on a "gstat" load-balanced list of
processors, and oM then migrates MPI processes between nodes as the
load changes, if the oM load-balancing algorithm sees a benefit.
However, openMosix is best for 'embarrassingly' parallel tasks, which
is the main reason we run it.
Re: the FC vs. Ubuntu/Debian comments here recently, I'm one of those
people who ran RH7.3->RH9 for years longer than I should have because it
was both 'stable' and 'supported'. If it had not been for the excellent
Fedora Legacy project I would have migrated to Debian years ago. However,
the "end of life" statement for RH9 at FL is what forced me into action.
It was, indeed, an effort to upgrade to Ubuntu, but I'm glad that I did.
Tony.
--
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687