[Beowulf] Mosix

Tue Jan 18 17:25:12 PST 2005

On Tue, 2005-01-18 at 15:29, Michael Will wrote:
> On Tuesday 18 January 2005 11:31 am, Rajesh Bhairampally wrote:
> > I am wondering: when we have something like Mosix (a distributed OS available
> > at www.mosix.org), why should we still develop parallel programs and
> > struggle with PVM/MPI etc.?
> 
> Because Mosix does not work?
> 
> This of course is not really true; for some applications Mosix might be appropriate.
> But what it really does is transparently move processes around in a cluster; it does not
> suddenly make them parallel.
> 
> Let's have an example:
> 
> Generally your application is solving a certain problem, say taking an image and applying
> a certain filter to it. You can write a program for it that is not parallel-aware, does not use
> MPI, and just solves the problem of creating one filtered image from one original image.
> 
> This serial program might take one hour to run (assuming a really large image and a really
> complicated filter).
> 
> Mosix can help you now run this on a cluster with 4 nodes, which is cool if you have 4
> images and still want to wait 1 hour until you see the first result.
> 
> Now if you really want to filter only one image, but in about 15 minutes, you can program your
> application differently so that it works on only a quarter of the image. Mosix could still help you
> run your code with different input data in your cluster, but then you have to collect the four pieces
> and stitch them together, and you would be unpleasantly surprised because the borders of the filter
> will show: information was missing because each piece had only a quarter of the image available, not
> the full image. Once you adjust your code to exchange that border information, you are
> already on the path to becoming an MPI programmer, and might as well just run it on a Beowulf cluster.
> 
> So the MPI-aware solution to this would be a program that splits the image into four quadrants,
> forks into four pieces that are placed on four available nodes, communicates the border data between
> the pieces, and finally collects the results and writes them out as one final image, all in not much more than
> the 15 minutes expected.
> 
> That's why you want to learn how to do real parallel programming instead of relying on some transparent
> mechanism to guess how to solve your problem.
> 
> Michael
> 
> 

Ignoring the inflammatory opening of the above response, I'll just state
that its representation of what Mosix does and how it works is neither
fair nor accurate.

Before message-passing mechanisms arrived, and before the concept of
multi-threading was introduced, the favored mechanism for
multi-processing and parallelism was the good old fork-join method. That
is, a parent process divided the task into small, manageable sub-tasks
and then forked child processes off to handle each subtask. When the
subtask was complete, the child notified the parent (usually by simply
exiting) and the parent joined the results of the sub-tasks into the
final task result. This mechanism works quite well on multi-tasking
operating systems with various scheduling models. It can be effective on
multi-CPU single systems or on clusters of single or multiple CPU
systems.
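
For readers who have not met the pattern, here is a minimal fork-join
sketch in Python for a Unix-like system. The function name fork_join and
the use of a pipe plus pickle to carry each result back are my own
choices for illustration; nothing in it is specific to Mosix, and how
migrated processes handle pipe I/O on a given cluster is not something
this sketch addresses.

```python
import os
import pickle


def fork_join(task, chunks):
    """Fork one child per chunk, then join the results in the parent."""
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute the sub-task, send the result, and exit.
            os.close(read_fd)
            status = 1
            try:
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(task(chunk), out)
                status = 0
            finally:
                os._exit(status)
        # Parent: keep only the read end and go on forking.
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as src:
            results.append(pickle.load(src))
        os.waitpid(pid, 0)  # reap the child; its exit completes the join
    return results


if __name__ == "__main__":
    partial_sums = fork_join(sum, [range(0, 50), range(50, 100)])
    print(partial_sums, sum(partial_sums))
```

The parent decides how the work is divided and how the partial results
are combined; where each child actually runs is left to the scheduler.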

Mosix (or at least Open Mosix) handles this kind of parallelism
brilliantly in that it will balance the forked child processes around
the cluster based on load factors. So your image processing, your
Gaussian signal analysis, your fluid dynamics simulations, your parallel
software compilations, or your Fibonacci number generations are
efficiently distributed while you still maintain programmatic control of
the sub-tasking.

While the fork-join mechanism is not without a downside
(synchronization, for one, as mentioned above), it can be used with a
system like Mosix to provide parallelism without the overhead of the
message-passing paradigm. Maybe not better, probably not worse, just
different.

The effect described above in which sub-tasks operate completely
independently to produce an erroneous result is really an artifact of
poor programming and design skills and cannot be blamed on the task
distribution system. Mosix is used regularly to do image processing and
other highly parallel tasks. Creating a system like this for Mosix
requires no knowledge of a message-passing interface or API, but simply
requires a working knowledge of standard multi-processing methods and
parallelism in general.
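
To make the design point concrete, here is a small Python sketch of one
way to divide an image so that independently processed pieces still
stitch together without visible seams: each piece is handed a few extra
border ("halo") rows from its neighbours, and those rows are thrown away
after filtering. The 3x3 box blur stands in for the unspecified filter in
the example above, and the names box_blur and blur_in_strips are
hypothetical. The strips are processed in a plain loop here; each
iteration is the unit of work one would hand to a forked child.

```python
def box_blur(rows):
    """3x3 mean filter with clamped edges; rows is a list of equal-length lists."""
    h, w = len(rows), len(rows[0])
    out = []
    for y in range(h):
        line = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += rows[yy][xx]
            line.append(total / 9)
        out.append(line)
    return out


def blur_in_strips(rows, parts, halo=1):
    """Filter horizontal strips independently, each padded with halo rows."""
    h = len(rows)
    step = -(-h // parts)  # ceiling division: rows per strip
    result = []
    for start in range(0, h, step):
        stop = min(start + step, h)
        lo = max(start - halo, 0)   # extend upward into the neighbour
        hi = min(stop + halo, h)    # extend downward into the neighbour
        filtered = box_blur(rows[lo:hi])
        # Keep only the rows this strip owns; discard the halo rows.
        result.extend(filtered[start - lo:start - lo + (stop - start)])
    return result


if __name__ == "__main__":
    image = [[(x * y) % 7 for x in range(8)] for y in range(8)]
    print(blur_in_strips(image, 4) == box_blur(image))          # halo of 1 row
    print(blur_in_strips(image, 4, halo=0) == box_blur(image))  # no halo
```

With a halo as wide as the filter's reach, the stitched result matches
the serial one and the sub-tasks never need to talk to each other; with
halo=0 the seams described above appear.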

One final note: most people consider a Mosix cluster to be a Beowulf as
long as it meets the requirements of using commodity hardware and
readily available software.

Just keeping the record straight.

> 
> > Though I have never used either Mosix or PVM/MPI, I am
> > genuinely puzzled about it. Can someone kindly educate me?
> > 
> > thanks,
> > rajesh
-- 
Steve Brenneis <sbrenneis at surry.net>