[Beowulf] Xgrid and Mosix (fwd from john at rudd.cc)

Eugen Leitl eugen at leitl.org
Thu Dec 30 04:20:59 PST 2004


----- Forwarded message from John Rudd <john at rudd.cc> -----

From: John Rudd <john at rudd.cc>
Date: Wed, 29 Dec 2004 19:37:02 -0800
To: xgrid-users at lists.apple.com
Subject: Xgrid and Mosix
X-Mailer: Apple Mail (2.619)


I see in the archives that someone asked about OpenMosix back in 
September ( 
http://lists.apple.com/archives/xgrid-users/2004/Sep/msg00023.html ), 
but I didn't see any responses.  So I thought I'd ask too, but with a 
little more detail.

The thing that I find interesting about the Mosix style of distributed 
computing is that applications do NOT need to be rewritten around 
it.  Mosix abstracts the distributed computing cluster away 
from the program and developer in the same way that threads abstract 
multi-processing away from the program and developer.  Under Mosix, any 
program, without having to be written around any special library, 
without having to be relinked or recompiled, can be moved off to 
another processing node if there are nodes that are significantly less 
busy than yours.  And, AFAIK, any multi-threaded application can make 
use of multiple nodes (with threads being spawned on any host that is 
less loaded than the current node).  Imagine taking a completely 
mundane but multi-threaded application (I'll assume Photoshop is 
multi-threaded and use that as an example).  Suddenly, without having 
to get Adobe to support Xgrid, you can use Xgrid to speed up your 
Photoshop rendering.

It seems to me that a similar set of features could be added to Xgrid.  
The thread- and process-spawning code within the kernel could be 
extended by Xgrid to check for lightly loaded Agents, and move the new 
process or thread to one of them.  Only the IO routines would need to 
exist on the Client (and even then, maybe not: if every node has a 
similar filesystem image, then only the UI (for user-bound 
applications) or primary network interface code (for network 
daemons/servers) needs to run on the original Client system).  From 
what I recall, the Mach microkernel already makes some infrastructure 
for this type of thing available; it just needs to be utilized, and 
done deep enough in the kernel that an application doesn't need to know 
about it.
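
As a rough illustration of the "check for lightly loaded Agents" step, 
here is a minimal Python sketch.  It is not Xgrid or Mosix code: the 
Agent record, the pick_agent function, the load figures, and the 0.5 
margin are all hypothetical names and values chosen for the example.  
It only shows the decision "move the new process if some Agent is 
significantly less busy than the local node".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """Hypothetical record for a remote node; not an Xgrid type."""
    name: str
    load: float  # assumed metric, e.g. load average per CPU


def pick_agent(local_load: float,
               agents: List[Agent],
               margin: float = 0.5) -> Optional[Agent]:
    """Return the least-loaded agent if it is at least `margin` less
    busy than the local node; return None to keep the work local."""
    if not agents:
        return None
    best = min(agents, key=lambda a: a.load)
    if local_load - best.load >= margin:
        return best
    return None
```

The margin is there so that work is only moved when the difference in 
load is large enough to be worth the cost of running remotely.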


Though, that does bring up one consideration: I have a friend who did a 
lot of distributed computing work when he was working for Flying 
Crocodile (a web hosting company that specialized in porn sites, where 
his distributed computing code had to support multiple-millions of hits 
per second).  His experience there gave him a concern about Mosix style 
distributed computing.  One of the advantages of something like Beowulf 
is that the coder can control which things must be kept low latency 
(must use threads for SMP on the local processor) and which things can 
tolerate high latency (can use parallel code on the network).  The 
programming-interface style of distributed computing gives them that 
flexibility; a fully transparent system does not.

The idea that I suggested was something like nice/renice in Unix, where 
you could specify certain parallelism parameters to a process before 
you run it, or after it is already running.  For example, instead of 
"process priority", you might specify a sort of "process affinity" or 
"thread affinity".  For process affinity, a low number (which means 
high affinity, just like priority and nice numbers) means "when this 
process creates a child, it must be kept close to the same CPU as the 
one that spawned it".  Thread affinity would be the same, but for 
threads.  A default of zero means "everything must run locally".  A 
high number means "I can tolerate more latency" (so, "latency 
tolerance" would be the opposite of "affinity").

(It occurs to me after writing all of this that it might be easier for 
the end user to think in terms of "latency tolerance" instead of 
"process affinity": high latency tolerance = high number, which avoids 
the confusion that affinity invites, since its numbers go in the 
opposite direction.  I hope all of that made sense.)

A process with a low process affinity (high number) and a high thread 
affinity (low number) means that it can spawn new 
tasks/processes/applications anywhere in the network, but any threads 
for it (or its sub-processes) must exist on the same node as its main 
thread.  Or, if you want all of the applications to be running on your 
workstation/Client, but run their threads all over the network, then 
you set a high process affinity (low number), and a low thread affinity 
(high number).
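
The two affinity numbers could be checked roughly as in the sketch 
below.  Everything here is assumed for illustration: the Affinity 
record, the may_place function, and the notion of a node "distance" (0 
for the local node, larger for nodes with more latency) are not part of 
Xgrid or Mosix.

```python
from dataclasses import dataclass


@dataclass
class Affinity:
    """Hypothetical per-application settings.
    0 = must run locally; a higher number tolerates more latency."""
    process: int = 0
    thread: int = 0


def may_place(aff: Affinity, kind: str, distance: int) -> bool:
    """Return True if a new process or thread may be placed on a node
    at the given distance (assumed scale: 0 = the local node)."""
    if kind == "process":
        number = aff.process
    elif kind == "thread":
        number = aff.thread
    else:
        raise ValueError("kind must be 'process' or 'thread'")
    return distance <= number
```

With Affinity(process=10, thread=0), new processes may land on any node 
up to distance 10 but threads stay on the node of their main thread; 
Affinity(process=0, thread=10) is the reverse case.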

I would have the xgrid command line tool have such a facility (I don't 
know if it does already or not, I haven't really done much with xgrid) 
similar to both the "nice" and "renice" commands.  I would also add a 
preference pane that allows the user to set a default process affinity, 
a default thread affinity, and a list of applications and default 
affinities for each of those applications (so that they can be 
exceptions to the default, without the user having to set it via 
command line every time).  Last, I would add a tool, possibly attached 
to the Xgrid tachometer, which would allow me to adjust an affinity 
after a program was running.

The only thing up in the air is the ability to move a running thread 
from one node to another while it's running (well, during a context 
switch, really).  A friend of mine at Georgia Tech was doing PhD 
research on that (portable threads) 10ish years ago, but I don't know 
if it got anywhere.  But that would allow someone to lower the number 
of an application's affinity while it's running, thus recalling the 
threads or processes from a remote Agent to the local Client (the 
scenario being I have a laptop that is an Xgrid Client, and I start 
running applications that spread out across the network ... then I get 
up to leave, so I lower the affinity numbers of everything so that the 
tasks and threads come back to my laptop, running slower now that they 
have fewer nodes to run upon, but still running (or sleeping, as the 
case might be)).
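
Recalling work after the number is lowered could look something like 
this sketch.  The placements mapping, the tasks_to_recall function, and 
the "distance" values (0 = the local Client, higher = Agents that are 
farther away) are hypothetical; how a running thread would actually be 
migrated is the open question above and is not shown.

```python
from typing import Dict, List


def tasks_to_recall(placements: Dict[str, int],
                    new_number: int) -> List[str]:
    """placements maps a task name to the distance of the node it is
    currently running on (assumed scale: 0 = the local Client).
    Return, in sorted order, the tasks that sit farther away than the
    new affinity number allows and so must be brought closer."""
    return sorted(task for task, distance in placements.items()
                  if distance > new_number)
```

Lowering the number to 0 before unplugging the laptop would mark every 
remote task for recall, while tasks already on the Client are left 
alone.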


So ... all of that leads up to: does anyone know if the Xgrid 
developers are working on the type of Application-Transparent 
Distributed Computing that Mosix, OpenMosix, and I think OpenSSI have?  
I think it would be a natural extension to Xgrid: Apple is trying to 
make this as "it just works" as possible, so it seems that it should 
not only be easy for the sysadmin to set up the distributed computing 
cluster, but easy/transparent for the developer, too (in the same way 
that threads made Multi-Processing easier and more abstract for the 
developer, this type of distributed computing makes threads not just a 
multi-processing model, but a distributed computing model).  
Ultimately, it even makes distributed computing easy for the user: 
they don't need to learn how to re-code a program (or coerce a vendor 
into making a distributed version of their application).  Any 
multi-threaded application will use multiple nodes, and even 
single-threaded non-distributed applications can be run on remote 
nodes.  That seems like a powerful "it just works" capability to me.

(the main drawback of Mosix, OpenMosix, and OpenSSI from my perspective 
is that they're Linux only, specifically developed for the Linux kernel 
... but I'd really love to see something like them available for Mac OS 
X)

Thoughts?

----- End forwarded message -----
-- 
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net