[Beowulf] Xgrid and Mosix (fwd from john at rudd.cc)
Douglas Eadline, Cluster World Magazine
deadline at clusterworld.com
Sat Jan 1 07:40:44 PST 2005
These two pages are useful when considering Mosix.
What does not migrate:
http://howto.x-tend.be/openMosixWiki/index.php/don't
What does migrate:
http://howto.x-tend.be/openMosixWiki/index.php/work%20smoothly
ClusterWorld just ran a feature on OpenMosix:
http://www.clusterworld.com/issues/dec-04-preview.shtml
Doug
On Thu, 30 Dec 2004, Eugen Leitl wrote:
> ----- Forwarded message from John Rudd <john at rudd.cc> -----
>
> From: John Rudd <john at rudd.cc>
> Date: Wed, 29 Dec 2004 19:37:02 -0800
> To: xgrid-users at lists.apple.com
> Subject: Xgrid and Mosix
> X-Mailer: Apple Mail (2.619)
>
>
> I see in the archives that someone asked about OpenMosix back in
> September (
> http://lists.apple.com/archives/xgrid-users/2004/Sep/msg00023.html ),
> but I didn't see any responses. So I thought I'd ask too, but with a
> little more detail.
>
> The thing that I find interesting about the Mosix style distributed
> computing environment is that applications do NOT need to be re-written
> around them. Mosix abstracts the distributed computing cluster away
> from the program and developer in the same way that threads abstract
> multi-processing away from the program and developer. Under Mosix, any
> program, without having to be written around any special library,
> without having to be relinked or recompiled, can be moved off to
> another processing node if there are nodes that are significantly less
> busy than yours. And, AFAIK, any multi-threaded application can make
> use of multiple nodes (with threads being spawned on any host that is
> less loaded than the current node). Imagine taking a completely
> mundane but multi-threaded application (I'll assume Photoshop is
> multi-threaded and use that as an example). Suddenly, without having
> to get Adobe to support Xgrid, you can use Xgrid to speed up your
> Photoshop rendering.
>
> It seems to me that a similar set of features could be added to Xgrid.
> The thread and process spawning code within the kernel could be
> extended by Xgrid to check for lightly loaded Agents, and move the new
> process or thread to that Agent. Only the IO routines would need to
> exist on the Client (and even then, maybe not: if every node has a
> similar filesystem image, then only the UI (for user bound
> applications) or primary network interface code (for network
> daemons/servers) needs to run on the original Client system). From
> what I recall, the Mach microkernel already makes some infrastructure
> for this type of thing available; it just needs to be utilized, and
> done deep enough in the kernel that an application doesn't need to know
> about it.
>
>
> Though, that does bring up one consideration: I have a friend who did a
> lot of distributed computing work when he was working for Flying
> Crocodile (a web hosting company that specialized in porn sites, where
> his distributed computing code had to support multiple-millions of hits
> per second). His experience there gave him a concern about Mosix style
> distributed computing. One of the advantages of something like Beowulf
> is that the coder often needs to control what things need to be kept
> low latency (must use threads for SMP on the local processor) and what
> things can have high latency (can use parallel code on the network),
> and the programming interface type of distributed computing gives them
> that flexibility.
>
> The idea that I suggested was something like nice/renice in Unix, where
> you could specify certain parallelism parameters to a process before
> you run it, or after it is already running. For example, instead of
> "process priority", you might specify a sort of "process affinity" or
> "thread affinity". For process affinity, a low number (which means
> high affinity, just like priority and nice numbers) means "when this
> process creates a child, it must be kept close to the same CPU as the
> one that spawned it". Thread affinity would be the same, but for
> threads. A default of zero means "everything must run locally". A
> high number means "I can tolerate more latency" (so, "latency
> tolerance" would be the opposite of "affinity"). (It occurs to me,
> having written all of this, that it might be easier for the end user to
> think in terms of "latency tolerance" instead of "process affinity":
> high latency tolerance = high number, without the opportunity for
> confusion that affinity has, since its numbers go in the opposite
> direction. I hope all of that made sense.)
>
> A process with a low process affinity (high number) and a high thread
> affinity (low number) means that it can spawn new
> tasks/processes/applications anywhere in the network, but any threads
> for it (or its sub-processes) must exist on the same node as its main
> thread. Or, if you want all of the applications to be running on your
> workstation/Client, but run their threads all over the network, then
> you set a high process affinity (low number), and a low thread affinity
> (high number).
>
> I would have the xgrid command line tool have such a facility (I don't
> know if it does already or not, I haven't really done much with xgrid)
> similar to both the "nice" and "renice" commands. I would also add a
> preference pane that allows the user to set a default process affinity,
> a default thread affinity, and a list of applications and default
> affinities for each of those applications (so that they can be
> exceptions to the default, without the user having to set it via
> command line every time). Last, I would add a tool, possibly attached
> to the Xgrid tachometer, which would allow me to adjust an affinity
> after a program was running.
>
> The only thing up in the air is the ability to move a running thread
> from one node to another while it's running (well, during a context
> switch, really). I know a friend of mine at Georgia Tech was doing PhD
> research on that (portable threads) 10ish years ago, but I don't know
> if it got anywhere. But, that would allow someone to lower the number
> of an application's affinity while it's running, thus recalling the
> threads or processes from a remote Agent to the local Client (the
> scenario being I have a laptop that is an Xgrid Client, and I start
> running applications that spread out across the network ... then I get
> up to leave, so I lower the affinity numbers of everything so that the
> tasks and threads come back to my laptop, running slower now that they
> have fewer nodes to run upon, but still running (or sleeping, as the
> case might be)).
>
>
> So ... all of that leads up to: does anyone know if Xgrid is working on
> this type of Application-Transparent Distributed Computing that Mosix,
> OpenMosix, and I think OpenSSI have? I think it would be a natural
> extension to Xgrid: Apple is trying to make this as "it just works" as
> possible, so it seems that it should not only be easy for the sysadmin
> to set up the distributed computing cluster, but easy/transparent for
> the developer, too (in the same way that threads made Multi-Processing
> easier and more abstract for the developer, this type of distributed
> computing makes threads not just a multi-processing model, but a
> distributed computing model). Ultimately, it even makes distributed
> computing easy for the user: they don't need to learn how to re-code a
> program (or coerce a vendor into making a distributed version of their
> application); any multi-threaded application will use multiple nodes,
> and even single-threaded non-distributed applications can be run on
> remote nodes. That seems like a powerful "it just works" capability to
> me.
>
> (the main drawback of Mosix, OpenMosix, and OpenSSI from my perspective
> is that they're Linux only, specifically developed for the Linux kernel
> ... but I'd really love to see something like them available for Mac OS
> X)
>
> Thoughts?
>
> ----- End forwarded message -----
>
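A minimal sketch of the nice/renice-style knob described in the forwarded
message, using the "latency tolerance" framing (0 = everything must run
locally, a higher number = more latency is acceptable). Everything named
here is hypothetical and exists only for illustration: the Node and
Tolerance classes, the place() and recall() functions, and in particular
the assumption that the tolerance number can be read as a latency budget in
milliseconds. Neither Xgrid nor Mosix is known to offer such an interface.

```python
from dataclasses import dataclass

LOCAL_ONLY = 0  # the default from the post: everything must run locally


@dataclass
class Node:
    name: str
    latency_ms: float  # assumed cost of reaching this node from the Client
    load: float        # assumed load figure; lower means less busy


@dataclass
class Tolerance:
    process: int = LOCAL_ONLY  # latency tolerance for child processes
    thread: int = LOCAL_ONLY   # latency tolerance for threads


def limit_for(kind, tol):
    """Return the tolerance that applies to a 'process' or a 'thread'."""
    return tol.process if kind == "process" else tol.thread


def place(kind, tol, local, agents):
    """Pick the least-loaded node that the tolerance allows for a new task.

    The local node is always a candidate, so a tolerance of 0 keeps the
    task at home. Ties go to the local node because it is listed first.
    """
    limit = limit_for(kind, tol)
    candidates = [local] + [a for a in agents if a.latency_ms <= limit]
    return min(candidates, key=lambda n: n.load)


def recall(placements, kind, tol, local):
    """Re-evaluate running tasks after a renice-style change.

    Any task sitting on a node that no longer fits the tolerance is
    mapped back to the local node; the others stay where they are.
    """
    limit = limit_for(kind, tol)
    return {
        task: (node if node.latency_ms <= limit else local)
        for task, node in placements.items()
    }


# Example nodes (made-up numbers): a busy laptop and two idle Agents.
laptop = Node("laptop", latency_ms=0.0, load=0.9)
agents = [
    Node("lan-agent", latency_ms=1.0, load=0.2),
    Node("wan-agent", latency_ms=40.0, load=0.1),
]
```

With Tolerance(process=50, thread=0), place("process", ...) may choose the
lightly loaded remote Agent while place("thread", ...) always returns the
laptop, which matches the "spawn processes anywhere, keep threads together"
case. Dropping both values back to 0 and calling recall() maps every task
back to the laptop, which is the "get up and leave" scenario.
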
--
----------------------------------------------------------------
Editor-in-chief ClusterWorld Magazine
Desk: 610.865.6061
Fax: 610.865.6618 www.clusterworld.com