job migration

Robert G. Brown rgb at phy.duke.edu
Thu Mar 8 15:20:37 PST 2001


On Fri, 9 Mar 2001, Cosmik Debris wrote:

> I don't think you can do process migration with scripts. For process
> migration one has to be able to "pick up" a running job and move it along
> with its open file handles etc. Not an easy task.

OTOH, if you're willing to write the jobs and scripts as a tuned pair,
you can e.g. kill -USR1 pid from the script, trap the signal in the
executable and e.g. write out a restartable checkpoint to an NFS shared
file, and then rerun the command on a lightly loaded remote host with a
flag that causes it to restart the process from the checkpoint file.  Or
you can get more sophisticated and have the process itself (upon receipt
of the kill signal) start another copy of itself on a remote host, open
a socket connection, transmit its current state and a begin command, and
die.  There are always ways to do it.
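
A minimal sketch of the first approach, in Python rather than whatever
language your executable is written in. Everything named here
(CheckpointedJob, the JSON checkpoint layout, the toy summation job) is
hypothetical and only there to show the shape of the thing: the handler
sets a flag, the work loop notices it, writes the checkpoint and stops,
and a fresh copy pointed at the same file picks up where the old one
left off.

```python
import json
import os
import signal
import tempfile


class CheckpointedJob:
    """Toy job (sums 0..n-1) that stops and saves its state on SIGUSR1."""

    def __init__(self, path):
        self.path = path              # checkpoint file, e.g. on an NFS share
        self.stop_requested = False   # set by the handler, polled by run()
        self.state = self._load()

    def _load(self):
        # Resume from the checkpoint if one exists, otherwise start fresh.
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {"step": 0, "total": 0}

    def _save(self):
        # Write to a temp file in the same directory, then rename it over
        # the old checkpoint, so a restart does not pick up a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def on_signal(self, signum, frame):
        # Do as little as possible here; run() does the actual write.
        self.stop_requested = True

    def install(self):
        # Register the handler; must be called from the main thread.
        signal.signal(signal.SIGUSR1, self.on_signal)

    def run(self, n_steps):
        """Return True if the job finished, False if it checkpointed early."""
        while self.state["step"] < n_steps:
            if self.stop_requested:
                self._save()
                return False
            self.state["total"] += self.state["step"]
            self.state["step"] += 1
        self._save()
        return True
```

The controlling script would then do a kill -USR1 pid on the loaded
host, wait for the process to exit, and start the same program on the
lightly loaded host with the same checkpoint path. Calling install()
before run() is what hooks the job up to the signal.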

Whether one SHOULD do it rather than install MOSIX (which was born and
bred to make this specific task utterly painless) is a question of how
checkpointable your task is -- how easily it can write out a restartable
state and restart from that state.  If you are e.g. doing Monte Carlo
and each thread just generates independent samples without any initial
thermalization time, you may need NO initial data to migrate a task --
just kill one on the loaded host with a signal that forces it to write
out any unflushed samples first and start a new one on the unloaded
host.
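
The Monte Carlo case might look roughly like the sketch below. The
SampleWorker class, the per-worker output file and the toy observable
are all assumptions made for illustration. The one thing that is not
optional is giving the replacement worker a different seed, since
otherwise it just regenerates the samples you already have.

```python
import random
import signal


class SampleWorker:
    """Draws independent Monte Carlo samples and appends them to a file."""

    def __init__(self, out_path, seed, flush_every=1000):
        self.out_path = out_path          # hypothetical per-worker output file
        self.rng = random.Random(seed)    # give each worker a distinct seed
        self.flush_every = flush_every
        self.buffer = []                  # samples not yet written to disk
        self.stop_requested = False

    def on_signal(self, signum, frame):
        # Only set a flag; run() flushes and returns on its next iteration.
        self.stop_requested = True

    def install(self):
        # Register the handler; must be called from the main thread.
        signal.signal(signal.SIGUSR1, self.on_signal)

    def flush(self):
        # Append buffered samples, one per line, then clear the buffer.
        if self.buffer:
            with open(self.out_path, "a") as f:
                f.writelines(f"{s}\n" for s in self.buffer)
            self.buffer.clear()

    def run(self, max_samples):
        """Generate up to max_samples samples; return how many were produced."""
        produced = 0
        while produced < max_samples and not self.stop_requested:
            x, y = self.rng.random(), self.rng.random()
            # Toy observable: 1 if the point falls inside the unit quarter circle.
            self.buffer.append(1 if x * x + y * y <= 1.0 else 0)
            produced += 1
            if len(self.buffer) >= self.flush_every:
                self.flush()
        self.flush()  # write out any unflushed samples before returning
        return produced
```

No state moves between hosts here at all: the old worker flushes and
exits, and the new one starts from scratch with its own seed and its own
output file.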

Then there are tasks with open sockets, with many megabytes of internal
state data, with open files, that would be a nightmare to checkpoint
restartably.  Somewhere in between is a point of no (sane) return,
especially with clear upper bounds on the work required to install
MOSIX.  Only you know whether your job is easy enough to migrate in this
way to be worth it.  I used this sort of method a few times about six or
seven years ago, so I know it works, but it is a bit clunky.

   rgb

>
> > -----Original Message-----
> > From: Christoph Wasshuber [mailto:wasshub at ti.com]
> > Sent: Friday, 9 March 2001 3:02 a.m.
> > To: Beowulf (E-mail)
> > Subject: job migration
> >
> >
> > Is it possible to do a crude job migration with some
> > simple scripts? Or do I need to get one of
> > the job migration packages like MOSIX, Scyld, ...
> >
> > chris....
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe)
> > visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
>

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu






