job migration
Robert G. Brown rgb at phy.duke.edu
Thu Mar 8 15:20:37 PST 2001
- Previous message: job migration
- Next message: Question about BProc process migration/ Scyld.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Fri, 9 Mar 2001, Cosmik Debris wrote:

> I don't think you can do process migration with scripts. For process
> migration one has to be able to "pick up" a running job and move it along
> with its open file handles etc. Not an easy task.

OTOH, if you're willing to write the jobs and scripts as a tuned pair, you can e.g. kill -USR1 pid from the script, trap the signal in the executable and write out a restartable checkpoint to an NFS shared file, and then rerun the command on a lightly loaded remote host with a flag that causes it to restart the process from the checkpoint file.

Or you can get more sophisticated and have the process itself (upon receipt of the kill signal) start another copy of itself on a remote host, open a socket connection, transmit its current state and a begin command, and die.

There are always ways to do it. Whether one SHOULD do it rather than install MOSIX (which was born and bred to make this specific task utterly painless) is a question of how checkpointable your task is -- how easily it can write out a restartable state and restart from that state.

If you are e.g. doing Monte Carlo and each thread just generates independent samples without any initial thermalization time, you may need NO initial data to migrate a task -- just kill the one on the loaded host with a signal that forces it to write out any unflushed samples first, and start a new one on the unloaded host. At the other extreme there are tasks with open sockets, with many megabytes of internal state data, with open files, that would be a nightmare to checkpoint restartably.

Somewhere in between is a point of no (sane) return, especially since there is a clear upper bound on the work required to install MOSIX. Only if your job is pretty darn easy to migrate in this way would it be worth it, and only you know whether it is.

I used this sort of method a few times about six or seven years ago, so I know it works, but it is a bit clunky.
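A minimal sketch of the signal-and-checkpoint pairing described above, written in Python purely for illustration. Everything named here is an assumption, not part of the original setup: the toy job (summing integers), the JSON checkpoint format, the function names, and the optional `step_hook` parameter, which exists only so the loop can be exercised in a test. On a real cluster the checkpoint path would point at the NFS-shared directory.

```python
import json
import os
import signal
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename, so a restart never
    sees a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path, n_steps, step_hook=None):
    """Toy job: sum the integers 0..n_steps-1.

    On SIGUSR1 the loop stops after the current step; the state is
    written to `path` on exit either way. Must be called from the
    main thread, since it installs a signal handler.
    """
    stop = {"requested": False}

    def on_usr1(signum, frame):
        # Only set a flag here; the main loop does the actual I/O.
        stop["requested"] = True

    signal.signal(signal.SIGUSR1, on_usr1)

    state = load_checkpoint(path)  # resumes if a checkpoint is present
    while state["step"] < n_steps and not stop["requested"]:
        state["total"] += state["step"]
        state["step"] += 1
        if step_hook is not None:  # hypothetical hook, for testing only
            step_hook(state)

    save_checkpoint(path, state)
    return state
```

In this sketch the "restart flag" is implicit: the job resumes whenever a checkpoint file is found. The controlling script would send `kill -USR1 pid` on the loaded host, wait for the process to exit, and then launch the same command on the lightly loaded host with the same checkpoint path.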
   rgb

> > -----Original Message-----
> > From: Christoph Wasshuber [mailto:wasshub at ti.com]
> > Sent: Friday, 9 March 2001 3:02 a.m.
> > To: Beowulf (E-mail)
> > Subject: job migration
> >
> > Is it possible to do a crude job migration with some
> > simple scripts? Or do I need to get one of
> > the job migration packages like MOSIX, Scyld, ...
> >
> > chris....
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe)
> > visit http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
