[Beowulf] scheduler policy design

Joe Landman landman at scalableinformatics.com
Tue Apr 24 08:52:52 PDT 2007

Wondering out loud.

I think what we have here is a clear-cut need for "fast virtualization" 
(i.e. little or no performance penalty), which enables "fast" migration.

Current virtualization requires running inside VMware or a similar type 
of container.  This is heavyweight and slow, and it impedes access to 
fast, low-latency resources.

We have customers who like to run jobs for 10-100 days.  They consume 1 
or 2 CPUs on the same node (running some commercial code that shall 
remain nameless).  Unfortunately, while these jobs are running, other 
short/fast jobs will not, and cannot, run.

If we can assign a priority to the jobs, so that "short" jobs get a 
higher priority than longer jobs and a job's priority decreases 
monotonically with run length, and if we can safely checkpoint them and 
migrate them (via a virtual container) to another node, or restart them 
on one node ... then we have something nice from a throughput viewpoint.
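The decay policy above can be sketched in a few lines.  This is only an illustration of the idea, not any real scheduler's implementation; the job names, the linear decay rate, and the `decayed_priority` helper are all hypothetical.

```python
# Sketch of the policy described above: a job's effective priority
# worsens monotonically with its accumulated run time, so short or
# freshly started jobs always dispatch ahead of long-runners.
# All names and the decay constant here are hypothetical.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: float                     # lower value = dispatched first
    name: str = field(compare=False)
    runtime_hours: float = field(compare=False, default=0.0)

def decayed_priority(base: float, runtime_hours: float,
                     decay: float = 0.1) -> float:
    """Priority value grows (worsens) linearly with accumulated run time."""
    return base + decay * runtime_hours

def schedule(jobs):
    """Return job names in dispatch order under the decay policy."""
    heap = [Job(decayed_priority(0.0, hours), name, hours)
            for name, hours in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]

jobs = [("100-day-commercial-run", 2400.0),
        ("short-analysis", 0.5),
        ("medium-batch", 48.0)]
print(schedule(jobs))
# -> ['short-analysis', 'medium-batch', '100-day-commercial-run']
```

A real implementation would recompute priorities as jobs accumulate run time and would need the checkpoint/migrate machinery to actually preempt the long-runners, which is exactly the missing piece discussed below.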

The problem is that checkpointing (at least, last I checked under Linux) 
is not safe.  Virtualization has not been lightweight (Xen, OpenVZ, ... 
still have significant impacts, and may not allow fast access to 
hardware).  Until we get this stuff, job schedulers play guessing games 
at best.

The way I described it to a customer who was achieving 90+% utilization 
of their machine, with long queues: a good job scheduler pisses everyone 
off equally.



Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
