[Beowulf] Re: Time limits in queues

Thu Jan 17 08:43:19 PST 2008

Bogdan Costescu wrote:
> On Wed, 16 Jan 2008, Craig Tierney wrote:
> 
>> Our queue limits are 8 hours.
>> ...
>> Did that sysadmin who set 24 hour time limits ever analyze the amount
>> of lost computational time because of larger time limits?
> 
> While I agree with the idea and reasons of short job runtime limits, I 
> disagree with your formulation. Being many times involved in discussions 
> about what runtime limits should be set, I wouldn't make myself a 
> statement like yours; I would say instead: YMMV. In other words: choose 
> what fits better the job mix that users are actually running. If you 
> have determined that 8h max. runtime is appropriate for _your_ cluster 
> and increasing it to 24h would lead to a waste of computational time due 
> to the reliability of _your_ cluster, then you've done your job well. 
> But saying that everybody should use this limit is wrong.

First of all, I agree that it is always a YMMV case.  We are good about that here
(on the list).  My point was that, in every instance I have seen, multi-day queue
limits are not the norm.  Those places do have exceptions for particular codes
and particular projects.  I know our system would handle 24h queues in terms
of reliability, but with the job mix we have, it would cause problems beyond stability
(we are currently looking at a new scheduler to solve that problem).

> 
> Furthermore, although you mention that system-level checkpointing is 
> associated with a performance hit, you seem to think that user-level 
> checkpointing is a lot lighter, which is most often not the case. 

There was an assumption in my statement that I didn't share with people.
I was thinking about the kind of system-level checkpointing that will probably
work for clusters, which will be some sort of VM-based solution.  That will
have the overhead of the virtual machine as well as of moving the data when
the time comes.

> Apart 
> from the obvious I/O limitations that could restrict saving & loading of 
> checkpointing data, there are applications for which developers have 
> chosen to not store certain data but recompute it every time it is 
> needed because the effort of saving, storing & loading it is higher than 
> the computational effort of recreating it - but this most likely means 
> that for each restart of the application this data has to be recomputed. 

Yes, but didn't you just say that recomputing that data is faster than the
I/O time associated with reading it?  A checkpoint isn't model results.  A checkpoint
is the state of the model at a particular time, so in this case you would save
that data.  It's already in memory; you just need to write it out with every
other bit of relevant information.  No extra computation is needed.
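
To make that concrete, here is a minimal sketch of a user-level checkpoint in
Python.  Everything in it is hypothetical (the toy model, the state layout, the
names initial_state, run, save_checkpoint and load_checkpoint); the only point
is that the derived data is written out along with the rest of the state, so a
restart does not have to recompute it.

```python
import os
import pickle
import tempfile


def initial_state():
    """Build a toy model state (hypothetical layout, for illustration only)."""
    # "derived" stands in for data the code normally recomputes on demand.
    derived = sum(i * i for i in range(1000))
    return {"step": 0, "field": [0, 1, 2], "derived": derived}


def run(state, n_steps):
    """Advance the toy model by n_steps, reusing the in-memory derived data."""
    for _ in range(n_steps):
        state["field"] = [x + state["derived"] for x in state["field"]]
        state["step"] += 1
    return state


def save_checkpoint(path, state):
    """Write the whole in-memory state, derived data included, to path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash never leaves a torn checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Read the state back; nothing needs to be recomputed."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Running 4 steps, saving, loading, and running 6 more should leave the state
identical to an uninterrupted 10-step run.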

> And smaller max. runtimes mean more restarts needed to reach the same 
> total runtime...
> 

Yes, anytime you are doing something other than the model run (like checkpointing)
your run will take longer.  This is another one of those "it depends" scenarios.
If the runtime takes 1% longer, and it makes the other users happier or lessens
the loss due to an eventual crash, is it worth it?
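
One rough way to put numbers on that trade-off is sketched below.  It is a
first-order model and an assumption on my part, not a measurement: a job
checkpoints once per interval, a failure throws away on average half an
interval of work, and failures arrive at a rate of one per MTBF with the
interval much shorter than the MTBF.  The function names and the sample
figures (0.08 h per checkpoint, 500 h MTBF) are hypothetical.

```python
def overhead_fraction(checkpoint_hours, interval_hours):
    """Fraction of runtime spent writing one checkpoint per interval."""
    return checkpoint_hours / interval_hours


def expected_loss_fraction(interval_hours, mtbf_hours):
    """Approximate fraction of work lost to failures.

    Assumes on average half an interval is lost per failure and that
    interval_hours is much smaller than mtbf_hours.
    """
    return interval_hours / (2.0 * mtbf_hours)


if __name__ == "__main__":
    # Hypothetical numbers: about 5 minutes per checkpoint, 500 h MTBF.
    for interval in (8.0, 24.0):
        print(interval,
              overhead_fraction(0.08, interval),
              expected_loss_fraction(interval, 500.0))
```

Under these assumptions the longer interval lowers the checkpoint overhead but
raises the expected loss, which is the "it depends" part.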

The 1% number is a target I would design for, based on the workload we experience
(a multitude of different-sized jobs, not one big job).  I would buy a couple of nodes
with 3ware cards and run either Lustre or PVFS2 over them for a place to dump the
checkpoints.  The
filesystem would be mostly volatile (so redundancy wouldn't be critical), and would
more than meet the reliability requirements of my system (>97%).
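
As a sizing sketch for that 1% target: the aggregate write bandwidth the
checkpoint filesystem needs is the checkpoint size divided by the time budget
(1% of the interval between checkpoints).  The function name and the example
size below are hypothetical, and the calculation ignores contention between
jobs checkpointing at the same time.

```python
def required_bandwidth(checkpoint_bytes, interval_seconds, target_fraction=0.01):
    """Bytes per second needed to write one checkpoint within the time budget."""
    budget_seconds = target_fraction * interval_seconds
    return checkpoint_bytes / budget_seconds


if __name__ == "__main__":
    # Hypothetical example: 288 GB of state, one checkpoint per 8-hour job.
    bw = required_bandwidth(288e9, 8 * 3600)
    print(f"{bw / 1e9:.2f} GB/s aggregate")
```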

Craig

-- 
Craig Tierney (craig.tierney at noaa.gov)