While I agree with the idea of short job runtime limits and the reasons
for them, I disagree with your formulation. Having been involved many
times in discussions about what runtime limits should be set, I wouldn't
make a statement like yours myself; I would say instead: YMMV. In other
words: choose what best fits the job mix that users are actually
running. If you have determined that an 8h max. runtime is appropriate
for _your_ cluster and that increasing it to 24h would lead to a waste
of computational time due to the reliability of _your_ cluster, then
you've done your job well. But saying that everybody should use this 
limit is wrong.

Furthermore, although you mention that system-level checkpointing is 
associated with a performance hit, you seem to think that user-level 
checkpointing is a lot lighter, which is most often not the case. 
Apart from the obvious I/O limitations that could restrict saving &
loading of checkpoint data, there are applications whose developers
have chosen not to store certain data but to recompute it every time it
is needed, because the effort of saving, storing & loading it is higher
than the computational effort of recreating it - but this most likely
means that the data has to be recomputed at each restart of the
application. And smaller max. runtimes mean more restarts needed to
reach the same total runtime...
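
To put rough numbers on that last point, here is a minimal Python
sketch. It is a back-of-the-envelope model under assumptions of my own,
not a measurement: every run except the last ends by writing a
checkpoint, every run after the first begins by recomputing the data
that was not stored, and the per-run budget is estimated conservatively
by charging both costs to every run. The function names and the figures
in the usage note (96h of work, 0.5h recompute, 0.25h checkpoint) are
hypothetical.

```python
import math


def runs_needed(work_h, limit_h, recompute_h, checkpoint_h):
    """Number of job runs needed to finish work_h hours of useful work.

    Assumption: each run loses recompute_h + checkpoint_h hours of its
    limit_h budget to restart-related overhead (a conservative estimate).
    """
    useful_h = limit_h - recompute_h - checkpoint_h
    if useful_h <= 0:
        raise ValueError("runtime limit too short to make any progress")
    return math.ceil(work_h / useful_h)


def restart_overhead_h(work_h, limit_h, recompute_h, checkpoint_h):
    """Hours spent on checkpointing and recomputation across all restarts."""
    runs = runs_needed(work_h, limit_h, recompute_h, checkpoint_h)
    # runs - 1 restarts: each costs one checkpoint write plus one recompute.
    return (runs - 1) * (recompute_h + checkpoint_h)
```

With these made-up figures, restart_overhead_h(96, 8, 0.5, 0.25) gives
9.75 hours spread over 14 runs, while restart_overhead_h(96, 24, 0.5,
0.25) gives 3.0 hours over 5 runs. Whether that difference outweighs
the work lost to failures under the longer limit depends on the
reliability of the cluster in question, which is exactly why YMMV.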

