[Beowulf] [EXTERNAL] Re: Interactive vs batch, and schedulers

Prentice Bisbal pbisbal at pppl.gov
Fri Jan 17 08:09:58 PST 2020


The problem with timeslicing is that when one job is preempted, its 
state needs to be stored somewhere so the next job can run. Since many 
HPC jobs are memory-intensive, using RAM for this usually isn't an 
option, which leaves writing the state to disk. Since disk is many 
orders of magnitude slower than RAM, writing state to disk for 
timeslicing would ultimately reduce the throughput of the cluster. It's 
much more efficient to have one job "own" the nodes until it completes.

Yes, jobs do checkpointing, but I'm assuming checkpoints aren't taken 
as frequently as your proposed timeslicing would require, and that a 
checkpoint doesn't write the entire state to disk.
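To put rough numbers on it, here's a back-of-envelope sketch of the 
context-switch cost. All figures (memory footprint, disk bandwidth, 
target overhead) are assumptions for illustration, not measurements:

```python
# Back-of-envelope: cost of swapping a job's full memory state to disk
# on each timeslice. All figures below are assumed for illustration.

ram_per_node_gb = 256      # assumed memory footprint of the running job, per node
disk_write_gbps = 2.0      # assumed sustained write bandwidth of local storage

# Writing job A's state out, then reading job B's state back in:
swap_out_s = ram_per_node_gb / disk_write_gbps
context_switch_s = 2 * swap_out_s

print(f"one swap-out:        {swap_out_s:.0f} s")        # 128 s
print(f"full context switch: {context_switch_s:.0f} s")  # 256 s

# To keep switching overhead at, say, 10%, each timeslice must be
# ten times the switch cost -- roughly 43 minutes per slice here,
# which is nowhere near "interactive".
slice_len_s = context_switch_s / 0.10
print(f"slice length for 10% overhead: {slice_len_s / 60:.0f} min")
```

With those assumed numbers, timeslices short enough to feel interactive 
would spend most of their wall time moving state to and from disk, 
which is the throughput loss described above.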

Prentice

On 1/17/20 12:35 AM, Lux, Jim (US 337K) via Beowulf wrote:
>
> And I suppose there’s no equivalent of “timeslicing” where the cores 
> run job A for 99% of the time and job B, C, D, E, F, for 1% of the time.
>
> *From: *Alex Chekholko <alex at calicolabs.com>
> *Date: *Thursday, January 16, 2020 at 3:50 PM
> *To: *Jim Lux <james.p.lux at jpl.nasa.gov>
> *Cc: *"beowulf at beowulf.org" <beowulf at beowulf.org>
> *Subject: *[EXTERNAL] Re: [Beowulf] Interactive vs batch, and schedulers
>
> Hey Jim,
>
> There is an inverse relationship between latency and throughput.  Most 
> supercomputing centers aim to keep their overall utilization high, so 
> the queue always needs to be full of jobs.
>
> If you can have 1000 nodes always idle and available, then your 1000 
> node jobs will usually take 10 seconds.  But your overall utilization 
> will be in the low single digit percent or worse.
>
> Regards,
>
> Alex
>
> On Thu, Jan 16, 2020 at 3:25 PM Lux, Jim (US 337K) via Beowulf 
> <beowulf at beowulf.org <mailto:beowulf at beowulf.org>> wrote:
>
>     Are there any references out there that discuss the tradeoffs
>     between interactive and batch scheduling (perhaps some from the
>     60s and 70s?) –
>
>     Most big HPC systems have a mix of giant jobs and smaller ones
>     managed by some process like PBS or SLURM, with queues of various
>     sized jobs.
>
>     What I’m interested in is the idea of jobs that, if spread across
>     many nodes (dozens) can complete in seconds (<1 minute) providing
>     essentially “interactive” access, in the context of large jobs
>     taking days to complete.   It’s not clear to me that the current
>     schedulers can actually do this – rather, they allocate M of N
>     nodes to a particular job pulled out of a series of queues, and
>     that job “owns” the nodes until it completes.  Smaller jobs get
>     run on (M-1) of the N nodes, and presumably complete faster, so it
>     works down through the queue quicker, but ultimately, if you have
>     a job that would take, say, 10 seconds on 1000 nodes, it’s going
>     to take 20 minutes on 10 nodes.
>
>     Jim
>
>     -- 
>
>     _______________________________________________
>     Beowulf mailing list, Beowulf at beowulf.org
>     <mailto:Beowulf at beowulf.org> sponsored by Penguin Computing
>     To change your subscription (digest mode or unsubscribe) visit
>     https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>     <https://beowulf.org/cgi-bin/mailman/listinfo/beowulf>
>
>