<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

p.msonormal0, li.msonormal0, div.msonormal0

        {mso-style-name:msonormal;

        mso-margin-top-alt:auto;

        margin-right:0in;

        mso-margin-bottom-alt:auto;

        margin-left:0in;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

span.EmailStyle18

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body lang="EN-US" link="blue" vlink="purple">

<div class="WordSection1">

<p class="MsoNormal">That’s an interesting point (cloud vs. cluster): if your jobs are small enough to “fit” on a single node (so there’s no real need for inter-node communication) and the workload is embarrassingly parallel (EP), then spinning up 1000 cloud instances might be a better approach.<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">

<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Beowulf &lt;beowulf-bounces@beowulf.org&gt; on behalf of Tim Cutts &lt;tjrc@sanger.ac.uk&gt;<br>

<b>Date: </b>Friday, January 17, 2020 at 12:44 AM<br>

<b>To: </b>Alex Chekholko &lt;alex@calicolabs.com&gt;<br>

<b>Cc: </b>"beowulf@beowulf.org" &lt;beowulf@beowulf.org&gt;, Jim Lux &lt;james.p.lux@jpl.nasa.gov&gt;<br>

<b>Subject: </b>[EXTERNAL] Re: [Beowulf] Interactive vs batch, and schedulers [EXT]<o:p></o:p></span></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<p class="MsoNormal">Indeed, and you can quite easily get into a “boulders and sand” scheduling problem: if you allow the small interactive jobs (the sand) free access to everything, the scheduler finds them easy to place and partially fills nodes with them. It then can’t find contiguous resources large enough for the big parallel jobs (the boulders), and the large batch jobs end up pending forever.<o:p></o:p></p>
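<p class="MsoNormal">A minimal sketch of the boulders-and-sand effect in Python. The cluster size (16 nodes of 32 cores), the 16 single-core sand jobs, the 8-node boulder and the function name free_whole_nodes are all illustrative assumptions, not taken from any real scheduler. It compares a spread (least-loaded) placement with a packed one and counts how many nodes stay completely empty.</p>

<pre>
```python
def free_whole_nodes(nodes: int, cores_per_node: int, sand_jobs: int, spread: bool) -> int:
    """Count nodes left completely empty after placing single-core 'sand' jobs.

    Assumes sand_jobs does not exceed the total number of cores.
    """
    load = [0] * nodes
    for _ in range(sand_jobs):
        if spread:
            # Least-loaded placement: each job goes to the emptiest node.
            target = load.index(min(load))
        else:
            # Packed placement: use the first node that still has a free core.
            target = next(i for i, used in enumerate(load) if used != cores_per_node)
        load[target] += 1
    return sum(1 for used in load if used == 0)


boulder_nodes = 8  # hypothetical parallel job that needs 8 whole nodes
for spread in (True, False):
    free = free_whole_nodes(16, 32, 16, spread)
    print(f"spread={spread}: {free} empty nodes, boulder can start: {free >= boulder_nodes}")
```
</pre>

<p class="MsoNormal">With spread placement the sand occupies only 16 of 512 cores, yet no node is left empty, so a boulder that needs whole nodes cannot start. Packing the same sand onto one node leaves 15 nodes free.</p>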

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">I’ve tried various approaches to this in the past, for example pre-emption of large, long-running jobs. That causes resource starvation (suspended jobs still consume virtual memory) and then all sorts of issues with timeouts on TCP connections and the like, because these are genomics jobs with lots of not-normal-HPC activities such as talking to relational databases.<o:p></o:p></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">I think you always end up having to ring-fence hardware for the large parallel batch jobs, and not allow the interactive stuff on it.<o:p></o:p></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">This of course is what leads some users to favour the cloud, because it appears to be infinite, and so the problem appears to go away.  But let's not get into that argument here.<o:p></o:p></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">Tim<o:p></o:p></p>

<div>

<p class="MsoNormal"><br>

<br>

<o:p></o:p></p>

<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

<div>

<p class="MsoNormal">On 16 Jan 2020, at 23:50, Alex Chekholko via Beowulf &lt;<a href="mailto:beowulf@beowulf.org">beowulf@beowulf.org</a>&gt; wrote:<o:p></o:p></p>

</div>

<p class="MsoNormal"><o:p> </o:p></p>

<div>

<div>

<div>

<p class="MsoNormal">Hey Jim,<o:p></o:p></p>

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<p class="MsoNormal">There is a trade-off between latency and throughput.  Most supercomputing centers aim to keep their overall utilization high, so the queue always needs to be full of jobs, which means a newly submitted job usually has to wait.<o:p></o:p></p>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">If you can have 1000 nodes always idle and available, then your 1000-node jobs will usually take 10 seconds.  But your overall utilization will be in the low single-digit percent range or worse.<o:p></o:p></p>
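<p class="MsoNormal">A rough back-of-the-envelope check of that utilization figure, as a Python sketch. It assumes the 1000 nodes are held for a single kind of job that uses all of them for 10 seconds; the arrival intervals and the function name utilization are made-up values for illustration.</p>

<pre>
```python
def utilization(job_seconds: float, seconds_between_jobs: float) -> float:
    """Fraction of time a dedicated pool is busy when each job uses the whole pool."""
    # Capped at 1.0: the pool cannot be more than fully busy.
    return min(1.0, job_seconds / seconds_between_jobs)


# Hypothetical arrival intervals: one job per minute, per 5 minutes, per hour.
for gap in (60, 300, 3600):
    print(f"one 10 s job every {gap:4d} s: {utilization(10, gap):.1%} utilized")
```
</pre>

<p class="MsoNormal">Under these assumptions, one such job every five minutes keeps the pool busy only about 3% of the time.</p>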

</div>

<div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<div>

<p class="MsoNormal">Regards,<o:p></o:p></p>

</div>

<div>

<p class="MsoNormal">Alex<o:p></o:p></p>

</div>

</div>

<p class="MsoNormal"><o:p> </o:p></p>

<div>

<div>

<p class="MsoNormal">On Thu, Jan 16, 2020 at 3:25 PM Lux, Jim (US 337K) via Beowulf &lt;<a href="mailto:beowulf@beowulf.org">beowulf@beowulf.org</a>&gt; wrote:<o:p></o:p></p>

</div>

<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">

<div>

<div>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Are there any references out there that discuss the tradeoffs between interactive and batch scheduling (perhaps some from the 60s and 70s)?<o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Most big HPC systems have a mix of giant jobs and smaller ones managed by some process like PBS or SLURM, with queues of various sized jobs.

<o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">What I’m interested in is the idea of jobs that, if spread across many nodes (dozens), can complete in seconds (&lt;1 minute), providing essentially “interactive” access, in the context of large jobs taking days to complete.  It’s not clear to me that the current schedulers can actually do this. Rather, they allocate M of N nodes to a particular job pulled out of a series of queues, and that job “owns” the nodes until it completes.  Smaller jobs get run on (M-1) of the N nodes, and presumably complete faster, so it works down through the queue quicker. Ultimately, though, if you have a job that would take, say, 10 seconds on 1000 nodes, it’s going to take about 17 minutes on 10 nodes, assuming it scales linearly.<o:p></o:p></p>
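<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">The arithmetic behind that last estimate can be sketched in Python. This assumes perfectly linear scaling, so the work is a fixed number of node-seconds; the function name wall_time is just for illustration.</p>

<pre>
```python
def wall_time(work_node_seconds: float, nodes: int) -> float:
    """Wall-clock seconds under ideal scaling: total work divided by node count."""
    return work_node_seconds / nodes


work = 10 * 1000  # 10 seconds on 1000 nodes is 10,000 node-seconds
for n in (1000, 100, 10):
    t = wall_time(work, n)
    print(f"{n:5d} nodes: {t:7.1f} s ({t / 60:.1f} min)")
```
</pre>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">The 10-node case comes out at 1000 seconds, roughly 17 minutes. Real jobs rarely scale this cleanly, so treat it as a best case.</p>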

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Jim<o:p></o:p></p>

<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>


</div>

</div>

<p class="MsoNormal">_______________________________________________<br>

Beowulf mailing list, <a href="mailto:Beowulf@beowulf.org" target="_blank">Beowulf@beowulf.org</a> sponsored by Penguin Computing<br>

To change your subscription (digest mode or unsubscribe) visit <a href="https://urldefense.proofpoint.com/v2/url?u=https-3A__beowulf.org_cgi-2Dbin_mailman_listinfo_beowulf&d=DwMFaQ&c=D7ByGjS34AllFgecYw0iC6Zq7qlm8uclZFI0SqQnqBo&r=gSesY1AbeTURZwExR_OGFZlp9YUzrLWyYpGmwAw4Q50&m=xK7X4jUX3oG8IizF_lTh0GNrYM4sF9nUCxNKq6vi97c&s=rnNXVoLqTeEFVWB-0Jr0hJC0BgpH2_jm2s51IZb0H8o&e=" target="_blank">

https://beowulf.org/cgi-bin/mailman/listinfo/beowulf [beowulf.org]</a><o:p></o:p></p>

</blockquote>

</div>


</div>

</blockquote>

</div>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

<p class="MsoNormal">-- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1

 2BE. <o:p></o:p></p>

</div>

</body>

</html>