<div dir="ltr"><div><div><div><div><div>I agree with Doug. The way forward is a lightweight OS with containers for the applications.<br></div>I think we need to learn from the new kids on the block - the webscale generation.<br></div>They did not go out and look at how massive supercomputer clusters are put together.<br></div>No, they went out and build scale out applications built on public clouds.<br></div><div>We see 'applications designed to fail' and 'serverless'<br></div><div><br></div>Yes, I KNOW that scale out applications like these are Web type applications, and all application examples you <br></div>see are based on the load balancer/web server/database (or whatever style) paradigm<br><div><div><br></div><div>The art of this will be deploying the more tightly coupled applications with HPC has,<br></div><div>which depend upon MPI communications over a reliable fabric, which depend upon GPUs etc.<br><br></div><div>The other hat I will toss into the ring is separating parallel tasks which require computation on several<br></div><div>servers and MPI communication between them versus 'embarrassingly parallel' operations which may run on many, many cores<br></div><div>but do not particularly need communication between them.<br><br></div><div>The best successes I have seen on clusters is where the heavy parallel applications get exclusive compute nodes.<br></div><div>Cleaner, you get all the memory and storage bandwidth and easy to clean up. Hell, reboot the things after each job. You got an exclusive node.<br></div><div>I think many designs of HPC clusters still try to cater for all workloads - Oh Yes, we can run an MPI weather forecasting/ocean simulation<br></div><div>But at the same time we have this really fast IO system and we can run your Hadoop jobs. <br><br></div><div>I wonder if we are going to see a fork in HPC. With the massively parallel applications being deployed, as Doug says, on specialised <br></div><div>lightweight OSes which have dedicated high speed, reliable fabrics and with containers.<br></div><div>You won't really be able to manage those systems like individual Linux servers. Will you be able to ssh in for instance?<br></div><div>ssh assumes there is an ssh daemon running. Does a lightweight OS have ssh? Authentication Services? The kitchen sink?<br></div><div><br></div><div>The less parallel applications being run more and more on cloud type installations, either on-premise clouds or public clouds.<br></div><div>I confound myself here, as I cant say what the actual difference between those two types of machines is, as you always needs<br></div><div>an interconnect fabric and storage, so why not have the same for both types of tasks.<br></div><div>Maybe one further quip to stimulate some conversation. Silicon is cheap. No, really it is.<br></div><div>Your friendly Intel salesman may wince when you say that. 
On 3 May 2018 at 15:04, Douglas Eadline <deadline@eadline.org> wrote:
Here is where I see it going:

1. Compute nodes with a base minimal generic Linux OS
(with PR_SET_NO_NEW_PRIVS in the kernel, added in 3.5)

2. A scheduler (that supports containers)

3. Containers (Singularity mostly)

All "provisioning" is moved to the container. There will be edge cases of
course, but applications will be pulled down from a container repo and
"just run".
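Something like this, as a rough sketch (Slurm assumed as the scheduler, image name and binary path hypothetical),
is the whole story from the node's point of view:

  #!/bin/bash
  #SBATCH --nodes=4
  #SBATCH --ntasks-per-node=32

  # Pull the application from a container registry; nothing is installed on
  # the node beyond the base OS, the scheduler and Singularity itself.
  singularity pull app.sif docker://registry.example.org/myapp:latest

  # The host MPI launches one container instance per rank.
  mpirun singularity exec app.sif /opt/myapp/bin/solver input.dat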
--
Doug
> I never used Bright. Touched it and talked to a salesperson at a
> conference but I wasn't impressed.
>
> Unpopular opinion: I don't see a point in using "cluster managers"
> unless you have a very tiny cluster and zero Linux experience. These
> are just Linux boxes with a couple of applications (e.g. Slurm) running on
> them. Nothing special. xcat/Warewulf/Scyld/Rocks just get in the way
> more than they help IMO. They are mostly crappy wrappers around free
> software (e.g. ISC's dhcpd) anyway. When they aren't, it's proprietary
> trash.
>
> I install CentOS nodes and use
> Salt/Chef/Puppet/Ansible/WhoCares/Whatever to plop down my configs and
> software. This also means I'm not stuck with "node images" and can
> instead build everything as plain old text files (read: write SaltStack
> states), update them at will, and push changes any time. My "base
> image" is CentOS and I need no "baby's first cluster" HPC software to
> install/PXEboot it. YMMV
>
> Jeff White
>
> On 05/01/2018 01:57 PM, Robert Taylor wrote:
>> Hi Beowulfers.
>> Does anyone have any experience with Bright Cluster Manager?
>> My boss has been looking into it, so I wanted to tap into the
>> collective HPC consciousness and see what people think about it.
>> It appears to do node management, monitoring, and provisioning, so we
>> would still need a job scheduler like LSF, Slurm, etc. as well. Is that
>> correct?
>>
>> If you have experience with Bright, let me know. Feel free to contact
>> me off list or on.
--
Doug
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf